Text mining

project: William Godwin's Diary

The project provides a digital edition of the diary of William Godwin (1756-1836). Godwin’s diary consists of 32 octavo notebooks. The first entry is for 6 April 1788 and the final entry is for 26 March 1836, shortly before he died. The diary is a resource of immense importance to researchers of history, politics, literature, and women’s studies.

project: Geographies of Orthodoxy: mapping the English-Pseudo-Bonaventuran Lives of Christ, c. 1350-1550

Geographies of Orthodoxy offers a new account of an English devotional phenomenon and affective literary tradition usually characterised as ‘pseudo-Bonaventuran’ by modern commentators. Geographies of Orthodoxy proposes to examine and make openly accessible through the latest electronic means the entire material remains of the anglophone pseudo-Bonaventuran tradition.

tool: Solr

Purpose: 

Solr is an open source enterprise search platform from the Apache Lucene project. It operates as a standalone full-text search server within an appropriate servlet container, such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language.
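As a rough illustration of the HTTP interface described above, the following Python sketch builds a query URL for Solr's standard `select` handler and asks for a JSON response. The host, the core name `cartoons` and the field name `title` are hypothetical placeholders, not details of any project listed here; the sketch only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's /select handler that requests JSON output.

    base_url and core are assumptions supplied by the caller; they must
    match a Solr instance that actually exists.
    """
    params = {"q": query, "wt": "json", "rows": rows}
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


# Hypothetical local instance, core and field name, for illustration only.
url = build_solr_select_url("http://localhost:8983", "cartoons", "title:churchill")
print(url)
```

The resulting URL could be fetched with any HTTP client, which is what makes the service usable from most programming languages.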

Features: 

• May be tailored to many types of application with minimal programming knowledge
• Extensive plug-in support
• Full-text indexing and search

A&H use case 1 description: 
The “British Cartoon Archive Digitisation (BCAD)” project has used Solr to deliver the search results and metadata.
Creator: 
CNET Networks
Publisher: 
Apache Software Foundation
Software/programming languages used: 
Suite: 
Data structuring and enhancement: 
Alternate tool(s): 

Sphinx

Licence: 
lifecycleStage: 
Platform: 

tool: Lucene

Purpose: 

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially one that must run across platforms.
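Full-text search libraries of this kind are generally built around an inverted index, which maps each term to the documents that contain it. The Python sketch below illustrates that general idea only; it is not Lucene's API (Lucene is a Java library), and the sample captions are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Invented captions, for illustration only.
docs = {
    1: "Sledge party on the Antarctic ice",
    2: "Arctic expedition ship in pack ice",
    3: "Portrait of an expedition member",
}
index = build_index(docs)
print(sorted(search(index, "expedition ice")))
```

A production library adds relevance ranking, stemming, phrase queries and on-disk storage on top of this basic structure.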

Features: 

• Scalable, high-performance indexing
• Powerful, accurate and efficient search algorithms
• Cross-platform solution

A&H use case 1 description: 
The “Freeze Frame – Historic Polar Images 1845-1960” project has used Lucene for advanced search of photographs from both Arctic and Antarctic expeditions.
Creator: 
Doug Cutting
Publisher: 
Apache Software Foundation
Software/programming languages used: 
Suite: 
Data structuring and enhancement: 
Alternate tool(s): 

InQuira, Verity, dtSearch, ISYS

Licence: 
lifecycleStage: 
Platform: 

tool: Exceed

Purpose: 

Exceed is a PC X server that allows graphical user interface (GUI) interaction with networked computers. It supports data exchange among applications on different platforms, including UNIX, Linux, VMS, X Window-based systems and IBM mainframes.

Features: 

• Allows users to connect Microsoft Windows desktops to a wide variety of X Window-enabled servers and access X applications.

A&H use case 1 description: 
Exceed was used by the North Sea Palaeolandscapes project to link up different datasets that were available on different operating platforms.
Publisher: 
Open Text
Creator: 
Hummingbird Ltd.
Data publishing and dissemination: 
Specifications: 
Alternate tool(s): 

MicroXwin, X Window System

Strategy and project management: 
Software/programming languages used: 
Licence: 

tool: GeoParser

Purpose: 

GeoParser is a text analysis tool that may be used to identify, tag and (where appropriate) disambiguate references to geographic locations in a text resource. The tool uses Natural Language Processing to analyse the composition of a resource and identify words that match its geographic database. The approach is useful for processing ambiguous references, such as names shared by several locations (e.g. Belfast in Ireland, New Zealand and Canada), and for distinguishing place names from ordinary words (e.g. Reading in Berkshire and reading as an activity).
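The matching step can be pictured with a deliberately crude Python sketch. It is not the GeoParser's method, which relies on Natural Language Processing; it assumes a tiny hypothetical gazetteer and one simple heuristic: a capitalised word counts as a place name only when it is not the first word of a sentence. That heuristic is enough to separate "Reading" the town from "Reading" as a sentence-initial verb in the sample text, but it would also miss genuine place names that open a sentence.

```python
import re

# Hypothetical toy gazetteer: place name -> candidate locations.
GAZETTEER = {
    "Belfast": ["Ireland", "New Zealand", "Canada"],
    "Reading": ["Berkshire"],
}


def tag_places(text):
    """Return (name, candidates) pairs for gazetteer matches.

    Only capitalised words that are not sentence-initial are accepted,
    since sentence-initial capitals are ambiguous.
    """
    found = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[A-Za-z]+", sentence)
        for position, word in enumerate(words):
            if position > 0 and word in GAZETTEER:
                found.append((word, GAZETTEER[word]))
    return found


text = "Reading old minutes is slow. The member for Belfast spoke about Reading."
print(tag_places(text))
```

Names with several candidates, such as Belfast, would still need a separate disambiguation step to choose one location from the list.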

Features: 

• Analyses text stored in structured, semi-structured and unstructured formats
• Creates information that may be processed and used by a global gazetteer

A&H use case 1 description: 
The Embedding GeoCrossWalk project used the GeoParser to extract the place names found in the proceedings of the Stormont Assembly.
Publisher: 
University of Edinburgh School of Informatics - Language Technology Group
Creator: 
University of Edinburgh School of Informatics - Language Technology Group
Data capture: 
Communication and collaboration: 
Software/programming languages used: 
Strategy and project management: 
Alternate tool(s): 

Metacarta’s GeoTagger, Digital Reasoning’s GeoLocator, Lockheed Martin’s AeroText, and SRA’s NetOwl

Practice-led research: 
lifecycleStage: 

project: Embedding GeoCrossWalk

The Embedding GeoCrossWalk project sought to provide a deeper understanding of how references to place in structured texts can be researched and automatically extracted. The project's aims were threefold. Firstly, it sought to deploy the GeoParser tool, developed previously by the Language Technology Group of Edinburgh University's School of Informatics, to georeference the Stormont Papers using Natural Language Processing (NLP).

project: Montréal l'avenir du passé (MAP)

Montréal l'avenir du passé (MAP) was established in 2000 to create an historical GIS research infrastructure for 19th and 20th century Montréal. We have digitized six highly detailed historical maps representing all buildings in the city for 1825, 1846, 1880, 1912, 1949 and 2000. The first three and last have been geo-referenced and we have successfully "peopled" them by linking at the street-scape (1846) or lot level (1880 & 2000) census returns, tax records, city directories and a wide variety of non-routinely generated sources.

project: HESTIA

HESTIA provides a new approach towards conceptions of space in the ancient world, supported by a grant from the Arts and Humanities Research Council (AHRC). Combining a variety of different methods, it examines the ways in which space is represented in Herodotus' History, in terms of places mentioned and geographic features described.
