Archaeotools: Data mining, facetted classification and E-archaeology
Submitted by Ziqi Zhang on Mon 01/02/2010 - 14:39
This two-year project built upon previous ADS work (the Common Information Environment - Archaeobrowser project) to develop tools that use advanced data mining and knowledge capture technologies, allowing archaeologists to discover, share and analyse datasets and legacy publications that had hitherto been very difficult to integrate into digital frameworks. The project had three interrelated objectives, each represented by a distinct work package.
| Project start date: 2007-09 | Project end date: 2009-09 |
Subject domains:
Era(s):
Country/region(s):
| Methods used | Category |
|---|---|
| Accessibility analysis | Strategy and project management |
| Audio-visual interaction (synchronous) | Communication and collaboration |
| Collaborative publishing | Data publishing and dissemination |
| Content analysis | Data analysis |
| Data mining | Data analysis |
| Documentation | Strategy and project management |
| General project management | Strategy and project management |
| General website development | Data publishing and dissemination |
| Human factors analysis | Strategy and project management |
| Indexing | Data analysis |
| Interface design | Data publishing and dissemination |
| Iterative design | Strategy and project management |
| Resource sharing | Communication and collaboration |
| Risk management | Strategy and project management |
| Searching and querying | Data analysis |
| Security planning | Strategy and project management |
| Spatial data analysis | Data analysis |
| Statistical analysis | Data analysis |
| Text mining | Data analysis |
| Usability analysis | Strategy and project management |
| Use of existing digital data | Data capture |
Funding sources:
Arts and Humanities Research Council (AHRC), Engineering and Physical Sciences Research Council (EPSRC), Joint Information Systems Committee (JISC)
Content types created:
Software tools used:
- Aleph
- Java
- Java Server Faces
- Runes
- Solr
- T-rex
Source material used:
The project consists of three work packages, each dealing with a particular type of data.
Work Package 1 - The underlying dataset comprises over 1,000,000 records (held in an Oracle RDBMS) aggregated from the National Monuments Records of Scotland, Wales and England, Historic Environment Records from numerous local authorities, and the ADS's own archive holdings. The facets selected are the standard hierarchical 'What', 'Where' and 'When' facets, plus a 'Media' facet to allow the selection of particular subsets of resources. The facets are populated from existing thesauri (e.g. the Thesaurus of Monument Types) in XML format, extended and integrated to allow for geographical variation, such as terminological differences in monument and period types between Scotland and England. The Archaeotools project also integrates thesauri served in XML by Simple Knowledge Organization System (SKOS) based web services developed by the AHRC-funded Semantic Tools for Archaeology (STAR) project based at the University of Glamorgan.
Work Package 2 - deals primarily with unpublished archaeological reports (grey literature): approximately 1000 reports in total, ranging from 10 to several hundred pages each. These reports are published by a wide range of archaeological organisations. As an example, the OASIS project actively gathers digital versions of grey literature fieldwork reports and currently holds around 2300. This total grows by around 50-100 reports a month; all reports can be downloaded, free of charge, from the ADS.
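The hierarchical facet idea described for Work Package 1 can be sketched in a few lines. This is a minimal illustration only, not the project's implementation (which used Java and Solr): the thesaurus fragment `BROADER`, the sample `RECORDS`, and the helper names `ancestors`, `facet_counts` and `drill_down` are all assumptions made up for the example. It shows how tagging a record with a narrow term lets it be counted under every broader term in the same facet.

```python
from collections import Counter

# Hypothetical fragment of a 'What' thesaurus: narrower term -> broader term.
# The hierarchy is invented for illustration and is not taken from a real thesaurus file.
BROADER = {
    "ROUND BARROW": "BARROW",
    "LONG BARROW": "BARROW",
    "BARROW": "FUNERARY SITE",
    "CAIRN": "FUNERARY SITE",
}

# Hypothetical records, each tagged with terms per facet.
RECORDS = [
    {"id": 1, "what": ["ROUND BARROW"], "when": ["BRONZE AGE"]},
    {"id": 2, "what": ["LONG BARROW"], "when": ["NEOLITHIC"]},
    {"id": 3, "what": ["CAIRN"], "when": ["BRONZE AGE"]},
]


def ancestors(term):
    """Return the term followed by all of its broader terms."""
    path = [term]
    while path[-1] in BROADER:
        path.append(BROADER[path[-1]])
    return path


def facet_counts(records, facet):
    """Count records under each term of a facet, including broader terms."""
    counts = Counter()
    for rec in records:
        seen = set()
        for term in rec.get(facet, []):
            seen.update(ancestors(term))
        counts.update(seen)  # each record counts once per term
    return counts


def drill_down(records, facet, term):
    """Keep only records tagged with the term or one of its narrower terms."""
    return [
        rec for rec in records
        if any(term in ancestors(t) for t in rec.get(facet, []))
    ]


if __name__ == "__main__":
    print(facet_counts(RECORDS, "what").most_common())
    print([rec["id"] for rec in drill_down(RECORDS, "what", "BARROW")])
```

Selecting a term in one facet (for example `BARROW` under 'What') narrows the record set, and the counts for the remaining facets can then be recomputed over that subset, which is the usual browsing pattern in a faceted interface.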
Work Package 3 - The system is extended to capture metadata from legacy historical documents, using the PSAS (annual Proceedings of the Society of Antiquaries of Scotland, from 1851 to 1999) as an exemplar corpus and utilising the University of Edinburgh’s geoXwalk service to recast place names and locations extracted from text as national grid references (NGRs), allowing enhanced geospatial searching of the data.
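As a rough illustration of recasting extracted place names as NGRs, the sketch below looks names up in a toy gazetteer and formats the resulting coordinates as lettered grid references. It does not call or imitate the geoXwalk service; the `GAZETTEER` entries (names and coordinates) are invented, and `to_ngr` and `georeference` are hypothetical helper names. The letter-pair calculation follows the standard layout of 100 km squares on the Ordnance Survey National Grid.

```python
# Toy gazetteer: place name -> (easting, northing) in metres on the OS National Grid.
# Names and coordinates are invented for illustration only.
GAZETTEER = {
    "Exampleton Cairn": (325100, 673500),
    "Sample Broch": (412340, 856780),
}


def to_ngr(easting, northing, figures=6):
    """Format full National Grid coordinates as a lettered grid reference."""
    easting, northing = int(easting), int(northing)
    if not (0 <= easting < 700000 and 0 <= northing < 1300000):
        raise ValueError("coordinates fall outside the National Grid")
    # Indices of the 100 km square, counted from the grid's false origin.
    e100k, n100k = easting // 100000, northing // 100000
    l1 = (19 - n100k) - (19 - n100k) % 5 + (e100k + 10) // 5
    l2 = (19 - n100k) * 5 % 25 + e100k % 5
    # The grid lettering skips 'I', so shift indices past it.
    if l1 > 7:
        l1 += 1
    if l2 > 7:
        l2 += 1
    letters = chr(ord("A") + l1) + chr(ord("A") + l2)
    # Truncate the within-square offsets to the requested precision.
    half = figures // 2
    scale = 10 ** (5 - half)
    e = (easting % 100000) // scale
    n = (northing % 100000) // scale
    return f"{letters} {e:0{half}d} {n:0{half}d}"


def georeference(place_names):
    """Map each place name found in the gazetteer to its grid reference."""
    return {
        name: to_ngr(*GAZETTEER[name])
        for name in place_names
        if name in GAZETTEER
    }


if __name__ == "__main__":
    # Names as they might be extracted from a document; unknown names are skipped.
    print(georeference(["Exampleton Cairn", "Nowhere Hill"]))
    # {'Exampleton Cairn': 'NT 251 735'}
```

Once place names carry grid references, records can be filtered by grid square or bounding box, which is what makes geospatial searching of text-derived metadata possible.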
Digital resource created:
The ultimate goal of this project is to create a faceted search, browse and knowledge management system that lets archaeologists access, share and re-use archaeological data. The working system will be online by early 2010, and a demonstration system is available at http://archaeologydataservice.ac.uk/. Registration is required to access the demo.
Access to digital resource:
Open Access
Data Formats created:
Publications:
Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z. The Archaeotools project, faceted classification and natural language processing in an archaeological context. UK e-Science All Hands Meeting 2008. Philosophical Transactions of the Royal Society A, 2009, 367, 2507-2519. doi: 10.1098/rsta.2009.0038
S. Jeffrey, J. Richards, F. Ciravegna, S. Waller, S. Chapman, Ziqi Zhang. When ontology and reality collide: the Archaeotools project, facetted classification and natural language processing in an archaeological context. In 36th Annual Conference on Computer Applications and Quantitative Methods in Archaeology: On the Road to Reconstructing the Past (2008).
Z. Zhang and J. Iria. A Novel Approach to Automatic Gazetteer Generation using Wikipedia. In Proceedings of the ACL'09 Workshop on Collaboratively Constructed Semantic Resources, Singapore, August 2009.
Institutions affiliated with this project:
| UK HE institutions involved: |
|---|
| University of Sheffield |
| University of York |
Project staff and expertise:
| Principal staff member: | Prof. Julian Richards, Dr Stuart Jeffrey, Prof. Fabio Ciravegna, Stewart Waller, Ziqi Zhang, Sam Chapman, Tony Austin |
|---|---|
| Other staff: | |
| External expertise: | |
Tags: Accessibility analysis, archaeology, Collaborative publishing, Content analysis, Data mining, Documentation, Human factors analysis, Indexing, Interface design, Iterative design, Resource sharing, Risk management, Searching and querying, Security planning, Spatial, Spatial data analysis, Statistical analysis, text, text mining, Usability analysis
| Metadata on this arts-humanities.net record | |
|---|---|
| Author(s) of record | Ziqi Zhang |
| Title | Archaeotools: Data mining, facetted classification and E-archaeology |
| Record created | 2010-02-01 |
| Record updated | 2010-06-11 11:17 |
| URL of record | http://www.arts-humanities.net/node/3005 |
| Citation of record | Ziqi Zhang: Archaeotools: Data mining, facetted classification and E-archaeology. <http://www.arts-humanities.net/node/3005> created: 2010-02-01, last updated 2010-06-11 11:17 |