| Project start date: 2007-09 | Project end date: 2009-09 |
This two-year project built upon previous Archaeology Data Service (ADS) work (the Common Information Environment - Archaeobrowser project) to develop tools that use advanced data mining and knowledge capture technologies, allowing archaeologists to discover, share and analyse datasets and legacy publications that had hitherto been very difficult to integrate into digital frameworks. The project had three interrelated objectives, each represented by a distinct work package.
| Methods used | Category |
|---|---|
| Accessibility analysis | Strategy and project management |
| Audio-visual interaction (synchronous) | Communication and collaboration |
| Collaborative publishing | Data publishing and dissemination |
| Content analysis | Data analysis |
| Data mining | Data analysis |
| Documentation | Strategy and project management |
| General project management | Strategy and project management |
| General website development | Data publishing and dissemination |
| Human factors analysis | Strategy and project management |
| Indexing | Data analysis |
| Interface design | Data publishing and dissemination |
| Iterative design | Strategy and project management |
| Resource sharing | Communication and collaboration |
| Risk management | Strategy and project management |
| Searching and querying | Data analysis |
| Security planning | Strategy and project management |
| Spatial data analysis | Data analysis |
| Statistical analysis | Data analysis |
| Text mining | Data analysis |
| Usability analysis | Strategy and project management |
| Use of existing digital data | Data capture |
The project consists of three work packages, each dealing with a particular type of data.
Work Package 1 - The underlying dataset comprises over 1,000,000 records (held in an Oracle RDBMS) aggregated from the National Monuments Records of Scotland, Wales and England, as well as Historic Environment Records from numerous local authorities and the ADS’s own archive holdings. The facets selected will be the standard hierarchical ‘What’, ‘Where’, and ‘When’ facets, plus a ‘Media’ facet to allow the selection of particular subsets of resources. The facets are populated from existing thesauri (e.g. the Thesaurus of Monument Types) in XML format, and are extended and integrated to allow for geographical differences, such as terminological differences in monument and period types between Scotland and England. The Archaeotools project also integrates thesauri served in XML by Simple Knowledge Organisation System (SKOS) based web services developed by the AHRC-funded Semantic Technologies for Archaeological Resources (STAR) project, based at the University of Glamorgan.
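To illustrate how hierarchical facets narrow a result set, the following is a minimal Python sketch rather than the project's implementation. The tiny thesaurus, the sample records, the `Record` fields, and the function names `is_under` and `facet_search` are all assumptions made for illustration; the real system draws its terms from XML thesauri and an Oracle database.

```python
from dataclasses import dataclass

# Hypothetical miniature thesaurus: term -> broader (parent) term.
# Terms and groupings are invented for illustration only.
BROADER = {
    "BROCH": "DEFENCE",
    "HILLFORT": "DEFENCE",
    "DEFENCE": None,
    "IRON AGE": "PREHISTORIC",
    "BRONZE AGE": "PREHISTORIC",
    "PREHISTORIC": None,
}


@dataclass(frozen=True)
class Record:
    title: str
    what: str   # monument type
    where: str  # place
    when: str   # period
    media: str  # resource type


# Invented sample records.
RECORDS = [
    Record("Example broch", "BROCH", "SCOTLAND", "IRON AGE", "TEXT"),
    Record("Example hillfort", "HILLFORT", "ENGLAND", "IRON AGE", "IMAGE"),
    Record("Example abbey", "ABBEY", "ENGLAND", "MEDIEVAL", "TEXT"),
]


def is_under(term, ancestor):
    """Return True if term equals ancestor or sits below it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = BROADER.get(term)  # terms missing from the thesaurus have no parent
    return False


def facet_search(records, **selected):
    """Keep records that match every selected facet value (or a narrower term)."""
    return [
        r for r in records
        if all(is_under(getattr(r, facet), value) for facet, value in selected.items())
    ]
```

Selecting a broad term such as `what="DEFENCE"` matches records indexed under any narrower term, and adding further facets (for example `where="SCOTLAND"`) narrows the set again.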
Work Package 2 - This deals primarily with unpublished archaeological reports (grey literature): approximately 1000 reports in total, ranging from 10 to 500 pages each. These reports are published by a wide range of archaeological organisations. As an example, the OASIS project actively gathers digital versions of grey literature fieldwork reports and currently holds around 2300. This total grows by around 50-100 reports a month; all reports can be downloaded, free of charge, from the ADS.
Work Package 3 - The system is extended to capture metadata from legacy historical documents, using the PSAS (annual Proceedings of the Society of Antiquaries of Scotland, from 1851 to 1999) as an exemplar corpus and utilising the University of Edinburgh’s geoXwalk service to recast place names and locations extracted from text as national grid references (NGRs), allowing enhanced geospatial searching of the data.
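The step of recasting place names as NGRs can be sketched as follows. This is an illustrative sketch only and does not use or reproduce the geoXwalk service: the in-memory `GAZETTEER`, its approximate coordinates, and the function names `to_ngr` and `extract_ngrs` are assumptions. The letter-pair calculation follows the standard layout of 100 km squares on the British National Grid.

```python
import re

# Hypothetical gazetteer: place name -> (easting, northing) in whole metres on
# the British National Grid. Coordinates are approximate, for illustration only.
GAZETTEER = {
    "Edinburgh Castle": (325100, 673500),
    "Skara Brae": (323100, 1018700),
}


def to_ngr(easting, northing):
    """Format integer easting/northing as a six-figure national grid reference."""
    if not (0 <= easting < 700000 and 0 <= northing < 1300000):
        raise ValueError("coordinates fall outside the National Grid")
    e100k, n100k = easting // 100000, northing // 100000
    # Indices of the two grid letters (500 km square, then 100 km square).
    l1 = (19 - n100k) - (19 - n100k) % 5 + (e100k + 10) // 5
    l2 = (19 - n100k) * 5 % 25 + e100k % 5
    # The grid lettering skips the letter I.
    if l1 > 7:
        l1 += 1
    if l2 > 7:
        l2 += 1
    letters = chr(ord("A") + l1) + chr(ord("A") + l2)
    # Keep three digits (100 m resolution) within the 100 km square.
    e = (easting % 100000) // 100
    n = (northing % 100000) // 100
    return f"{letters} {e:03d} {n:03d}"


def extract_ngrs(text):
    """Find gazetteer place names in text and map each to its grid reference."""
    found = {}
    for name, (easting, northing) in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found[name] = to_ngr(easting, northing)
    return found
```

A production service would also need to disambiguate place names and handle historical spellings, which this sketch ignores.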
The ultimate goal of this project is to create a faceted search, browse and knowledge management system for archaeologists to access, share and re-use archaeological data. The working system will be online by early 2010, and a demonstration system is available at http://archaeologydataservice.ac.uk/. Registration is required to access the demo.
Generation of metadata (XML, RDBMS) from semi-structured and unstructured texts (such as HTML, PDF and Word documents).
Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z. The Archaeotools project, faceted classification and natural language processing in an archaeological context. UK e-Science All Hands Meeting 2008. Philosophical Transactions of the Royal Society A, 2009, 367, 2507-2519. doi: 10.1098/rsta.2009.0038
Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z. When ontology and reality collide: the Archaeotools project, facetted classification and natural language processing in an archaeological context. In 36th Annual Conference on Computer Applications and Quantitative Methods in Archaeology: On the Road to Reconstructing the Past (2008).
Zhang, Z. & Iria, J. A Novel Approach to Automatic Gazetteer Generation using Wikipedia. In Proceedings of the ACL'09 Workshop on Collaboratively Constructed Semantic Resources, Singapore, August 2009.
| UK HE institutions involved: |
|---|
| University of Sheffield |
| University of York |
| Principal staff member: | Prof. Julian Richards, Dr Stuart Jeffrey, Prof. Fabio Ciravegna, Stewart Waller, Ziqi Zhang, Sam Chapman, Tony Austin |
|---|---|
| Other staff: | |
| External expertise: |
| This project description was developed as part of the ICT Guides project. |
| Metadata on this arts-humanities.net record | |
|---|---|
| Author(s) of record | Ziqi Zhang |
| Title | Archaeotools: Data mining, facetted classification and E-archaeology |
| Record created | 2010-02-01 |
| Record updated | 2010-02-01 14:56 |
| URL of record | http://www.arts-humanities.net/node/3005 |
| Citation of record | Ziqi Zhang: Archaeotools: Data mining, facetted classification and E-archaeology. <http://www.arts-humanities.net/node/3005> created: 2010-02-01, last updated 2010-02-01 14:56 |