Ontologies

Ontologies and what they are not

If one had entered 'ontology' into Google in February 2006, the most relevant hit would have been Thomas Gruber's by now classic explanation of ontologies. His short answer is that an ontology is a specification of a conceptualisation (Gruber 1995). The second hit, right before the Wikipedia entry, was the Gene Ontology project, which provides a controlled vocabulary that lets the research community working on decoding genes exchange and transfer its knowledge. Thomas Gruber has not updated his web page at Stanford since the nineties. He has moved to private industry and now works with RealTravel.com, a portal for sharing knowledge about travelling. People looking for information about where to go, where to stay, or what to do on their travels can learn from the experiences of those who have been there. Users share their knowledge.

In the Humanities research community other definitions of ontology might be better known. Ontology is seen here as a discipline of philosophy that goes back to the 17th century and the German philosopher Wolff. Early definitions of ontology date back even further, to Aristotle. Philosophically, ontology is the science of what is, or of what a being qua being is. Philosophical ontology has received more recent attention in the work of the 20th century philosopher Martin Heidegger and his many successors in Germany and France.

Ontologies as used in computing and information sciences do not present a theory of being. For Thomas Gruber, a conceptualisation is an abstraction of its particular 'world', and an ontology is a specification of this representation. Such a conceptualisation does not depend on the vocabulary used or on its further context. It formally describes the view of the world that is shared in a domain, but does not fully determine it. In this sense the statement that grass is green is not existential beyond the domain of interest. An ontology stating that grass is green would have excluded purple grass from its conceptualisation of the world, but would not have sufficiently defined what grass really is. What modern ontologists in the computing industries really do is commit a domain to an ontology and publish this ontological commitment.

The general purpose of such applied ontologies is to provide a clearer view of the data by structuring it and creating semantic tags that define equivalent entries in different data stores. The same objects can have different names in two different databases. E-Science is the term used in the UK's programme, while in the US the term cyberinfrastructure is preferred. For computers to understand that roughly the same things are being discussed, an ontology showing the equivalence relation is useful. By committing to an ontology, one subscribes to a common language. No universal, complete theory is developed; instead, one commits to a contract to exchange data and knowledge in a machine-readable format.
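
The equivalence idea can be sketched in a few lines of Python. This is only an illustration, not a real ontology format: the mapping table, the shared concept label research-infrastructure and the function name same_concept are assumptions made for this example.

```python
# Hypothetical equivalence table: each local term points to a shared concept.
# The concept label "research-infrastructure" is invented for this sketch.
EQUIVALENT_TO = {
    "e-Science": "research-infrastructure",
    "cyberinfrastructure": "research-infrastructure",
}


def same_concept(term_a, term_b):
    """Return True when both terms are mapped to the same shared concept."""
    concept_a = EQUIVALENT_TO.get(term_a)
    concept_b = EQUIVALENT_TO.get(term_b)
    return concept_a is not None and concept_a == concept_b


if __name__ == "__main__":
    print(same_concept("e-Science", "cyberinfrastructure"))
```

Two data stores that agree on such a table can treat records tagged with either term as referring to roughly the same thing; terms missing from the table are simply not matched.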

Scope and size of ontologies

Nicola Guarino from the Laboratory of Applied Ontologies in Italy summarises the nature of ontologies by stating that '[...] in AI, ontology refers to an engineering artefact, constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary words' (Guarino 1998). Individuals are one such assumption: they describe basic objects. They can be grouped into classes of object types. Properties describe characteristics of objects and the relations between them. Ontologies are therefore publications of meanings for a domain in order to allow collaboration. The first step in ontology development is always to create a controlled vocabulary that defines the terms to be used by a community of users. These definitions must be unambiguous. In our example above, the ambiguity between cyberinfrastructure and e-Science would have to be resolved.

Ontologies are more common than is generally known. Glossaries, for example, are fairly simple ontologies, as they present little more than a definition of terms. Thesauri are more semantic and express relationships between synonymous or otherwise related terms. At the far end of the scale is a fully formalised ontology that defines all relevant entities of a domain and their relationships. A taxonomy is another example of an ontology. It organises controlled vocabulary terms into a hierarchy, e.g. the terms cyberinfrastructure and e-Science could both be part of the category science. Taxonomies, however, only have parent-child relationships and are therefore not useful for expressing more complex relations. Although it is also built on top of a controlled vocabulary, a formal ontology offers more, as it freely specifies the relations between the terms. These relations define how the terms can be put together.
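
The difference between a taxonomy and a formal ontology can be sketched as two small data structures. This is a simplified illustration under stated assumptions: the relation names is_a and roughly_equivalent_to and the helper relations_of are invented for this example and do not come from any standard.

```python
# Taxonomy: every term has exactly one parent, so only parent-child
# relations can be expressed.
TAXONOMY = {
    "e-Science": "science",
    "cyberinfrastructure": "science",
}

# Formal ontology: any named relation may link two terms.
# The relation names below are illustrative assumptions.
ONTOLOGY = {
    ("e-Science", "is_a", "science"),
    ("cyberinfrastructure", "is_a", "science"),
    ("e-Science", "roughly_equivalent_to", "cyberinfrastructure"),
}


def relations_of(term):
    """Return the sorted (relation, target) pairs that start at the term."""
    return sorted((rel, obj) for subj, rel, obj in ONTOLOGY if subj == term)


if __name__ == "__main__":
    print(TAXONOMY["e-Science"])
    print(relations_of("e-Science"))
```

The dictionary can only answer the question of what the parent of a term is, whereas the set of relations can also record that two sibling terms are roughly equivalent.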

As ontologies are always attempts to define basic terms and their relationships, they can easily reach a size that can no longer be administered. To counter this danger, two strategies define the scope and the size of an ontology. One huge top-down ontology capturing everything stands alongside the bottom-up idea of one ontology per domain. The advantage of a huge ontology is that inconsistencies can be avoided, but it can be difficult to maintain. One famous example of such a universal approach is Cycorp's Cyc project, which formalises common sense. In Cyc, over 1,000,000 hand-entered rules capture a large share of what is normally considered to be common human sense, like 'The earth orbits the sun'.

Several small ontologies do not need centralised control, but require effort to resolve inconsistencies between them. One approach to bringing smaller ontologies together is an upper or foundational ontology. A domain ontology emphasises a specific view of the world. An upper ontology defines common objects that can be used by several domain ontologies. These common terms are defined in a core glossary to develop a common understanding. Several projects attempt to describe such a common foundation ontology. Among the better known are Dublin Core and WordNet. The Dublin Core Metadata Initiative develops an interoperable metadata standard to describe a wide range of resources. WordNet is a semantic lexicon for several languages. Work on it began in 1985 under the direction of George A. Miller from Princeton University.

The Semantic Web, RDF and OWL

The best known application for ontologies is the Semantic Web (Berners-Lee, Hendler et al. 2001). According to Tim Berners-Lee, the Semantic Web is the existing web 'given a well defined meaning, enabling computers and people to work in cooperation.' (Berners-Lee, Hendler et al. 2001). The machine-understandable content of the Semantic Web is defined in ontologies. The current standard is called OWL (Web Ontology Language) and is based on the Resource Description Framework (RDF) language. Both OWL and RDF are XML-based W3C standards.

RDF provides data in a machine-readable format without the need for heavyweight database management systems. It is based on the idea that all you need to record information are three items in a triple: subject, predicate and object. RDF publishes information in such triples and so formalises the relation between the basic information items. The subject declares the who or what of a sentence, the predicate names a property of the subject, and the object gives the value of that property. A triple statement always has the form: subject predicate object.

E.g.: <http://www.ahessc.ac.uk/wiki/bin/view/Blog/WebHome> title <AHeSSC Blog>. This triple formalises the statement that the AHeSSC Blog homepage (subject) has the title (predicate) AHeSSC Blog (object).
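
The same triple can be held in a small data structure. The following is a minimal sketch in plain Python rather than in an RDF library; the names Triple, statement and describe are assumptions made for this example.

```python
from collections import namedtuple

# A triple is just three named slots: subject, predicate and object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

statement = Triple(
    subject="http://www.ahessc.ac.uk/wiki/bin/view/Blog/WebHome",
    predicate="title",
    object="AHeSSC Blog",
)


def describe(triple):
    """Render a triple as a readable sentence."""
    return f"{triple.subject} has {triple.predicate} {triple.object!r}"


if __name__ == "__main__":
    print(describe(statement))
```

A collection of such triples forms a graph in which subjects and objects are nodes and predicates are the labelled edges between them.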

In RDF this statement would look as follows, with dc denoting the use of the Dublin Core standard mentioned above:
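
A minimal sketch of such a statement in RDF/XML, generated here with Python's standard library, could look like this. The function name build_rdf is an assumption made for this example, and the output is one plausible serialisation rather than the only valid one.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the RDF syntax and of the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to write the conventional prefixes rdf: and dc:.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def build_rdf(subject_uri, title):
    """Serialise the triple (subject_uri, dc:title, title) as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": subject_uri},  # the subject of the triple
    )
    title_element = ET.SubElement(description, f"{{{DC_NS}}}title")  # predicate
    title_element.text = title  # object
    return ET.tostring(root, encoding="unicode")


rdf_xml = build_rdf(
    "http://www.ahessc.ac.uk/wiki/bin/view/Blog/WebHome", "AHeSSC Blog"
)

if __name__ == "__main__":
    print(rdf_xml)
```

The rdf:Description element carries the subject in its rdf:about attribute, the nested dc:title element is the predicate, and its text content is the object.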

Luckily, some very good ontology editors are available that are easy to use and create such syntactically complicated statements automatically. One of them, Protege, is discussed in more detail below.

RDF Schema defines the vocabulary for RDF. It is a declaration about which RDF elements are classes and which are properties.
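
A small class hierarchy of this kind can be sketched as a list of subclass statements. This is a simplified illustration in plain Python, not RDF Schema syntax: the class names Dog, Man, Mammal and Animal and the helper superclasses are assumptions made for this example, with rdfs:subClassOf used only as a label for the RDF Schema property of that name.

```python
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

# Each statement says that the first class is a subclass of the second.
SCHEMA = [
    ("Dog", RDFS_SUBCLASS_OF, "Mammal"),
    ("Man", RDFS_SUBCLASS_OF, "Mammal"),
    ("Mammal", RDFS_SUBCLASS_OF, "Animal"),
]


def superclasses(cls, schema=SCHEMA):
    """Return all direct and indirect superclasses of cls."""
    found = []
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in schema:
            if subj == current and pred == RDFS_SUBCLASS_OF and obj not in found:
                found.append(obj)
                frontier.append(obj)  # follow the chain upwards
    return found


if __name__ == "__main__":
    print(superclasses("Dog"))
```

Because the subclass relation is transitive, a program can infer that a dog is an animal even though no statement says so directly.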

This RDFS defines that the classes of dogs and men belong to the class of mammals, which is itself part of the class of animals. OWL is an extension of RDF Schema. OWL and RDF Schema are compatible, and it is difficult to say clearly where RDF Schema ends and OWL starts. Ontologies are used to integrate data, and OWL does exactly this for the web. It is a standard published by the W3C to define data for a specific knowledge domain. In OWL, the RDF Schema example above looks like this:
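
A hedged sketch of the dogs, men, mammals and animals hierarchy in OWL's RDF/XML form, generated with Python's standard library, is given here. The class identifiers and the function name build_owl are assumptions made for this example; only the three namespace URIs are taken from the W3C definitions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)

# Illustrative hierarchy: each class name maps to its parent (None = top).
HIERARCHY = {"Animal": None, "Mammal": "Animal", "Dog": "Mammal", "Man": "Mammal"}


def build_owl(hierarchy):
    """Declare each entry as an owl:Class with an optional rdfs:subClassOf."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for name, parent in hierarchy.items():
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}ID": name})
        if parent is not None:
            ET.SubElement(
                cls,
                f"{{{RDFS}}}subClassOf",
                {f"{{{RDF}}}resource": "#" + parent},
            )
    return ET.tostring(root, encoding="unicode")


owl_xml = build_owl(HIERARCHY)

if __name__ == "__main__":
    print(owl_xml)
```

The subclass statements are still expressed with rdfs:subClassOf, while the classes themselves are declared as owl:Class, which illustrates how closely the two vocabularies are interwoven.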

Compared to RDF Schema, the namespace owl points to the OWL definition of the W3C. A namespace is an XML mechanism to ensure that identifiers are unique and to resolve ambiguity between two identically named entities.

XML, RDF, RDFS, ontologies etc. work together to create a semantic web, where documents are not only displayed and linked, but meaning is also given to resources.

Figure 1: The layer cake

Figure 1 shows the famous layer cake used by Tim Berners-Lee to define the position of ontologies in Semantic Web technology. As can be seen, ontologies mediate between the logic of a domain, as expressed in rules and trust, and the resource description definitions. The two base layers are the RDF technologies, which assign and define properties of objects. The ontology layer goes further by formalising relationships between these properties. On top of the ontology layer, the logic and proof frameworks check the ontological commitments for consistency and possible inferences. Finally, the trust layer deals with assurance policies and security.

Ontology tools

Ontolingua is a set of tools and services that support potentially geographically distributed groups in reaching consensus on a shared ontology. These tools make use of the World Wide Web to enable wide access and give users the ability to collaboratively publish, browse, create, and edit ontologies stored on an ontology server. Users can assemble a new ontology from a library of modules.

Jena is a Java API for processing XML and RDF documents.

Protege is a Java-based ontology editor that allows users to create ontologies, which can afterwards be exported to an OWL format. As it is very widely used, a short description of Protege could be a useful entry point to the world of ontology modelling. Figure 2 shows Protege when it is first opened with its pizza OWL example loaded.

Figure 2: The Protege interface

Classes and subclasses can be defined by clicking on the OWLClasses tab and deriving them from the basic owl:Thing class. Properties and individuals can be created by using the Properties and Individuals menus. An interesting pizza type in OWL:

CIDOC CRM

The CIDOC Conceptual Reference Model (CRM) is an upper ontology to facilitate data integration between cultural heritage sources.

The CIDOC reference model aims to create one consistent global resource that includes information from archives and libraries. CIDOC CRM was developed over several years by an interdisciplinary working group of the International Committee for Documentation of the International Council of Museums (CIDOC/ICOM) under the scientific lead of ICS-FORTH (Institute of Computer Science (ICS) of the Foundation for Research and Technology Hellas (FORTH)). It has become an ISO standard.

Figure 3: CIDOC CRM (Doerr 2003)

The model consists of 80 classes and 130 relationships as a basis for the exchange of information. CIDOC CRM's main purpose is to integrate data, as Figure 3 shows. It works on top of the integrated factual knowledge from heterogeneous information sources.

CIDOC CRM has already been implemented in some research projects:

  • SCULPTEUR, which makes cultural heritage available online
  • SIMILE, an MIT-based project

Bibliography

  • Berners-Lee, T., J. Hendler, et al. (2001). "The Semantic Web." Scientific American: 28-37.
  • Doerr, M. (2003). "The CIDOC CRM - An Ontological Approach to Semantic Interoperability of Metadata." AI Magazine 4(1).
  • Gruber, T. R. (1995). "Towards principles for the design of ontologies used for knowledge sharing." International Journal of Human-Computer Studies 43(5/6): 907-928.
  • Guarino, N. (1998). Formal Ontologies in Information Systems. FOIS-98, Trento, Italy, 6-8 June 1998.

This briefing paper was written for AHeSSC, the Arts and Humanities e-Science Support Centre. It is published here with permission from AHeSSC.
