Metadata in Ontologies
This kblog is about metadata for an ontology and the statements in an ontology. Metadata are another example of a kind of statement in an ontology. We divide metadata statements into three kinds: editorial statements, explanatory statements and structural statements.
Robert Stevens and Alan Rector
BioHealth Informatics Group
School of Computer Science
University of Manchester
When we build an ontology we describe or model a field or domain of interest. An ontology can contain many types of statement. As well as saying things about that domain in the ontology, we also want to say things about the ontology itself and the entities within it (classes, individuals, properties and larger patterns). Collectively these are statements about statements, or meta-statements, and these meta-statements form an ontology’s metadata.
So, a class describes the instances in a domain: All instances of this class are also instances of this other class; all instances of this class hold a particular relationship with at least one instance of another specified class; and so on. We also have the individuals and properties that comprise the ontology or knowledgebase. Together these form the basic ontology that is the description of the domain or field of interest. In some cases, however, we wish to make statements about these statements, such as notions of “generalisation” and higher order knowledge about the classes, as well as documentation about the process of ontology building and so on. These latter kinds of statements about statements are the topic of this k-blog.
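To make the two kinds of domain statement above concrete, here is a minimal Python sketch that checks them against one small, fully enumerated toy interpretation. All class, property and individual names are made up for illustration, and this is not how an OWL reasoner works (OWL makes no closed-world assumption); it only shows what the two statements say about instances.

```python
# Toy interpretation: which individuals belong to which class (all names hypothetical).
instances = {
    "Electron": {"e1", "e2"},
    "Lepton": {"e1", "e2", "mu1"},
    "Charge": {"c1"},
}
# Toy relationship assertions: (subject, property, object).
relations = {("e1", "hasCharge", "c1"), ("e2", "hasCharge", "c1")}


def is_subclass_of(sub: str, sup: str) -> bool:
    """All instances of `sub` are also instances of `sup`."""
    return instances[sub] <= instances[sup]


def has_some(cls: str, prop: str, filler: str) -> bool:
    """Every instance of `cls` is related by `prop` to at least one instance of `filler`."""
    return all(
        any((x, prop, y) in relations for y in instances[filler])
        for x in instances[cls]
    )


print(is_subclass_of("Electron", "Lepton"))
print(has_some("Electron", "hasCharge", "Charge"))
```

Both functions talk only about instances in the domain. Nothing here can record who wrote the Electron class or why, which is the gap that metadata fills.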
We need to say things about the classes, properties and individuals (or the individual axioms) themselves: who authored them; when they were authored; statements about the class rather than the instances of the class; and so on. If the axioms are statements in the ontology, then these other statements are meta-statements or metadata for that ontology or its entities.
Metadata (also known as annotations) are knowledge about the knowledge artefact itself, as opposed to the domain knowledge it represents. If the class Electron was authored by “Robert Stevens” on a particular date, then this is an editorial statement about the class, not about the individuals of the class. Dublin Core provides a set of annotations for these kinds of metadata that are available within tools such as Protege, though they are not part of OWL itself.
These metadata used in ontologies are of three kinds:
- Editorial / provenance meta statements: Information about the process of acquiring the knowledge and its sources, e.g. the author, date of entry, revision history, authority, etc. The standard set of editorial / provenance knowledge is the Dublin Core subset used widely in the library community. Some editorial statements are transitory and aimed purely at the development process; others are intended to be permanent.
- Explanatory statements: Text definitions, guidelines, comments, and other explanatory material. These often include a natural language definition for the class, as well as comments, provenance and evidence for the conceptualisation of the class.
- Structural information about the artefact: Some information artefacts contain a meta-model or other information that describes their own structure. For instance, regular patterns of axioms may be used, and templates for these patterns can be part of the meta-model. It is a description of how the ontology is put together, rather than the model itself.
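The three kinds above can be sketched as a small data structure. This is only an illustration in Python: the names `MetadataKind`, `MetaStatement` and `by_kind`, the property names and the placeholder values are all our own assumptions, not part of OWL or any tool.

```python
from dataclasses import dataclass
from enum import Enum


class MetadataKind(Enum):
    EDITORIAL = "editorial / provenance"
    EXPLANATORY = "explanatory"
    STRUCTURAL = "structural"


@dataclass(frozen=True)
class MetaStatement:
    subject: str        # entity the statement is about, e.g. a class name
    prop: str           # hypothetical property name
    value: str          # placeholder value for illustration
    kind: MetadataKind  # which of the three kinds this statement is


statements = [
    MetaStatement("Electron", "author", "Robert Stevens", MetadataKind.EDITORIAL),
    MetaStatement("Electron", "definition", "A text definition would go here.",
                  MetadataKind.EXPLANATORY),
    MetaStatement("Electron", "followsPattern", "a named axiom template",
                  MetadataKind.STRUCTURAL),
]


def by_kind(items, kind):
    """Return the statements of one kind, preserving their order."""
    return [s for s in items if s.kind is kind]


for s in by_kind(statements, MetadataKind.EDITORIAL):
    print(s.subject, s.prop, s.value)
```

Tagging each statement with its kind makes it easy, for example, to strip transitory editorial notes before a release while keeping the explanatory material.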
It is important to remember these metadata when building an ontology. The Gene Ontology has a good metadata system covering the first two kinds in our list. All classes in GO, and in other OBO ontologies, must have a textual definition describing the criteria by which instances of that class can be recognised. In addition, authors, dates, evidence and so on are also recorded. All these metadata allow users to assess the ontology and to work out “why it is the way it is”. They are an important aspect of quality assurance and of provenance. In many cases, they are akin to software documentation.
In OWL we use various kinds of annotation properties, one of the basic components of OWL. There are built-in annotation properties that help at the editorial level: rdfs:label allows one to attach a human-readable name over and above the URI; rdfs:comment holds free-text comments; and so on. Authorship is typically recorded with an annotation property such as Dublin Core's creator, rather than with a built-in OWL property. Sets of annotations representing Dublin Core are automatically imported into ontologies developed in Protege. Tools such as OBO-Edit make adding such metadata a key part of the process of authoring an ontology. These metadata are key because they help users to interact with the ontology, to understand it and to be sure of its quality. Like all metadata, ontology metadata can be tedious to create, but creating it is an important part of the authoring process and will help to determine whether the ontology gets used.
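As an illustration of what such annotations look like when written down, the following Python sketch renders a label, a comment and a Dublin Core creator for a class as Turtle text. The `ex:` namespace, the helper names `turtle_literal` and `annotate`, and the comment text are assumptions made for this example; the rdfs and Dublin Core namespace IRIs are the standard ones. The helper only builds strings and does not validate the result.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/physics#",  # hypothetical ontology namespace
}


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def annotate(subject: str, annotations: list[tuple[str, str]]) -> str:
    """Render annotation assertions about one entity as a small Turtle document."""
    header = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    body = " ;\n    ".join(
        f"{prop} {turtle_literal(val)}" for prop, val in annotations
    )
    return "\n".join(header) + f"\n\n{subject} {body} .\n"


doc = annotate("ex:Electron", [
    ("rdfs:label", "electron"),
    ("rdfs:comment", "Comment text would go here."),
    ("dc:creator", "Robert Stevens"),
])
print(doc)
```

Note that all three statements have the class ex:Electron itself as their subject; none of them says anything about individual electrons.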
OWL has a mechanism called punning, where one can create both a class and an individual with the same URI. So, for instance, one can create a class called CarbonAtom and an individual called CarbonAtom. The class represents all the individual carbon atoms. The individual CarbonAtom can be thought of as representing the class itself. Thus one can say things about the class that one wouldn’t want to say about the individuals of the class. This can be a place to store all sorts of statements about the class, including the three kinds of metadata outlined here. In particular, it is a place where one can make higher order statements about the class; in this case that it was known from ancient times (well, not actually the atom, but the substance I suppose); its role in industry etc. Punning is a new feature of OWL (introduced in OWL 2) and doesn’t seem, as yet, to be widely used.
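The CarbonAtom example can be sketched in Python as a handful of axioms written in the style of OWL 2 functional syntax. The function `punned_axioms`, the `ex:` prefix, the property `ex:knownSince` and the individual `ex:atom42` are hypothetical names chosen for illustration, and the strings are not checked by any OWL tool.

```python
def punned_axioms(iri: str, class_level_facts: dict[str, str]) -> list[str]:
    """Declare `iri` as both a class and an individual, then attach facts to the individual."""
    axioms = [
        f"Declaration(Class({iri}))",
        f"Declaration(NamedIndividual({iri}))",
    ]
    for prop, value in class_level_facts.items():
        # The punned individual is the subject, so the fact is about the class itself.
        axioms.append(f'DataPropertyAssertion({prop} {iri} "{value}")')
    return axioms


axioms = punned_axioms("ex:CarbonAtom", {"ex:knownSince": "ancient times"})
# An ordinary member of the class: here the same IRI is used in the class position.
axioms.append("ClassAssertion(ex:CarbonAtom ex:atom42)")
print("\n".join(axioms))
```

The same name appears in two roles: as a class in the ClassAssertion, and as an individual in the DataPropertyAssertion. The statement about ancient times attaches to the latter, so it is not asserted of ex:atom42 or any other particular carbon atom.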