Posted on April 20, 2010 in Under Review

Separating Concepts from Labels

When creating ontologies, it is good practice to separate the concept from the label or term used to refer to that concept. Take the category or class to which the object that sits on top of your neck belongs. The words "Head", "Tête", "Kopf" and "Cabeza" are the terms used in English, French, German and Spanish, respectively, and all refer to the same category of objects. The category is the same, but the label is different. We can change the label or term without changing our notion of the category to which it refers. Managing synonymy and polysemy is a strong reason for separating the symbol for a category in the ontology from its label.

Both OWL and the OBO Format allow this separation quite easily. In OWL, the rdfs:label annotation (optionally with language tags) can carry names that do not necessarily correspond to the URI. I can have just a number for the URI and a "proper term" for the label. The OBO Format similarly allows such a distinction (the OBO guidelines insist on such a separation and that the ID is semantic-free). James J. Cimino, in his Desiderata for Controlled Medical Vocabularies, also espouses this separation [1].
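As a rough illustration of the idea, independent of any particular OWL or OBO toolkit, a concept can be modelled as an opaque identifier plus a set of language-tagged labels. The `Concept` class and the `EX:0000001` identifier below are invented for this sketch and do not come from a real ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A category identified by an opaque ID; labels are held separately."""
    concept_id: str                             # semantic-free identifier (hypothetical)
    labels: dict = field(default_factory=dict)  # language tag -> label

    def label(self, lang, fallback="en"):
        """Return the label for a language tag, else the fallback language's label."""
        if lang in self.labels:
            return self.labels[lang]
        return self.labels[fallback]


# One concept, four labels; the identifier says nothing about the meaning.
head = Concept(
    "EX:0000001",
    {"en": "Head", "fr": "Tête", "de": "Kopf", "es": "Cabeza"},
)

german = head.label("de")    # label chosen by language tag
italian = head.label("it")   # no Italian label stored, so the English one is used
```

Whichever label is displayed, anything that refers to the concept does so through `concept_id`.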

Sometimes terminologies or vocabularies are called ontologies. Rather, an ontology, via its labels or terms, can deliver a vocabulary, but the ontology itself is not a vocabulary. The difference is that the concept becomes the first-class citizen, not the words used to describe the concept.

Many ontologies formalise this distinction by using semantic-free identifiers for the concept. It is this identifier that is used, for example, as the means of annotation. The Gene Ontology has a set of rules for change, summarised as: a change to a definition changes the "nature" of the concept; it is now a different category and thus requires another identifier, because it has become a different entity. If the same identifier were kept, the meaning of existing annotations would change. This is why the GO has obsolete terms. If, however, just the term or label changes, then the underlying concept has not changed (the label changes from "head" to "Kopf"); the concept is still the same, and the annotation (of some data item in a database) means the same as it did before the label change. Thus annotations are made with the ID or URI, not the label. This is why OBO IDs should always be quoted in papers along with the meaningful label or term.
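The two kinds of change can be sketched in a few lines. This is a simplified reading of the rule as summarised above, not the Gene Ontology's actual tooling; the `EX:` identifiers, the `record-42` data item and both definitions are made up for illustration.

```python
# Hypothetical terms keyed by semantic-free ID.
terms = {
    "EX:0000001": {
        "label": "head",
        "definition": "Part of the body that sits on top of the neck.",
        "obsolete": False,
    },
}

# Annotations on data items store the ID, never the label.
annotations = {"record-42": "EX:0000001"}


def relabel(term_id, new_label):
    """A label change keeps the ID: the concept itself is unchanged."""
    terms[term_id]["label"] = new_label


def redefine(term_id, new_id, new_definition):
    """A definition change creates a new term and marks the old one obsolete."""
    old = terms[term_id]
    old["obsolete"] = True
    terms[new_id] = {
        "label": old["label"],
        "definition": new_definition,
        "obsolete": False,
    }
    return new_id


# Relabelling: the existing annotation still resolves, now to the new label.
relabel("EX:0000001", "Kopf")
label_now = terms[annotations["record-42"]]["label"]

# Redefining: a new ID is introduced; the old annotation keeps its old meaning.
new_id = redefine(
    "EX:0000001",
    "EX:0000002",
    "Anterior part of the body bearing the brain and sense organs.",
)
```

After the redefinition, `record-42` still points at `EX:0000001`, which is now flagged obsolete, so a curator can decide whether the annotation should move to the new term.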

Many ontologies also have rules or naming conventions for their terms or labels. This is simply a matter of consistency and explicitness in the labelling, such that the meaning is, as far as possible, apparent from the "presentation" of the concept. The assignment of the semantic-free identifier usually becomes part of the ontology authoring process; such IDs are usually digits, or combinations of letters and digits, that are automatically generated. Only new IDs are given (IDs are never issued twice), and all numbers have the same number of digits, padded on the left with zeros.
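A minimal sketch of such an identifier generator might look like the following. The `IdMinter` name, the `EX` prefix and the seven-digit width are assumptions chosen for the example; real ontology editors each have their own mechanism.

```python
class IdMinter:
    """Issues sequential, zero-padded, semantic-free IDs and never reuses one."""

    def __init__(self, prefix="EX", width=7, start=1):
        self.prefix = prefix   # hypothetical namespace prefix
        self.width = width     # every ID has the same number of digits
        self._next = start

    def mint(self):
        """Return the next unused ID, padded on the left with zeros."""
        if self._next >= 10 ** self.width:
            raise OverflowError("ID space exhausted for this width")
        new_id = f"{self.prefix}:{self._next:0{self.width}d}"
        # The counter only moves forward, so an ID is never issued twice,
        # even if the term it was given to is later made obsolete.
        self._next += 1
        return new_id


minter = IdMinter()
first = minter.mint()
second = minter.mint()
```

In practice the counter would need to be stored alongside the ontology so that numbering continues correctly between editing sessions.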

In summary, this separation of meaningless ID/URI and meaningful label is a "best practice" that aids both ontology and data management.

Authors

Robert Stevens
School of Computer Science,
The University of Manchester,
Oxford Road,
Manchester,
UK

Duncan Hull
EMBL Outstation – Hinxton,
European Bioinformatics Institute,
Wellcome Trust Genome Campus,
Hinxton,
Cambridge,
CB10 1SD,
UK

References

[1] Cimino JJ. Desiderata for controlled medical vocabularies in the Twenty-First Century. Methods Inf Med 1998;37(4-5):394-403.
