Separating Concepts from Labels

Robert Stevens — Tue, 20 Apr 2010 13:37:28 +0000

When creating ontologies it is good practice to separate the concept from the label or term used to refer to that concept. Take the category or class to which the object that sits on top of your neck belongs. The words "Head", "Tête", "Kopf" and "Cabeza" are the terms used in English, French, German and Spanish, and all refer to the same category of objects. The category is the same, but the label is different. We can change the label or term without changing our notion of the category to which it refers. Managing synonymy and polysemy is a strong reason for separating the symbol for a category in the ontology from its label.

Both OWL and the OBO Format allow this separation quite easily. In OWL, the RDFS label (rdfs:label, optionally with language tags) can supply names that do not necessarily correspond to the URI. I can have just a number for the URI and a "proper term" for the label. The OBO format similarly allows such a distinction (the OBO guidelines insist on such a separation and that the ID is semantic-free). James J. Cimino, in his Desiderata for Controlled Medical Vocabularies, also espouses this separation [1].
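As a rough illustration of this separation, the following Python sketch models a concept as an opaque identifier carrying a set of language-tagged labels, loosely mirroring what a label with language tags provides in OWL. The Concept class, the EX:0000001 identifier and the fallback-to-English behaviour are assumptions made for this example; they are not part of OWL or the OBO Format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Concept:
    """A category identified by an opaque ID; labels are attached to it."""
    concept_id: str  # semantic-free identifier (hypothetical value below)
    labels: Dict[str, str] = field(default_factory=dict)  # language tag -> label

    def label(self, lang: str, fallback: str = "en") -> str:
        """Return the label for a language, or the fallback language's label."""
        return self.labels.get(lang, self.labels[fallback])


head = Concept(
    "EX:0000001",
    {"en": "Head", "fr": "Tête", "de": "Kopf", "es": "Cabeza"},
)

# The identifier stays the same whichever label is displayed.
print(head.concept_id, head.label("de"))
print(head.concept_id, head.label("it"))  # no Italian label: falls back to English
```

Nothing about the identifier has to change when a label is added, removed or translated, which is the point of keeping the two apart.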

Sometimes terminologies or vocabularies are called ontologies. This is a mistake: an ontology, via its labels or terms, can deliver a vocabulary, but the ontology itself is not a vocabulary. The difference is that the concept becomes the first-class citizen, not the words used to describe the concept.

Many ontologies formalise this distinction by using semantic-free identifiers for the concept. It is this identifier that is used, for example, as the means of annotation. The Gene Ontology has a set of rules for change, summarised as follows: a change to a definition changes the "nature" of the concept; it is now a different category and thus requires another identifier, because it has become a different entity. If the same identifier were kept, the meaning of the annotation would change. This is why the GO has obsolete terms. If, however, just the term or label changes, then the underlying concept has not changed: the label may change from "head" to "kopf", but the concept is still the same and the annotation (of some data item in a database) means the same as it did before the label change. Thus annotations are made with the ID or URI, not the label. This is why OBO IDs should always be quoted in papers along with the meaningful label or term.
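The two kinds of change can be sketched in a few lines of Python. This is a simplified illustration of the rule as summarised here, not the Gene Ontology's actual tooling: the Term class, the relabel and redefine functions, the EX: identifiers, the definitions and the record-42 annotation are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Term:
    label: str
    definition: str
    obsolete: bool = False


def relabel(terms: Dict[str, Term], term_id: str, new_label: str) -> str:
    """Label change only: the concept is unchanged, so the ID is kept."""
    terms[term_id].label = new_label
    return term_id


def redefine(terms: Dict[str, Term], term_id: str, new_id: str,
             new_definition: str) -> str:
    """Definition change: a different category, so the old term is made
    obsolete and the new definition is stored under a new ID."""
    if new_id in terms:
        raise ValueError(f"{new_id} has already been issued")
    old = terms[term_id]
    old.obsolete = True
    terms[new_id] = Term(old.label, new_definition)
    return new_id


terms = {"EX:0000001": Term("head", "Part of the body on top of the neck.")}
annotations = {"record-42": "EX:0000001"}  # a data item annotated by ID

# Changing the label does not disturb the annotation.
relabel(terms, "EX:0000001", "Kopf")
print(terms[annotations["record-42"]].label)     # prints Kopf

# Changing the definition obsoletes the old ID and uses a new one.
redefine(terms, "EX:0000001", "EX:0000002",
         "Anterior part of the body bearing the main sense organs.")
print(terms[annotations["record-42"]].obsolete)  # prints True
```

Because the old term is kept and flagged rather than deleted, an existing annotation still resolves and a curator can see that it needs reviewing against the new definition.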

Many ontologies also have rules or naming conventions for their terms or labels. This is simply a matter of consistency and explicitness in the labelling, such that the meaning is, as much as is possible, apparent from the "presentation" of the concept. The assignment of the semantic-free identifier usually becomes part of the ontology authoring process: such IDs are usually digits, or combinations of letters and digits, that are automatically generated. Only new IDs are given (IDs are never issued twice), and all numbers have the same number of digits, padded on the left with zeros.
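A minimal sketch of such an identifier generator is given below, assuming a simple in-memory counter. The IdMinter class, the EX prefix and the seven-digit width are choices made for this example; a real authoring tool would also need to persist the counter so that IDs are not reissued between sessions.

```python
class IdMinter:
    """Issues fixed-width, zero-padded identifiers and never repeats one."""

    def __init__(self, prefix: str = "EX", width: int = 7) -> None:
        self.prefix = prefix
        self.width = width
        self._last = 0  # highest number issued so far

    def mint(self) -> str:
        """Return the next unused identifier."""
        number = self._last + 1
        if len(str(number)) > self.width:
            raise OverflowError("no identifiers left at this width")
        self._last = number
        # Pad on the left with zeros so every ID has the same number of digits.
        return f"{self.prefix}:{number:0{self.width}d}"


minter = IdMinter()
first = minter.mint()
second = minter.mint()
print(first, second)
```

The counter only ever moves forward, so an identifier belonging to an obsolete term is never handed out again.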

In summary, this separation of meaningless ID or URI from meaningful label is a "best practice" that aids both ontology and data management.

Authors

Robert Stevens
School of Computer Science,
The University of Manchester,
Oxford Road,
Manchester,
UK

Duncan Hull
EMBL Outstation – Hinxton,
European Bioinformatics Institute,
Wellcome Trust Genome Campus,
Hinxton,
Cambridge,
CB10 1SD,
UK

References

[1] Cimino JJ. Desiderata for controlled medical vocabularies in the Twenty-First Century. Methods Inf Med 1998;37(4-5):394-403.

Review of What is an ontology?

David Osumi-Sutherland — Wed, 24 Feb 2010 10:16:55 +0000

This is a review of What is an ontology? by Robert Stevens, Alan Rector and Duncan Hull

This article could be split quite neatly in two articles. One is an excellent article that begins about a third of the way through the full piece. It covers the technical aspects of ontology building: subsumption hierarchies; necessary vs necessary and sufficient conditions for class membership; disjointness; relations; upper ontologies and their usefulness in restricting the choice of appropriate relations. It draws heavily on upper ontologies developed by philosophers (at least some of them realists) and shows why they are useful. It concludes with a clear and strong case for why good ontologies are needed in the biosciences. I have no argument with this article.

The other is, to me at least, a rather confusing attempt to argue that ontologies consist of concepts, as opposed to statements about reality. I find these arguments difficult to square with some of the statements made in the rest of the article. However, I’m also not convinced that there is much difference between the authors’ position and a realist stance. Their argument hinges on the subtle issue of the reality of classes, and they don’t make other arguments commonly made against a realist stance – for example, the ‘argument from intellectual modesty’ (Smith et al., 2006), or the belief that ontology terms should simply follow the use of terms in language. In fact, they clearly argue for ontology as a means to overcome the latter:

“Ontology should be distinguished from thesauri…”

“Human beings can give multiple labels to … categories. This habit of giving multiple labels to the same category and the same label to different categories (polysemy) leads to grave problems…”

Their argument begins with what strikes me as a cheap rhetorical trick designed to close down debate:

“The definition here will not suit a lot of people and upset many (especially use of the word “concept”); We make no apology for this situation, only noting that the argument can take up resources better used in helping biologists describe and use their data more effectively.”

If the authors think this discussion is a waste of resources, then why bother spending a few paragraphs making their case? I suspect that they do actually care about the argument because they worry about the implications of taking a realist stance. If so, it would have been interesting to hear some of those concerns (on the realist status of maths for example) made more explicit.

There is also a notable lack of reference to any sources of opposing argument. For those who wish to pursue this argument further, waste of time though it might be, some references to counter arguments would be good. Either of these references (or both) would do nicely:

Smith, et al., 2006. Towards a reference terminology for ontology research and development in the biomedical domain. Proceedings of KR-MED 2006

Smith, 2004. Beyond Concepts: Ontology as Reality Representation. Proceedings of FOIS 2004

Of the arguments against a realist stance, the weakest uses a straw man:

“… with a computer science ontology … there is less concern with a true account of reality as it is information that is being processed, not reality.”

Who could argue? Surely the question is whether the information being processed is making assertions about reality or not? The authors’ case would be stronger if this line were deleted.

The heart of their argument is stated here:

“As human beings, we put these objects into categories or classes. These categories are a description of that which is described in a body of data. The categories themselves are a human conception. We live in a world of objects, but the categories into which humans put them are merely a way of describing the world; they do not themselves exist.”

A perhaps pedantic point: are classes “described in a body of data”? I would have thought it more likely they are assertions about reality that are a reasonable scientific interpretation of a body of data. This confusion of data and its interpretation as assertions occurs consistently throughout the article.

More importantly, what might it mean to state that a class is real? Even the authors seem to agree that there is regularity in the universe, whether we observe it or not. For example, later in the article, they state that:

“Each instance of a ‘Helium’ object was not discovered in 1903; most helium atoms existed prior to that date, but humans discovered and labelled that category at that date.”

In 1903, humans discovered something that already existed: a class of atoms that share specific properties. Surely this means that a definition of the class ‘helium atom’ is making assertions about reality. Is this not different from some arbitrary class defined as including say: all helium atoms, horses, unicorns and two bedroom flats in North London?

This is not to say that there is only one true way to categorise any one object, or that there is a clean dividing line between classes we might be happy to define as Universals (Smith ), such as Helium atoms, and more contingent classes.

In the abstract, the authors argue that the debate over a realist vs a conceptual stance is a distraction that “… can take up resources better used in helping biologists describe and use their data more effectively.” Why might a realist stance be useful in helping scientists?

I believe that a realist stance is useful for ontologies made up of scientific assertions (for example, about chemistry, anatomy, or physiology), because it gives us a way to judge the quality of an ontology. If the ontology makes assertions that run counter to what we have good reason to believe is true, then it is misleading as a knowledge-base about science and its use in inference and grouping of annotations will produce results that we have good reason to believe are incorrect. Surely such an ontology would be bad – even when judged purely in practical terms.

Having said all this, I’m happy for this article to pass review for the Ontogenesis Knowledge Blog as long as the authors add references to opposing arguments. The authors may wish to consider taking into account my points with regard to the abstract and the apparent use of a straw-man argument. The article already provides an excellent introduction to the basic technical aspects of ontology building. With the addition of references to opposing arguments, the article and this review should provide a good starting point for those interested in exploring the realism vs conceptual(ism?) debate further.

Minor corrections:

foundary -> foundry

polysemy – missing initial bracket

License

This paper is an open access work distributed under the terms of the Creative Commons Attribution License 3.0, which permits unrestricted use, distribution, and reproduction in any medium, provided that the original author and source are attributed.
