Why use an ontology?
The Ontogenesis KBlog is all about using ontologies within biology, medicine and health, from “what is an ontology?” to the finer detail of various representation problems and application use. However, we should also address the basic issue of “why use an ontology?” as the means of addressing the representational needs of bio-, medical and health data. In this KBlog I present a simple motivation and the requirements that have led us to use ontologies, and then enter into a short discussion of whether ontology is the one thing that can answer these requirements.
Bio-health Informatics Group
School of Computer Science
University of Manchester
There was discussion in the Ontology-summit and at semanticweb.com about elevator pitches for why to use ontologies; this KBlog is my version of that, though it ends up being a bit more than an elevator pitch. The “pitch” at semanticweb.com falls into the trap of saying that “[an ontology] helps the computer understand things like a human”; I think this is too fanciful and we can be much more prosaic about the needs for using ontologies. Over the years I’ve been refining my “pitch” for why some mechanism for how we talk about things is needed and then why ontology is an answer to this need.
An elevator pitch is supposed to grab the attention and get the point across succinctly: “You need to know what you’re talking about!” – is my pitch for ontologies. Such pitches are supposed to last between 30 seconds and 3 minutes (a tall building one supposes), so my short version above is really quite short, but I can expand a little.
So, here goes:
We have a data item:
27
Even if we assume base ten, this number is fairly useless; we (either human or computer) have no clue what this number is about.
If we add more information we obviously know more:
27 mm
given that we know that this is an SI unit of length (which we may want extra information to actually know – or our computer will need it), we can know that 27 mm is a measurement of length; we don’t know, however, of what entity it is a measurement.
So, this is the obvious next step:
Tail of 27 millimetres
and so on; we now know we have a tail of 27 mm, but a tail of what? So we add “mouse” into the mix and say that the “tail” is part of that “mouse”. Obviously, we can carry on for a time, looking at mouse strain, growing conditions, etc., but I’m sure you get the point. We have to describe our entities, and agree on the descriptions of the entities, for us (and our computers) to make use of our expensive data.
We now know enough about this number 27 to make some use of it. The need is, therefore, to create such descriptions of the entities about which we have data and the relationships between those entities. Those descriptions need to be such that the entities they describe can be recognised as belonging to those categories, and those entities and their descriptions need to be held in common by those using the data. All through this pitch, we’ve implicitly been talking about understanding by humans; we need the computer to be able to manipulate and reason with these symbols too (though I shy away from the computer “understanding”).
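The progression from a bare 27 to the tail of a mouse can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names Entity, Measurement and describe are hypothetical, and a real ontology would carry definitions, not just the labels used here.

```python
from dataclasses import dataclass
from typing import Optional

# All class, field and function names are hypothetical, for illustration only.


@dataclass(frozen=True)
class Entity:
    """A described thing (e.g. a tail), optionally part of another entity."""
    category: str
    part_of: Optional["Entity"] = None


@dataclass(frozen=True)
class Measurement:
    """A number plus the descriptions that make it usable."""
    value: float
    unit: Optional[str] = None       # e.g. "millimetre"
    quality: Optional[str] = None    # e.g. "length"
    bearer: Optional[Entity] = None  # the entity that was measured


def describe(m: Measurement) -> str:
    """Say as much about a measurement as its description allows."""
    if m.unit is None:
        return f"{m.value:g} (of what? unknown)"
    text = f"{m.value:g} {m.unit}"
    if m.quality:
        text += f" ({m.quality})"
    entity = m.bearer
    if entity is None:
        return text + " of an unknown entity"
    text += f" of a {entity.category}"
    # Follow the part-of relationship up to the whole organism.
    while entity.part_of is not None:
        entity = entity.part_of
        text += f" that is part of a {entity.category}"
    return text


mouse = Entity("mouse")
tail = Entity("tail", part_of=mouse)

print(describe(Measurement(27)))
print(describe(Measurement(27, "millimetre", "length")))
print(describe(Measurement(27, "millimetre", "length", tail)))
```

Each extra field lets both the human reader and the program say more about the same number; nothing here is “understood” by the computer, it simply has more symbols to manipulate.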
That’s basically it; we need to know what it is we’re talking about and defining entities in a field of interest is what an ontology does. That’s where the elevator pitch ends. All the rest is talking about whether an ontology does the job as specified.
Of course, scientists usually record such metadata (and more) about their data. The trouble comes with interpreting and sharing that metadata. Most explanations for the need for ontology lead with the notions of heterogeneity and polysemy that are rife in most areas of human endeavour, but this is really just another layer of need. It’s still really the need to know what it is we’re talking about – different words for the same thing and the same words for different things just muddy the water more. The base problem is recognising to which categories the entities about which we have data belong.
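To make the “different words for the same thing and the same words for different things” problem concrete, here is a small Python sketch. The identifiers (EX:0001, EX:0002) and the label table are made up for illustration and do not come from any real ontology; the point is only that labels alone cannot tell us which category is meant, whereas shared identifiers can.

```python
from typing import Dict, Set

# Hypothetical label table: each label maps to the categories it may denote.
# EX:0001 stands for "tail of an animal", EX:0002 for "tail of a distribution".
LABELS: Dict[str, Set[str]] = {
    "tail": {"EX:0001", "EX:0002"},     # same word, different things
    "cauda": {"EX:0001"},               # different word, same thing
    "distribution tail": {"EX:0002"},
}


def candidates(label: str) -> Set[str]:
    """Return every category a label might refer to."""
    return LABELS.get(label.lower(), set())


def is_ambiguous(label: str) -> bool:
    """A label is ambiguous if it may denote more than one category."""
    return len(candidates(label)) > 1


def may_denote_same(a: str, b: str) -> bool:
    """Two labels may denote the same thing if they share a category."""
    return bool(candidates(a) & candidates(b))


print(is_ambiguous("tail"))
print(may_denote_same("cauda", "tail"))
print(may_denote_same("cauda", "distribution tail"))
```

The lookup only works because the identifiers are agreed upon and held in common; the labels themselves remain as muddy as before.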
So, we need to know what things are and to agree on what those things are; does this mean we have to use an ontology? Ontologies match the requirements set forth above. Ontologies are appealing, if only because the discipline has its origins in the description of that which “exists” and that is, after all, what has been laid out in the pitch above. A key point is the “definition” of the entities – that by which we recognise objects of a particular class of entities. This has obvious computational benefits, but is really important so that we move away from relying on the labels of things to “know what they are”; that reliance has got us into the very mess from which we need to extricate ourselves. Closely linked to this is the sharing of the definitions and the understanding they embody. Only by the widest possible number of people adopting the same descriptions of entities do we actually achieve our goal of describing data in such a way that it can be used and re-used. This is not the place to debate what representational form is needed for an ontology, but howsoever it is represented, shared interpretation (understanding) must be enabled.
Elsewhere in this KBlog we have started to describe the types of statement in an ontology (http://ontogenesis.knowledgeblog.org/1074); from this we can say that “ontologies are not the only fruit”. That is, to capture sufficient knowledge about a domain for computational and information sharing needs, we probably need more than just ontologies. An ontology, by describing the entities in a field of interest, provides the essential framework for hanging rules, probabilistic information, “contingent knowledge”, etc.
We need to know about what it is that we’re providing data. We need to know, amongst other things, about the entities to which data apply (and the relationships between those entities), the investigations that produced those data, the provenance of the information and the information itself. Ontologies are about describing definitional information about entities in a field of interest and thus are an option for the means by which we provide information about what our data are about. To reiterate the short elevator pitch, “we need to know what we’re talking about” and that’s why we use ontologies.