Posted on April 5, 2013 by Robert Stevens in Under Review
Post-coordination: Making things up as you go along
Summary
Pre- and post-coordination are notions that came out of SNOMED, where they concern the vast number of terms needed for coding, in SNOMED’s case, medical records: can one enumerate all needed terms and put them in the right place in a structured vocabulary before use, or does one provide a mechanism for creating terms as and when they are needed and for putting them in the right place in the vocabulary’s structure? A pre-coordinated ontology has all the terms and relationships between them needed by an application; it is static; ‘what you see is what you get’. A post-coordinated ontology has the building blocks for the terms needed in an application, such that they can be built as required and their relationships determined as required; it is dynamic; the ontology is much more than ‘what you see’, since you can compose new expressions from the given building blocks. An ontology built using the Web Ontology Language (OWL) can take a compositional approach; it is built, and can be used, by composing classes and properties to make other classes. For example, an expression for large hydrophilic amino acid could be composed from the classes Hydrophilic amino acid and Large amino acid; in turn, these can themselves be composed of Amino acid and various qualities of amino acids. In this approach a reasoner can be used to determine the relationship between expressions built on the fly and the classes from the base ontology, i.e., expressions can be classified and placed at the correct location in the class hierarchy. This KBlog describes post-coordination, distinguishes it from pre-coordination, and discusses when post-coordination can be used, either at build time or at delivery time within an application.
Authors
Robert Stevens and Uli Sattler
Bio-health Informatics and Information Management Groups
School of Computer Science
University of Manchester
Oxford Road
Manchester
United Kingdom
M13 9PL
robert.stevens@Manchester.ac.uk
and Ulrike.Sattler@Manchester.ac.uk
Introduction
The notion of pre- and post-coordination (10.1016/j.jbi.2011.10.002) is used in terminologies such as SNOMED CT (http://en.wikipedia.org/wiki/SNOMED_CT#Pre-and_postcoordination) to manage the vast number of terms required for coding medical records without having to enumerate them all. Though arising from SNOMED, the idea of pre- and post-coordination is widely applicable wherever ontologies are used to describe things and it is likely that not all the desired terms can be made before use. To illustrate, assume we have a (possibly very long) list of expressions e1, e2, e3, … that we want to use in a given application, e.g., to label documents. Large hydrophilic amino acid is an example of such an expression. Also, assume that we have an ontology, vocabulary, or similar, called O, for these terms. Now pre-coordination and post-coordination relate to the following questions:
- Does O contain a term for each of the expressions ej I want to use?
- Or can I build a legal expression, using building blocks from O, for each of the expressions ej I want to use?
- Does O capture all the relevant relations between the expressions ej and ek; e.g., that ej is a specialisation of ek?
If we answer the first and third question with yes, then we can say that O is pre-coordinated. If we answer the second and third question with yes, then O can be post-coordinated, and the degree to which it is depends on the number of expressions ej for which O does not contain a single term, but requires the construction of a suitable expression. Finally, O can be both pre-coordinated and post-coordinated: e.g., O may have a term for large hydrophilic amino acid, but may also be able to handle the expression Hydrophilic amino acid and Large amino acid. This can be illustrated with some examples using the Amino Acids Ontology. First of all, we can simply name a class of amino acid called Lysine:
Class: Lysine SubClassOf: AminoAcid
Lysine is a named class, and we have already stated how it relates to AminoAcid: it is a specialisation of it. We can further coordinate Lysine with other classes in the ontology to describe Lysine in terms of those classes, and we can do so for all 20 amino acids; descriptions of each of these can be composed from charge (positive, neutral or negative), polarity (polar or non-polar), size (tiny, small, medium or large) and hydrophobicity (hydrophilic or hydrophobic); here is this description for Lysine:
Class: Lysine
    SubClassOf:
        AminoAcid,
        hasHydrophobicity some Hydrophilic,
        hasSideChainStructure some Aliphatic,
        hasCharge some Positive,
        hasSize some Large,
        hasPolarity some Polar
We can also introduce terms for other classes of amino acids, e.g.:
Class: 'Positive amino acid'
    EquivalentTo: AminoAcid and hasCharge some Positive

Class: 'Hydrophilic amino acid'
    EquivalentTo: AminoAcid and hasHydrophobicity some Hydrophilic
Using the resulting ontology, I have two choices:
- I can use a reasoner to determine the (so far implicit) relationships between the terms introduced in it; e.g., it will determine that Lysine is a specialisation of Positive amino acid. And I can then choose to add these relationships explicitly to my ontology, which would make it more pre-coordinated, but also possibly harder to maintain: if I find an error, say, in the definition of Hydrophilic amino acid, I will have to fix this error as well as the inferences I have drawn and materialised from this error. Alternatively, I can leave these relationships implicit, which will make fixing errors easier, but will require a reasoner to determine these relationships.
- I can restrict myself to using terms specified in the ontology, i.e., named classes such as Lysine or Positive amino acid, or I can build expressions from these, e.g., Positive amino acid and Large amino acid or Positive amino acid and hasHydrophobicity some Hydrophilic. In the latter case, we can say that I use the ontology in a post-coordinated way, and this would of course require the use of a reasoner to determine the relationship between the expressions used and the classes defined in my ontology.
In this sense, we should rather speak about using an ontology in a pre-/post-coordinated way, and note that using an ontology in a post-coordinated way requires a reasoner (or similar tool) to determine the relation between the freshly made up expressions and those specified in the ontology. Similarly, we can say that an annotation tool supports post-coordination if it allows annotations in the form of expressions, and is able to determine the relationships between these expressions.
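As an illustration, using only the classes and properties already introduced above, a post-coordinated expression for a large, positively charged amino acid could be written as:

AminoAcid
    and hasCharge some Positive
    and hasSize some Large

No named class for this combination exists in the ontology, yet, given the axioms above, a reasoner will place this expression below 'Positive amino acid' and above Lysine in the inferred class hierarchy.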
Why is it so groovy?
If we know that we are going to use an ontology in a post-coordinated way, then we know that we don’t have to introduce terms/named classes for each expression that we ever want to use – we can make them up given the base vocabulary from the ontology. As a consequence, we can build our ontology
- with fewer class names: we may choose to define a class name for Positive amino acid because it’s a commonly used term, but we may also choose not to give a name to Large Positive amino acid, Large Positive Hydrophilic amino acid, …
- with a clear structure: its dimensions reflect the application area’s dimensions and can be used to compose relevant terms
- without a combinatorial explosion of terms introduced: consider, e.g., an ontology of diseases with dimensions of location (in some body part), cause (accident, infection, genetic, …), status (chronic, acute, …), etc., and imagine we had to introduce a term for each possible combination. In contrast, using an ontology in a post-coordinated way, we can introduce some prominent names, e.g., congenital heart disease, but leave others to post-coordination, e.g., fracture to the fibula caused by an accident involving a bicycle (sketched as an expression after this list).
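To make that last example concrete, such a fracture might be written as the following post-coordinated expression; the class and property names used here (Fracture, Fibula, Accident, Bicycle, hasLocation, hasCause, involves) are purely illustrative and are not taken from any particular disease ontology:

Fracture
    and hasLocation some Fibula
    and hasCause some (Accident and involves some Bicycle)

Nobody has to pre-coordinate a named class for this particular combination; given suitable axioms for the building blocks, a reasoner places the expression in the hierarchy on demand.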
So, in a nutshell, one gets classes on demand, together with their relationships. The ontology provides the building blocks, and a class description is composed when it’s needed. Of course, one can judge when it is worth naming a class and putting it in the ontology – when it’s frequently used, or used as a part of another expression, and so on. By not putting all possible classes into an ontology, one saves space, reduces clutter, increases comprehensibility, and so on.
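If a composed expression does turn out to be frequently used, it can be promoted to a named, defined class in the usual way; for instance, a hypothetical addition to the Amino Acids Ontology above might be:

Class: 'Large positive amino acid'
    EquivalentTo: AminoAcid and hasCharge some Positive and hasSize some Large

A reasoner will then report any equivalent post-coordinated expression as the same class as this named one.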
Why is it hard?
Of course, it can’t all be positive: If an ontology is used in a post-coordinated way,
- we have to make more decisions: which classes do we name (or for which expressions do we introduce terms)?
- we need to use a reasoner to determine the relationship between two class expressions (or a class expression and class names). This can be a bit tricky to set up (though the OWL API should help) and may cause worries regarding performance (but tremendous progress has recently been made w.r.t. reasoner performance).
- we may want a single, unique ID for each term: e.g., if I have defined a class Positive amino acid in my ontology, and then use this ontology in a post-coordinated way, I can of course use AminoAcid and hasCharge some Positive in my annotation. The reasoner will determine that they are equivalent – but the annotations look different, so I may have to be more careful about dealing with these annotations (illustrated after this list).
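To illustrate the last point with the classes defined earlier: the following annotations are three different strings, yet a reasoner will report the corresponding class expressions as equivalent:

'Positive amino acid'
AminoAcid and hasCharge some Positive
hasCharge some Positive and AminoAcid

Tooling that compares annotations as strings, or that expects exactly one identifier per meaning, therefore needs to normalise post-coordinated expressions, for example by asking the reasoner for an equivalent named class where one exists.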
That’s all we can think of.
The last word
The act of composing one class with others and then linking it to other classes is coordination. This can be done exclusively at the time of building the ontology, which we can then use in a pre-coordinated way. Alternatively, an OWL ontology can be used in some software, e.g., to deal with document annotations, together with an automated reasoner. Classes can then be composed, or coordinated, on the fly, with the reasoner placing the newly minted class in the appropriate place in the ontology’s hierarchy: this is post-coordination.
Chris Mungall
April 25, 2013 @ 1:53 pm
Nice article, I wish this had been written a few years ago, it would have saved me a lot of time to have been able to refer to it!
A few additional comments:
Post-coordination can be hard without the appropriate tooling. There is a spectrum of possibilities – at one extreme, users could use Protege to write arbitrarily complex nested class expressions. On the other extreme, domain-specific highly constrained template-based forms might be used.
The fewer constraints on the expressions, the more possibility there is for inter-annotator inconsistency. Sometimes a reasoner can be used to determine equivalence, but sometimes the underlying models will be different.
For example, annotator 1 may say
(part_of some lung) and (part_of some epithelium)
Whereas annotator 2 may say
(part_of some (epithelium and (part_of some lung)))
There may be models where these are not equivalent, but training in OWL is required to understand why this is the case.
I have also seen cases where people just write garbage class expressions, misplace parentheses, use inappropriate relations — anything that can go wrong will go wrong. Some of this can be caught with reasoning, but not all. With pre-coordination you have more of a bottleneck, but there are advantages to having trained ontologists check the class expressions correspond to what is intended.
Exchange of data involving post-coordination between two systems requires that both systems speak some subset of OWL, or that some mapping is defined in advance between an exchange format and the form of class expressions used. From a CS perspective this is not particularly hard; from a practical point of view it is. Storing arbitrary class expressions in a queryable way in a relational database system is a pain.
There is also the issue of downstream tooling. Tool developers need training in OWL if they are to make effective use of data with post-coordination, whether it is to analyze the data or build a web interface for users to browse the data. Alternatively, any post-coordinated dataset can be automatically transformed into a pre-coordinated one by materializing all class expressions. But there are all kinds of usability questions here. What label generation rules are used in the materialization? Should least common subsumer grouping classes be automatically created too, to avoid the resulting lattice being highly counter-intuitive to browse (remember: an OWL reasoner will only give you named superclasses and subclasses)? How will this materialization affect the statistics in analyses that use the data (e.g. term enrichment)?
We are now using a deliberately constrained form of post-coordination in GO (no nesting – inner class expressions must be materialized):
http://www.slideshare.net/cmungall/go-annotationextensionsbio-curation2013
This is combined with an “almost post-coordination” approach called TermGenie, where annotators can build classes using pre-defined constrained templates, and have them placed into the ontology automatically using Elk, and later vetted by the ontology team. This has the benefits of both. But we still avoid pre-coordinating classes with trivial differentiating characteristics, opting for post-coordination. http://termgenie.org
I don’t think any of this is massively original – the seeds were sown when you and Chris Wroe gave a presentation at a GO meeting quite a number of years ago. You were just ahead of your time. It’s only recently, with fast reasoners such as Elk, and the development of new tools and standards, that it’s been possible to really effectively mix the two approaches.
Thanks for the article!