Automatic maintenance of multiple inheritance ontologies
Ontologies with multiple inheritance are difficult to maintain manually. However, provided with the correct set of axioms, an automated reasoner can maintain such ontologies. The effort is considerable, as it requires a richer axiomatisation, but it is worthwhile: the automated reasoner maintains the whole structure and avoids human errors. The more expressive axiomatisation also enables richer queries and automatic debugging.
Multiple inheritance ontologies
In a multiple inheritance ontology, there are classes with more than one superclass, forming a “polyhierarchy”. For example, in the Cell Type Ontology, a cell can be a subclass of several cell types at the same time: a phagocyte is a defensive cell, a motile cell, a stuff accumulating cell, and an animal cell.
The manual maintenance of such a structure requires the ontologist to assert all the necessary subsumptions (class-superclass relations). This is difficult because, for example, when a new class is added, all the appropriate subsumptions must be added with it, and the ontologist is likely to miss some. Another problem with a manually maintained polyhierarchy is that the asserted subsumptions are completely opaque to the reasoner; the reasoner does not “know” why such subsumptions have been asserted.
What is Normalisation?
Normalisation is an ontology building technique that relies on an automated reasoner (e.g. Pellet) to maintain the polyhierarchy, instead of doing it manually. The reasoner infers all the necessary subsumptions from the class descriptions, building an inferred polyhierarchy instead of a manually asserted one. However, adequate and precise class descriptions are needed for the reasoner to be able to infer the intended polyhierarchy.
Languages such as OWL provide the necessary expressivity to write class expressions that are rich enough for the reasoner to infer the polyhierarchy: universal restriction (only), existential restriction (some), number restriction (min, max, exactly), boolean operators (or, and, not), etc. Such constructs can be combined to build rich expressions like part_of some (nucleus and (has_function only photosynthesis)) (part of at least one nucleus that, if it has a function, has only photosynthesis as its function). More importantly from the perspective of Normalisation, defined or primitive classes can be created using OWL. A defined class has at least one necessary and sufficient condition (e.g. nucleus equivalentTo has_part some nucleolus): that is, having a nucleolus as part is enough to infer that an organelle is a nucleus (the nucleus is the only organelle with a nucleolus as part). A primitive class has only necessary conditions (e.g. nucleus subClassOf part_of some cell): that is, all nuclei are part of a cell, but other organelles are also part of a cell, so if we find an entity that is part of a cell we cannot infer that it is a nucleus.
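The difference between defined and primitive classes can be illustrated with a minimal Python sketch. It is a toy model, not an OWL reasoner: the names Some, ClassDescription and can_infer_membership are invented here for illustration, and only simple existential restrictions are handled. The two nucleus descriptions are the ones given in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Some:
    """Existential restriction, read as `prop some filler` (hypothetical helper)."""
    prop: str
    filler: str


@dataclass(frozen=True)
class ClassDescription:
    """Toy class description; not a real OWL API."""
    name: str
    necessary: frozenset = frozenset()   # subClassOf conditions
    sufficient: frozenset = frozenset()  # equivalentTo conditions (necessary and sufficient)

    @property
    def is_defined(self) -> bool:
        # A class is defined if it has at least one necessary and sufficient condition.
        return bool(self.sufficient)


def can_infer_membership(cls: ClassDescription, known: set) -> bool:
    """True if the known facts about an entity are enough to infer that it belongs to cls.

    Necessary conditions alone never allow this inference.
    """
    return any(cond in known for cond in cls.sufficient)


# nucleus equivalentTo has_part some nucleolus (defined)
nucleus_defined = ClassDescription(
    "nucleus", sufficient=frozenset({Some("has_part", "nucleolus")})
)
# nucleus subClassOf part_of some cell (primitive)
nucleus_primitive = ClassDescription(
    "nucleus", necessary=frozenset({Some("part_of", "cell")})
)

print(can_infer_membership(nucleus_defined, {Some("has_part", "nucleolus")}))
print(can_infer_membership(nucleus_primitive, {Some("part_of", "cell")}))
```

With the defined description, knowing that an organelle has a nucleolus as part is enough to recognise it as a nucleus; with the primitive description, knowing that something is part of a cell is not.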
In order to use OWL’s capabilities, a normalised ontology should be divided into two parts: the primitive axis and the modules. The primitive axis is formed by primitive classes (yellow ovals) that are pair-wise disjoint and have only one superclass. The primitive axis has several levels, and contains the bulk of the classes. The modules are defined classes (brown ovals) that have no superclasses (apart from owl:Thing or the root class) and are not disjoint.
When reasoning is performed, the reasoner will infer that each module has several subclasses from the primitive axis, creating a polyhierarchy. The key to this inference is that each class from the primitive axis has several necessary conditions, and each of these conditions is also present in one of the modules. When adding a new class, the maintainer does not add subsumptions manually; instead, the maintainer adds conditions to the class, and the reasoner derives the needed subsumptions from them when inference is performed.
Normalisation fits some ontologies better than others. For example, the Cell Type Ontology (CL) presents a polyhierarchy where the Normalisation structure can be neatly applied, as the classification of cells according to different criteria (ploidy, function, development stage, lineage, nucleation, etc.) can be codified as modules: i.e., in a Normalised CL (a version of CL built using Normalisation) there would be a module Haploid Cell (equivalentTo has_ploidy some haploid) that would be inferred as a superclass of all the haploid cells (primitive classes with the condition subClassOf has_ploidy some haploid; e.g. ovum, spermatozoon, etc.).
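The inference described above can be sketched in a few lines of Python. This is a deliberately simplified stand-in for a reasoner such as Pellet: it only matches identical existential restrictions and inherits conditions along the primitive axis. The names Some, Primitive, Module and classify are hypothetical, and the Diploid Cell module and the zygote class are added here purely as an illustrative contrast; only Haploid Cell, ovum and spermatozoon come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Some:
    """Existential restriction, read as `prop some filler` (hypothetical helper)."""
    prop: str
    filler: str


@dataclass(frozen=True)
class Primitive:
    """Class on the primitive axis: one asserted superclass, necessary conditions only."""
    name: str
    parent: str
    necessary: frozenset = frozenset()


@dataclass(frozen=True)
class Module:
    """Defined class: `name equivalentTo definition`."""
    name: str
    definition: Some


def classify(primitives, modules):
    """Return {primitive name: sorted names of the modules inferred as its superclasses}."""
    by_name = {p.name: p for p in primitives}

    def inherited_conditions(p):
        # Necessary conditions are inherited from ancestors on the primitive axis.
        conditions = set(p.necessary)
        while p.parent in by_name:
            p = by_name[p.parent]
            conditions |= p.necessary
        return conditions

    return {
        p.name: sorted(m.name for m in modules if m.definition in inherited_conditions(p))
        for p in primitives
    }


haploid = Some("has_ploidy", "haploid")
diploid = Some("has_ploidy", "diploid")  # assumption, not in the text

modules = [Module("Haploid Cell", haploid), Module("Diploid Cell", diploid)]
primitives = [
    Primitive("ovum", "cell", frozenset({haploid})),
    Primitive("spermatozoon", "cell", frozenset({haploid})),
    Primitive("zygote", "cell", frozenset({diploid})),  # assumption, not in the text
]

inferred = classify(primitives, modules)
print(inferred)
```

No subsumption between a cell and a module is asserted anywhere in the data: adding a new haploid cell only requires giving it the has_ploidy condition, and the classification follows.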
Why use Normalisation?
The use of Normalisation has several advantages. The main advantage lies in the maintenance process: the reasoner infers all the entailed subsumptions, without missing any. That is especially important in big ontologies like the Gene Ontology, as demonstrated in the GONG project, or in ontologies with a high subsumption-per-class ratio.
In a Normalised ontology, there is a set of agreed object properties, and, when adding a new class, the ontologist need only explore those object properties and add the due restrictions to the new class. The process resembles describing an object by filling in a form. The modelling is therefore principled, as every developer “fills in the same form”, and this makes it possible to split the work between many developers. This modelling process also results in a modular ontology: to extend the ontology with a new module, it is only necessary to add a new defined class. Defined classes, or modules, can be regarded as different “views” upon the same collection of objects (e.g. cell by function, cell by organism, cell by ploidy, cell by nuclear number, etc.).
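The “form” analogy can be made concrete with a small Python sketch. The list of agreed properties is an assumption made for illustration (has_ploidy and has_function appear in this article, has_nucleation is invented), as is the fill_form helper; the point is only that every new class is described through the same fixed set of properties.

```python
# Agreed object properties of the (hypothetical) normalised ontology.
# has_nucleation is an invented name used only for illustration.
AGREED_PROPERTIES = ("has_ploidy", "has_function", "has_nucleation")


def fill_form(name, parent, **answers):
    """Describe a new primitive class by answering the agreed properties.

    Returns the class as a dict whose subClassOf list holds the single
    asserted parent followed by one existential restriction per answer.
    """
    unknown = sorted(set(answers) - set(AGREED_PROPERTIES))
    if unknown:
        # Restrictions on properties outside the agreed set are rejected.
        raise ValueError(f"not an agreed object property: {', '.join(unknown)}")
    restrictions = [
        f"{prop} some {answers[prop]}" for prop in AGREED_PROPERTIES if prop in answers
    ]
    return {"name": name, "subClassOf": [parent] + restrictions}


spermatozoon = fill_form("spermatozoon", "cell", has_ploidy="haploid")
print(spermatozoon)
```

Because the restrictions are always emitted in the order of the agreed properties, two developers describing the same class produce the same description.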
To enable the inference of the polyhierarchy by the reasoner, many axioms need to be added. Such a rich axiomatisation is beneficial because it makes the modelling explicit; the reasoner and other users know why a class is a subclass of another class, as this relation is the result of both having a common condition, rather than the manual assertion of the subsumption relationship. For example, if we assert that leptomeningeal cell is a secretory cell, other users and, most importantly, the reasoner do not know why it is a secretory cell (a biologist may deduce the reason from the term names, but term names are completely useless for reasoners). However, if we assert that leptomeningeal cell has_function some ECM_secretion, it is clear why it has been classified as a subclass of secretory cell (which is equivalentTo has_function some secretion, and ECM_secretion is a subClassOf secretion).
A richer axiomatisation allows more complex queries to be executed against the ontology. It also makes automatic debugging possible (e.g. by using explanations). Having explicit wrong axioms is preferable to having implicit wrong ideas, as the reasoner will suggest a possible path to a solution.
Reasoning can be used to maintain ontologies in different ways. One such way is to use Normalisation, an ontology building technique that enables the automatic maintenance of polyhierarchies. Normalisation requires the addition of precise axioms for the reasoner to infer the correct subsumptions. It could be argued that Normalisation requires the same amount of work as manual maintenance, or even more. However, with Normalisation that work yields several advantages (automatic maintenance and rich axiomatisation), and, in the long term, manual maintenance requires more work, e.g. to fix missing subsumptions.
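The secretory cell example can be traced step by step with a short Python sketch. It is a toy check for one pattern only (an existential restriction whose filler is a subclass of the filler in the definition), not a general reasoner or explanation service; the function names and the data layout are assumptions made for illustration.

```python
# Asserted hierarchy of fillers: ECM_secretion subClassOf secretion.
FILLER_PARENT = {"ECM_secretion": "secretion"}


def ancestors_or_self(term, parent_of):
    """Return the term followed by its asserted ancestors, nearest first."""
    chain = [term]
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain


def explain_subsumption(sub, sub_condition, sup, sup_definition, parent_of):
    """Return the axioms that justify `sub subClassOf sup`, or None if it is not entailed.

    Conditions are (property, filler) pairs read as `property some filler`.
    """
    (sub_prop, sub_filler), (sup_prop, sup_filler) = sub_condition, sup_definition
    chain = ancestors_or_self(sub_filler, parent_of)
    if sub_prop != sup_prop or sup_filler not in chain:
        return None
    steps = [f"{sub} subClassOf {sub_prop} some {sub_filler}"]
    # One step per asserted subClassOf link between the two fillers.
    for child, parent in zip(chain, chain[1:chain.index(sup_filler) + 1]):
        steps.append(f"{child} subClassOf {parent}")
    steps.append(f"{sup} equivalentTo {sup_prop} some {sup_filler}")
    return steps


explanation = explain_subsumption(
    "leptomeningeal_cell", ("has_function", "ECM_secretion"),
    "secretory_cell", ("has_function", "secretion"),
    FILLER_PARENT,
)
for step in explanation:
    print(step)
```

Each line of the result is an explicit axiom, so a wrong classification can be traced back to the axiom that caused it.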
Normalisation is an Ontology Design Pattern (ODP), and thus a best practice for building efficient and rich bio-ontologies. There are different repositories of ODPs: http://odps.sf.net/, http://ontologydesignpatterns.org.
Some OWL tutorials show how to use Normalisation in practical terms.