Reference and Application Ontologies
James Malone and Helen Parkinson
European Bioinformatics Institute, Cambridge, CB10 1SD, UK
An application ontology is an ontology engineered for a specific use or application focus, whose scope is specified through testable use cases. An application ontology will often use or reference canonical ontologies to construct its classes and the relationships between them. Application ontologies are used to model cross-domain experiments in biology, to annotate or visualize data, and to produce data-driven views across reference ontologies for specific user groups.
Helen Parkinson is a geneticist who was seduced to the dark side (Bioinformatics) 10 years ago. She manages and annotates high throughput functional genomics data for the ArrayExpress database and Atlas of Gene Expression hosted at The European Bioinformatics Institute. She also builds ontologies such as EFO and OBI to annotate these data.
James Malone is a knowledge engineer and computer scientist who builds ontologies and triple stores at the EBI. He is a Newcastle United supporter and therefore often disappointed.
There are many reference or ‘canonical’ ontologies in biomedicine. Organizations such as the OBO Foundry aim to organise these reference ontologies into a collection of non-overlapping (‘orthogonal’) and interoperable resources. Integrating, building and consuming reference ontologies all present challenges. Current reference ontologies are not fully interoperable: they are built in different styles and with different tools, and often do not share a common upper-level ontology.
Consequently, importing all or part of most reference ontologies into a single resource is neither practical nor feasible. Furthermore, importing and combining large ontologies such as the Foundational Model of Anatomy (FMA) produces very large ontologies that cause scaling problems for description logic reasoning. There is also an issue of coverage: reference ontologies do not necessarily contain the combinations of classes (e.g. intersections or unions) needed to represent experimental data. For example, information about a cell line includes the cell type and tissue from which it derives, and information about the individual from whom the tissue was obtained.
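To make the coverage point concrete, the sketch below treats a defined class such as ‘cancer cell line’ as an intersection of restrictions whose fillers come from different reference ontologies (here, a cell hierarchy and a disease hierarchy). It is a toy Python model, not OWL and not a description logic reasoner; the term names, property names and the `satisfies` helper are all hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Toy is_a hierarchy drawn from two hypothetical reference ontologies.
IS_A: Dict[str, Set[str]] = {
    "epithelial cell": {"cell"},
    "adenocarcinoma": {"carcinoma"},
    "carcinoma": {"cancer"},
    "cancer": {"disease"},
}

def ancestors(term: str) -> Set[str]:
    """All transitive is_a parents of a term."""
    seen: Set[str] = set()
    stack = list(IS_A.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, ()))
    return seen

def subsumed_by(term: str, parent: str) -> bool:
    return term == parent or parent in ancestors(term)

# Hypothetical defined class: an intersection of (property, filler) restrictions.
# Neither reference hierarchy contains this combination on its own.
CANCER_CELL_LINE: FrozenSet[Tuple[str, str]] = frozenset({
    ("type", "cell line"),
    ("derives_from_cell_type", "cell"),
    ("donor_disease", "cancer"),
})

def satisfies(record: Dict[str, str], definition: FrozenSet[Tuple[str, str]]) -> bool:
    """True if every restriction in the intersection holds for the record."""
    return all(prop in record and subsumed_by(record[prop], filler)
               for prop, filler in definition)

line_a = {"type": "cell line", "derives_from_cell_type": "epithelial cell",
          "tissue": "cervix", "donor_disease": "adenocarcinoma"}
line_b = {"type": "cell line", "derives_from_cell_type": "epithelial cell",
          "tissue": "skin", "donor_disease": "none"}

print(satisfies(line_a, CANCER_CELL_LINE))  # True
print(satisfies(line_b, CANCER_CELL_LINE))  # False
```

The class only becomes expressible once cell, tissue and disease terms sit in one place, which is the gap an application ontology fills.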
Motivation for developing Application Ontologies
Application ontologies are typically used when crossing domains, e.g. transcriptomics and genomics, or when combining annotation on the sample, gene and experiment dimensions. Let’s consider a gene expression use case: we’d like to make statements about experimental processes, assays, cell types, cell lines, diseases and chemical compounds used to treat cell lines that serve as experimental models for disease. Querying across all these concepts requires the relevant reference ontologies to be fully integrated. An application ontology addresses this by importing all or parts of the reference ontologies required to support the application use cases and integrating them along a common axis. The common axis may be an upper-level ontology or a structure that best represents the needs of the application, e.g. one driven by the data.
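A minimal sketch of the ‘import parts, integrate along an axis’ idea, assuming each reference ontology is reduced to a map from term to its direct is_a parents. Here `extract_module` keeps the seed terms an application needs plus all their ancestors (a simplified stand-in for proper module extraction), and `integrate` attaches each module's root terms to a shared axis class. The function names, the toy ontologies and the axis class ‘experimental factor’ are illustrative assumptions.

```python
from typing import Dict, Set

Ontology = Dict[str, Set[str]]  # term -> direct is_a parents

def extract_module(ontology: Ontology, seeds: Set[str]) -> Ontology:
    """Keep the seed terms plus all their ancestors, with is_a edges among them."""
    keep: Set[str] = set()
    stack = list(seeds)
    while stack:
        term = stack.pop()
        if term not in keep:
            keep.add(term)
            stack.extend(ontology.get(term, set()))
    return {term: ontology.get(term, set()) & keep for term in keep}

def integrate(modules: Dict[str, Ontology], axis: Dict[str, str]) -> Ontology:
    """Merge modules and hang each module's root terms under an application axis class."""
    merged: Ontology = {}
    for source, module in modules.items():
        for term, parents in module.items():
            merged.setdefault(term, set()).update(parents)
            if not parents:  # a root within this module
                merged[term].add(axis[source])
    for axis_class in set(axis.values()):
        merged.setdefault(axis_class, set())
    return merged

# Hypothetical reference ontologies.
cell_ontology: Ontology = {"epithelial cell": {"cell"}, "neuron": {"cell"}, "cell": set()}
disease_ontology: Ontology = {"carcinoma": {"cancer"}, "cancer": {"disease"},
                              "diabetes": {"disease"}, "disease": set()}

app = integrate(
    {"cells": extract_module(cell_ontology, {"epithelial cell"}),
     "diseases": extract_module(disease_ontology, {"carcinoma"})},
    axis={"cells": "experimental factor", "diseases": "experimental factor"},
)
print(sorted(app))
# ['cancer', 'carcinoma', 'cell', 'disease', 'epithelial cell', 'experimental factor']
```

Only the branches reached from the seeds are imported, so unrelated terms such as ‘neuron’ and ‘diabetes’ never enter the application ontology, which is one way of keeping its size in check.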
Application ontologies can also offer alternative ‘views’ on reference ontologies by providing user- or domain-oriented definitions for ontology classes. This may involve writing a definition that a particular community will recognise given the application focus (e.g. ‘normalization’ has several meanings depending on the context), or rendering class labels for a specific user community.
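One way to realise such views, sketched here under the assumption that labels and definitions are stored as simple per-class records, is to keep one canonical record per class ID and layer community-specific overrides on top. The class ID prefix `APP`, the community names and the `render` function are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical application-ontology class with a default label and definition.
CANONICAL: Dict[str, Dict[str, str]] = {
    "APP:0000001": {
        "label": "normalization",
        "definition": "A data transformation that rescales values to a common basis.",
    },
}

# Community-specific views override label and/or definition; the class ID is unchanged.
VIEWS: Dict[str, Dict[str, Dict[str, str]]] = {
    "microarray analysts": {
        "APP:0000001": {
            "label": "array normalization",
            "definition": "Adjusting intensity values to remove systematic technical "
                          "variation between arrays.",
        },
    },
    "RNA-seq analysts": {
        "APP:0000001": {"label": "read count normalization"},  # label-only override
    },
}

def render(term_id: str, community: Optional[str] = None) -> Dict[str, str]:
    """Return the class as a given community should see it, falling back to the default."""
    view = dict(CANONICAL[term_id])  # copy so the canonical record is never mutated
    if community is not None:
        view.update(VIEWS.get(community, {}).get(term_id, {}))
    return view

print(render("APP:0000001", "microarray analysts")["label"])  # array normalization
```

Because only the presentation changes and the class ID stays fixed, data annotated through one community's view remains queryable through another's.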
An application ontology should be evaluated against a set of use cases and competency questions that represent the scope and requirements of the particular application. For example, a user query use case may contain the competency question ‘what cancer cell line data is there?’. Answering it requires sufficient ontological coverage to capture the concept of ‘cancer cell line’.
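Competency questions can be turned into executable checks. The sketch below, which assumes annotations are stored as a simple dataset-to-class map, answers ‘what cancer cell line data is there?’ by collecting datasets annotated with ‘cancer cell line’ or any of its subclasses, and reports a coverage gap if the class does not exist. All terms, dataset IDs and function names are hypothetical.

```python
from typing import Dict, List, Set

# Toy application-ontology hierarchy (child -> is_a parents); terms are illustrative.
IS_A: Dict[str, Set[str]] = {
    "cancer cell line": {"cell line"},
    "breast cancer cell line": {"cancer cell line"},
    "leukemia cell line": {"cancer cell line"},
    "normal cell line": {"cell line"},
}

# Hypothetical dataset annotations: dataset ID -> ontology class.
ANNOTATIONS: Dict[str, str] = {
    "dataset_1": "breast cancer cell line",
    "dataset_2": "normal cell line",
    "dataset_3": "leukemia cell line",
}

def known_terms() -> Set[str]:
    terms = set(IS_A)
    for parents in IS_A.values():
        terms |= parents
    return terms

def descendants(term: str) -> Set[str]:
    """All transitive subclasses of a term."""
    children: Dict[str, Set[str]] = {}
    for child, parents in IS_A.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    found: Set[str] = set()
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), set()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def answer_competency_question(term: str) -> List[str]:
    """Datasets annotated with the class or any subclass; fails if the class is missing."""
    if term not in known_terms():
        raise KeyError(f"coverage gap: no class {term!r} in the ontology")
    matching = descendants(term) | {term}
    return sorted(d for d, t in ANNOTATIONS.items() if t in matching)

# Competency question: 'what cancer cell line data is there?'
print(answer_competency_question("cancer cell line"))  # ['dataset_1', 'dataset_3']
```

Running such checks after every release gives a concrete pass/fail signal for whether the ontology still covers its use cases.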
Examples of Application Ontologies
The EBI’s Experimental Factor Ontology (EFO) is used to represent sample variables from gene expression experimental data. EFO imports classes from multiple reference ontologies and creates new classes that add knowledge to the imported classes in order to meet querying and curation use cases.
The Neuroscience Information Framework (NIF), formerly known as BIRN, has produced the NIFSTD ontology. NIF is ‘A dynamic inventory of Web-based neuroscience resources: data, materials, and tools accessible via any computer connected to the Internet’. NIF has two application resources:
1. NIFSTD, an ontology with separate modules covering the major domains of neuroscience: anatomy, cell, subcellular, molecule, function and dysfunction.
2. NeuroLex has detailed concepts for describing experimental techniques and instruments typically employed to carry out neuroscientific studies, as well as concepts for describing digital resources being created throughout the neuroscience community.
Both NIFSTD and NeuroLex are non-orthogonal to OBO Foundry ontologies; they contain cross-references to terms in other ontologies, e.g. the FMA, and add local terms where needed.
Application ontologies are used to meet specific use cases and consume reference ontologies. They have some drawbacks which must be managed if they are to be used successfully.
1. Scaling can be an issue: terms need to be imported, and ontologies can quickly become large.
2. Ontologies change rapidly, so importing classes without checking whether they are still current can build in obsolescence. Agent technology can be used to manage this.
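A simple batch check for inbuilt obsolescence can be sketched as follows, assuming the source ontology is released in the OBO flat-file format, where a retired term carries `is_obsolete: true` and may name a `replaced_by` successor. This is a plain comparison script rather than an agent-based system, and it is not a full OBO parser (escapes, qualifiers and non-[Term] stanzas are ignored); the `EX` identifiers and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

Stanza = Dict[str, List[str]]

def parse_obo_terms(text: str) -> Dict[str, Stanza]:
    """Minimal OBO reader: tag/value pairs for each [Term] stanza, keyed by id."""
    terms: Dict[str, Stanza] = {}
    current: Optional[Stanza] = None  # None while outside a [Term] stanza
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            current = {} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.split("!", 1)[0].strip()  # drop trailing "! comment"
        current.setdefault(tag.strip(), []).append(value)
        if tag.strip() == "id":
            terms[value] = current
    return terms

def check_imports(imported_ids: Iterable[str], release: Dict[str, Stanza]) -> Dict[str, str]:
    """Flag imported IDs that are obsolete or absent in the current release."""
    report: Dict[str, str] = {}
    for term_id in imported_ids:
        stanza = release.get(term_id)
        if stanza is None:
            report[term_id] = "missing from current release"
        elif stanza.get("is_obsolete", ["false"])[0] == "true":
            replacement = stanza.get("replaced_by")
            report[term_id] = ("obsolete; replaced_by " + replacement[0]
                               if replacement else "obsolete")
    return report

# Hypothetical current release of a source ontology.
CURRENT_RELEASE = """format-version: 1.2

[Term]
id: EX:0000001
name: cell line

[Term]
id: EX:0000002
name: obsolete cell line
is_obsolete: true
replaced_by: EX:0000001

[Typedef]
id: part_of
name: part of
"""

print(check_imports(["EX:0000001", "EX:0000002", "EX:0000099"],
                    parse_obo_terms(CURRENT_RELEASE)))
# {'EX:0000002': 'obsolete; replaced_by EX:0000001', 'EX:0000099': 'missing from current release'}
```

Running this against each new release of every imported source turns silent drift into an explicit list of classes to update.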
This paper is an open access work distributed under the terms of the Creative Commons Attribution License 2.5 (http://creativecommons.org/licenses/by/2.5/), which permits unrestricted use, distribution, and reproduction in any medium, provided that the original author and source are attributed.
The paper and its publication environment form part of the work of the Ontogenesis Network, supported by EPSRC grant EP/E021352/1.