owl – Ontogenesis
An Ontology Tutorial (http://ontogenesis.knowledgeblog.org)

Common reasons for ontology inconsistency
http://ontogenesis.knowledgeblog.org/1343 (Wed, 12 Jun 2013)

Summary

Following on from the previous Ontogenesis article “(I can’t get no) satisfiability” [1], this post explores common reasons for the inconsistency of an ontology. Inconsistency is a severe error which implies that none of the classes in the ontology can have instances (OWL individuals), and (under standard semantics) no useful knowledge can be inferred from the ontology.

Introduction

In the previous Ontogenesis article “(I can’t get no) satisfiability” [1], the authors discussed the notions of “unsatisfiability”, “incoherence”, and “inconsistency”. We recall that a class is “unsatisfiable” if there is a contradiction in the ontology that implies that the class cannot have any instances (OWL individuals); an ontology is “incoherent” if it contains at least one unsatisfiable class. An ontology is “inconsistent” if it is impossible to interpret its axioms without contradiction: no interpretation satisfies all axioms, so no class can have an instance, and we say that “every class is interpreted as the empty set”.

While incoherent OWL ontologies can be (and are) published and used in applications, inconsistency is generally regarded as a severe error: most OWL reasoners cannot infer any useful information from an inconsistent ontology. When faced with an inconsistent ontology, they simply report that the ontology is inconsistent and then abort the classification process, as shown in the Protégé screenshot below. Thus, when building an OWL ontology, inconsistency (and some of the typical patterns that often lead to inconsistency) needs to be avoided.

[Protégé screenshot: the reasoner reports that the ontology is inconsistent and aborts classification]

In what follows, we will outline and explain common reasons for the inconsistency of an OWL ontology which we separate into errors caused by axioms on the class level (TBox), on the instance level (ABox), and by a combination of class- and instance-related axioms. Note that the examples are simplified versions which represent, in as few axioms as possible, the effects multiple axioms in combination can have on an ontology.

Instantiating an unsatisfiable class (TBox + ABox)

Instantiating an unsatisfiable class is commonly regarded as the most typical cause of inconsistency. The pattern is fairly simple – we assign the type of an unsatisfiable class to an individual:

Individual: Dora
  Types: MadCow

where MadCow is an unsatisfiable class. The actual reason for the unsatisfiability does not matter; the contradiction here is caused by the fact that we require a class that cannot have any instances (MadCow) to have an instance named Dora. Clearly, there is no ontology in which the individual Dora can fulfil this requirement; we say that the ontology has no model. Therefore, the ontology is inconsistent. This example shows that, while incoherence is not a severe error as such, it can quickly lead to inconsistency, and should therefore be avoided.

Instantiating disjoint classes (TBox + ABox)

Another fairly straightforward cause of inconsistency is the instantiation of two classes which were asserted to be disjoint:

Individual: Dora
  Types: Vegetarian, Carnivore

DisjointClasses: Vegetarian, Carnivore

What we state here is that the individual Dora is an instance of both the class Vegetarian and the class Carnivore. However, we also say that Vegetarian and Carnivore are disjoint classes, which means that no individual can be both a Vegetarian and a Carnivore. Again, there is no interpretation of the ontology in which the individual Dora can fulfil both requirements; therefore, the ontology has no models and we call it inconsistent.

Conflicting assertions (ABox)

This error pattern is very similar to the previous one, but all assertions now happen in the ABox, that is, on the instance level of the ontology:

Individual: Dora
  Types: Vegetarian, not Vegetarian

Here, the contradiction is quite obvious: we require the individual Dora to be a member of the class Vegetarian and at the same time to not be a member of Vegetarian.

Conflicting axioms with nominals (all TBox)

Nominals (oneOf in OWL lingo) allow the use of individuals in TBox statements about classes; this merging of individuals and classes can lead to inconsistency. The following example, based on an example in [2], is slightly more complex than the previous ones:

Class: MyFavouriteCow
  EquivalentTo: {Dora}

Class: AllMyCows
  EquivalentTo: {Dora, Daisy, Patty}

DisjointClasses: MyFavouriteCow, AllMyCows

The first axiom in this example requires that every instance of the class MyFavouriteCow be the individual Dora. Similarly, the second axiom states that any instance of AllMyCows must be one of the individuals Dora, Daisy, or Patty. However, we then go on to say that MyFavouriteCow and AllMyCows are disjoint; that is, no member of the class MyFavouriteCow can be a member of AllMyCows. Since we have already stated that Dora is a member of both MyFavouriteCow and AllMyCows, the final disjointness axiom causes a contradiction, which means there cannot be any interpretation of the axioms that fulfils all three requirements. Therefore, the ontology is inconsistent.
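Because enumerated classes denote exactly the listed individuals in every interpretation, the clash can be replayed with plain sets. The following Python sketch (the names are illustrative) shows the non-empty intersection that the disjointness axiom forbids:

```python
# Set-based replay of the nominals example: in every interpretation,
# an enumerated class {a, b, ...} denotes exactly that set of
# individuals, so disjointness can be checked by intersecting sets.
my_favourite_cow = {"Dora"}
all_my_cows = {"Dora", "Daisy", "Patty"}

# DisjointClasses(MyFavouriteCow, AllMyCows) demands an empty
# intersection, but Dora is forced into both classes by the
# equivalence axioms.
clash = my_favourite_cow & all_my_cows
print(clash)        # {'Dora'} – the shared member causing the contradiction
print(bool(clash))  # True: the disjointness axiom cannot be satisfied
```

Since the two equivalence axioms fix the extensions of both classes in every interpretation, the intersection is non-empty in every interpretation, which is exactly why no model exists.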

No instantiation possible (all TBox)

The following example demonstrates an error which may not occur in a single axiom as shown here (simply because it is unlikely that a user would write down a statement which is so obviously conflicting), but could be the result of several axioms which, taken together, have the same effect as the axiom below. It is also non-trivial to express the axiom in Manchester syntax (the OWL syntax chosen for these examples) since it contains a General Concept Inclusion (GCI) [3], so we will bend the syntax slightly to illustrate the point.

Vegetarian or not Vegetarian
  SubClassOf: Cow and not Cow

Let’s unravel this axiom. First, in order for any individual to satisfy the left-hand side of the axiom, it has to be either a member of Vegetarian or not a member of Vegetarian. Clearly, since anything is either a member of a class or not (there are no values “in between”), the left-hand side holds for all individuals in the ontology. The right-hand side (the second line) of the axiom then requires all individuals to be members of the class Cow and not Cow at the same time; as in the examples above, no individual can meet this requirement. Due to this contradiction, there is no way to interpret the axiom that satisfies it, which renders the ontology inconsistent.
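The same reasoning can be illustrated with plain sets. In the Python sketch below (the particular sets are arbitrary illustrations), the left-hand side always evaluates to the whole domain and the right-hand side to the empty set, so the required inclusion fails in any interpretation with a non-empty domain, and OWL interpretations must have non-empty domains:

```python
# Set-theoretic sketch (illustrative names): over any non-empty domain,
# "Vegetarian or not Vegetarian" covers everything, while
# "Cow and not Cow" covers nothing, so the subclass axiom cannot hold.
domain = {"Dora", "Daisy", "Patty"}
vegetarian = {"Dora"}
cow = {"Dora", "Daisy"}

lhs = vegetarian | (domain - vegetarian)  # the whole domain
rhs = cow & (domain - cow)                # always the empty set
print(lhs == domain)  # True, whatever set interprets Vegetarian
print(rhs == set())   # True, whatever set interprets Cow
print(lhs <= rhs)     # False: the axiom fails in this interpretation
```

Whatever sets you pick for Vegetarian and Cow, the same three results come out, which is the set-level reason the ontology has no model.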

Conclusion

In this post, we have discussed some of the most common reasons for inconsistency of an OWL ontology by showing – simplified – examples of the error patterns. While some of these – such as instantiation of an unsatisfiable class – can be identified fairly easily, others – such as conflicting axioms involving nominals – can be more subtle.

References

  1. U. Sattler, R. Stevens, and P. Lord, "(I can’t get no) satisfiability", Ontogenesis, 2013. http://ontogenesis.knowledgeblog.org/1329
  2. B. Parsia, E. Sirin, and A. Kalyanpur, "Debugging OWL ontologies", Proceedings of the 14th international conference on World Wide Web - WWW '05, 2005. http://dx.doi.org/10.1145/1060745.1060837
  3. U. Sattler, and R. Stevens, "Being complex on the left-hand-side: General Concept Inclusions", Ontogenesis, 2012. http://ontogenesis.knowledgeblog.org/1288
OWL Syntaxes
http://ontogenesis.knowledgeblog.org/88 (Fri, 22 Jan 2010)

There are a variety of syntaxes for persisting, sharing and editing OWL ontologies. These range from the officially recommended RDF/XML exchange syntax, which any OWL compliant tool must support, through de facto standard syntaxes that virtually all OWL tools and APIs support, to bespoke syntaxes designed for particular purposes and applications. Whatever syntax is used, it is important to realise that the OWL language is not defined in terms of a particular concrete syntax; it is defined in a high-level structural specification which is then mapped into various concrete syntaxes. Although the OWL World Wide Web Consortium (W3C) Recommendation specifies RDF/XML as the default exchange syntax, there are a variety of alternative syntaxes with tool support that can be used as concrete representations of ontologies. Some of these syntaxes are specified in W3C notes, which means that an OWL compliant implementation need not support them; in practice, however, the main APIs and editing tools do. In what follows, some of the most widely used syntaxes are discussed. We begin by looking at how OWL is defined, and then at how this definition is mapped into concrete syntaxes.

From the OWL Structural Specification to OWL Syntaxes

The OWL 2 Structural Specification describes what constitutes an ontology from a structural point of view, for example, an ontology can be named with an IRI and is a set of axioms. It describes the various types of axioms and the types of class descriptions and entities that make up these axioms. It does this in a high level way which does not commit to any particular concrete representation syntax. This makes it possible to clearly describe the essential features of the OWL language without getting bogged down in the technical details of exchange syntaxes.

The Functional OWL Syntax

The first step towards concrete syntaxes is the functional syntax, which is a simple text-based syntax used as a bridge between the structural specification and the various concrete syntaxes. The example below shows an equivalent classes axiom, which specifies that a Teenager is a person whose age is greater than 12 and less than 20. Generally speaking, this syntax is not intended as a primary exchange syntax, but simply as a means of translating the structural specification into other concrete syntaxes.

EquivalentClasses(:Teenager 
    ObjectIntersectionOf(:Person
        DataSomeValuesFrom(:hasAge 
        DatatypeRestriction(xsd:integer 
            xsd:maxExclusive "20"^^xsd:integer 
            xsd:minExclusive "12"^^xsd:integer))))

RDF Based Syntaxes

The primary syntax that all OWL compliant tools must support is RDF/XML. As the name suggests, this syntax provides an XML representation of an RDF graph. The OWL specification therefore provides a bidirectional mapping from the OWL Functional Syntax to RDF Graphs, which can then be serialised into any concrete RDF representations such as RDF/XML and Turtle.

RDF/XML

The example below shows the definition of Teenager in RDF/XML; it is the same definition as shown above. As can be seen, RDF/XML is a very verbose syntax that is difficult to read. Although some people enjoy reading RDF/XML, most would not choose to edit it by hand. Nevertheless, most OWL tools use this syntax as the default for saving ontologies.

<owl:Class rdf:about="http://www.semanticweb.org/ontologies/ontogenesis/example#Teenager">
    <owl:equivalentClass>
        <owl:Class>
            <owl:intersectionOf rdf:parseType="Collection">
                <rdf:Description rdf:about="http://www.semanticweb.org/ontologies/ontogenesis/example#Person"/>
                <owl:Restriction>
                    <owl:onProperty rdf:resource="http://www.semanticweb.org/ontologies/ontogenesis/example#hasAge"/>
                    <owl:someValuesFrom>
                        <rdfs:Datatype>
                            <owl:onDatatype rdf:resource="http://www.w3.org/2001/XMLSchema#integer"/>
                            <owl:withRestrictions rdf:parseType="Collection">
                                <rdf:Description>
                                    <xsd:maxExclusive rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">20</xsd:maxExclusive>
                                </rdf:Description>
                                <rdf:Description>
                                    <xsd:minExclusive rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">12</xsd:minExclusive>
                                </rdf:Description>
                            </owl:withRestrictions>
                        </rdfs:Datatype>
                    </owl:someValuesFrom>
                </owl:Restriction>
            </owl:intersectionOf>
        </owl:Class>
    </owl:equivalentClass>
</owl:Class>

The advantages of RDF/XML are:

  • Any tool claiming to be an OWL tool must be able to consume and produce it, which means that it is widely supported.
  • RDF based syntaxes (including non-XML syntaxes) are good for representing the assertional axioms (ABox, data) in OWL ontologies, because these axioms correspond directly to triples.
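The direct correspondence between assertional axioms and triples can be sketched as follows (a hypothetical illustration; the names are made up):

```python
# Illustrative sketch: ABox axioms map one-to-one onto RDF triples.
# A class assertion becomes a single rdf:type triple; a property
# assertion likewise becomes a single triple.
class_assertion = (":Dora", "rdf:type", ":Vegetarian")
property_assertion = (":Bob", ":hasChild", ":Alice")

# By contrast, one complex TBox axiom (like the Teenager definition
# above) expands into many triples, with blank nodes reifying the
# class expression.
for triple in (class_assertion, property_assertion):
    print(" ".join(triple))
```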

The disadvantages of RDF/XML are:

  • It is very verbose and can be difficult to read.
  • Because the syntax is triple based, the translation of complex class expressions, annotations and various OWL axioms into triples can require “reification”, and the resulting RDF can be very verbose and ugly; this affects RDF/XML and other RDF based syntaxes alike.
  • The design of the OWL/RDF mapping (inherited from OWL 1) means that it is non-trivial to parse OWL ontologies represented in RDF. In fact, parsing almost always requires two passes, which means that much more memory is required for parsing an OWL/RDF document than is required to hold the actual ontology in memory.

Turtle

Another concrete RDF syntax is Turtle. This syntax is slightly less verbose and slightly more readable than RDF/XML. However, as can be seen from the representation of Teenager below, it is arguable that RDF based syntaxes do not really provide a convenient human readable format for representing complex OWL class expressions and axioms.

:Teenager rdf:type owl:Class ;
    owl:equivalentClass [ rdf:type owl:Class ;
        owl:intersectionOf ( :Person
            [ rdf:type owl:Restriction ;
              owl:onProperty :hasAge ;
              owl:someValuesFrom [ rdf:type rdfs:Datatype ;
                  owl:onDatatype xsd:integer ;
                  owl:withRestrictions ( 
                      [ xsd:maxExclusive "20"^^xsd:integer ]
                      [ xsd:minExclusive "12"^^xsd:integer]
                      )
                  ]
              ]
           )
        ] .

The advantages of Turtle are: It is much more concise and human readable than RDF/XML. It is quite widely supported – widely used tools and APIs such as Jena and the OWL API can produce and consume Turtle.

The disadvantages of Turtle are: Similar to the disadvantages of any RDF based syntax as described above.

OWL/XML

Although it is possible to represent an OWL ontology in RDF/XML, the design of RDF/XML means that it is difficult or impossible to use off-the-shelf XML tools for tasks other than parsing and rendering it; standard XML tools such as XPath and XSLT do not work well with RDF/XML representations of ontologies. Because of this, and because of the desire for a more regular and simple XML format, OWL/XML was created as a concrete representation format for OWL ontologies. The format is derived directly from the functional syntax. The example below shows the description of Teenager; compare it with the irregular RDF/XML example above to note the regularity of OWL/XML.

<EquivalentClasses>
    <Class IRI="#Teenager"/>
    <ObjectIntersectionOf>
        <Class IRI="#Person"/>
        <DataSomeValuesFrom>
            <DataProperty IRI="#hasAge"/>
            <DatatypeRestriction>
                <Datatype abbreviatedIRI="xsd:integer"/>
                <FacetRestriction facet="http://www.w3.org/2001/XMLSchema#maxExclusive">
                    <Literal datatypeIRI="http://www.w3.org/2001/XMLSchema#integer">20</Literal>
                </FacetRestriction>
                <FacetRestriction facet="http://www.w3.org/2001/XMLSchema#minExclusive">
                    <Literal datatypeIRI="http://www.w3.org/2001/XMLSchema#integer">12</Literal>
                </FacetRestriction>
            </DatatypeRestriction>
        </DataSomeValuesFrom>
    </ObjectIntersectionOf>
</EquivalentClasses>

The advantages of OWL/XML are: It conforms to an XML Schema and is regular, so it is possible to use off-the-shelf tools such as XPath and XSLT for processing and querying it. It is easy to write and parse; parsing can be done in a streaming mode, in one pass, using a SAX parser.

The disadvantages of OWL/XML are: It is very verbose. File sizes can be very large and this can make parsing slow.
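The claim that off-the-shelf XML tooling works on OWL/XML can be illustrated with Python’s standard library. The sketch below parses the Teenager fragment shown above and pulls out the class IRIs and facet values with generic tree queries:

```python
# Using a stock XML parser on the OWL/XML fragment above: the regular,
# schema-conformant structure makes generic queries straightforward.
import xml.etree.ElementTree as ET

owl_xml = """
<EquivalentClasses>
    <Class IRI="#Teenager"/>
    <ObjectIntersectionOf>
        <Class IRI="#Person"/>
        <DataSomeValuesFrom>
            <DataProperty IRI="#hasAge"/>
            <DatatypeRestriction>
                <Datatype abbreviatedIRI="xsd:integer"/>
                <FacetRestriction facet="http://www.w3.org/2001/XMLSchema#maxExclusive">
                    <Literal datatypeIRI="http://www.w3.org/2001/XMLSchema#integer">20</Literal>
                </FacetRestriction>
                <FacetRestriction facet="http://www.w3.org/2001/XMLSchema#minExclusive">
                    <Literal datatypeIRI="http://www.w3.org/2001/XMLSchema#integer">12</Literal>
                </FacetRestriction>
            </DatatypeRestriction>
        </DataSomeValuesFrom>
    </ObjectIntersectionOf>
</EquivalentClasses>
"""

root = ET.fromstring(owl_xml)
# Generic tree traversal, no OWL-specific parser needed:
classes = [c.get("IRI") for c in root.iter("Class")]
bounds = [lit.text for lit in root.iter("Literal")]
print(classes)  # ['#Teenager', '#Person']
print(bounds)   # ['20', '12']
```

Doing the same against the RDF/XML rendering above would require chasing blank nodes and `rdf:parseType="Collection"` structures, which is precisely the irregularity the section describes.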

The Manchester OWL Syntax

The Manchester OWL Syntax provides a compact, text-based representation for OWL ontologies that is easy to read and write. The primary motivation for its design was to produce a syntax that could be used for editing class expressions in tools such as Protégé. It has since been extended so that it is possible to represent complete ontologies, and is now specified in a W3C note. The example below shows the definition of Teenager; notice how much more compact and readable this syntax is. The Manchester Syntax is used throughout many ontology development environments, such as Protégé 3, Protégé 4 and TopBraid Composer, for presenting and editing components of axioms such as class expressions.

Class: Teenager
    EquivalentTo: Person and (hasAge some integer[> 12, < 20])

The advantages of Manchester Syntax are: It is very compact, and it is easy to read and write.

The disadvantages of Manchester Syntax are: It is cumbersome to represent some OWL axioms, such as general subclass axioms, which have a class expression on their left-hand side.

Tools

The OWL API is a de facto standard API for creating, manipulating and serialising OWL ontologies. It has parsing and rendering support for all of the concrete syntaxes described above. A web based syntax converter is available at http://owl.cs.manchester.ac.uk/converter/

Jena is an API for dealing with RDF and OWL. It has a large user base and has good support.

For more information see the OWL 2 Overview.

Acknowledgements

This paper is an open access work distributed under the terms of the Creative Commons Attribution License 3.0 (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided that the original author and source are attributed.

The paper and its publication environment form part of the work of the Ontogenesis Network, supported by EPSRC grant EP/E021352/1.

Protégé & Protégé-OWL
http://ontogenesis.knowledgeblog.org/52 (Thu, 21 Jan 2010)

What is Protégé

Protégé-OWL has become the default open-source editor for the Web Ontology Language, OWL. It was developed collaboratively by the University of Manchester and Stanford University, based on the earlier Protégé, which used a frame-based formalism based on OKBC.

Protégé is an open-source environment that provides a framework for “plugins”, either for specific applications or for alternative editing tools and views.

Versions of Protégé-OWL

Protégé-OWL currently exists in two released versions – version 3 and version 4 – plus an experimental Web/browser based version using Google Web Toolkit (GWT).

Version 3 is limited to the original OWL 1.0 specification but supports multi-user access and a number of plugins not yet ported to version 4, most importantly the SWRL plugin and the PROMPT ontology comparison tools. Version 3 is layered over the original frames environment and the original serialisation of OWL in RDF/XML.

Version 4 is a complete re-write that supports the full OWL 2 specification, is built on the new OWL API, and offers a range of plugins for the easy creation of OWL ontologies – e.g. outline and spreadsheet-like functionality, better search facilities, and tighter integration with OWL reasoners. Multi-user and web versions of Protégé-OWL 4 are under development but not yet released (as of January 2010), as are version-4 ports of the most popular version-3 plugins.

The lists of available plugins for Protégé-OWL 4 can be found on Manchester’s CO-ODE site.

The OWL tutorial

A key resource for understanding both OWL and the use of Protégé-OWL is the Protégé OWL Tutorial, which is also probably the best overall introduction to OWL for general users.

Manchester Syntax

Neither the official OWL abstract syntax nor the OWL XML syntax is easy to read or use. Protégé-OWL introduced a new, simplified and less verbose syntax: e.g. “some” and “only” rather than “someValuesFrom” and “allValuesFrom”. The Manchester Syntax has now been accepted by the W3C as part of the official OWL 2 standard.

(See OWL Syntaxes)

OWL, an ontology language
http://ontogenesis.knowledgeblog.org/55 (Thu, 21 Jan 2010)

This article takes the reader on an introductory tour of OWL, with particular attention on the meaning of OWL statements, their entailments, and what reasoners do. Related Knowledge Blog posts include one on ontology components, one on OWL syntaxes, and one on the extent of classes.

There are numerous ontology languages around, most prominently the Web Ontology Language OWL. OWL was developed based on experiences with its predecessors DAML+OIL and OIL, and its design was carried out by W3C working groups. OWL 2 is an extension and revision of OWL (published in 2004) and is a W3C recommendation.

OWL and OWL 2 are called Web Ontology Languages because they are based on web standards such as XML, IRIs, and RDF, and because they are designed in such a way that they can be used over the web (for example, one OWL file can import others by their URI). There are numerous uses of OWL and OWL 2, however, that are rather local, for example within a software or information system.

These languages come with a lot of options and choices, which we will only briefly mention here, and only come back to when they are important. OWL comes in three flavours (OWL Full, OWL Lite, and OWL DL), and OWL 2 comes with two semantics (i.e., two ways of determining the meaning of an ontology: direct and RDF-based) and three profiles (i.e., fragments or syntactic restrictions, called OWL 2 EL, QL and RL), and you can choose between a number of syntaxes in which to save your ontology. Since the tools, and especially the reasoners, around mostly support OWL 2’s direct semantics and OWL DL, we will concentrate here on those. Also, OWL 2 is backwards compatible with OWL, so we can discuss advantages and new features of OWL 2 elsewhere, forget the difference for now, and just talk about OWL (meaning both OWL and OWL 2).

Next, we would like to utter a warning: OWL has been designed to be consumed by computers, so in its natural form (especially in certain syntaxes), it is really hard to read or write for humans: e.g., the following snippet of an OWL ontology in the RDF syntax says that

a:Boy owl:equivalentClass _:x .
_:x rdf:type owl:Class .
_:x owl:intersectionOf ( a:Child a:Male ) .

boys are exactly those children who are male. The same example in the functional-style syntax looks more readable,

EquivalentClasses( Boy ObjectIntersectionOf( Child Male ) )

but we can easily imagine a much nicer presentation of this statement, and tool developers have designed useful, goal- or application-oriented tools or visualisations. This is clearly a good thing: it helps the user to interact with an (OWL) ontology, without requiring them to be fluent in the ontology language and while supporting the task at hand.

Now what is in an OWL ontology? There is some stuff like headers and declarations around an ontology but, in essence, an OWL ontology is a set of axioms, and each of these makes a statement that we think is true about our view of the world. An axiom can say something about classes, individuals, and properties. For example, the following axioms (given in an informal functional style) talk about two classes, Man and Person, one property, hasChild, and two individuals, Alice and Bob.

SubClassOf( Man Person )

SubClassOf(Person (hasChild only Person))

ClassAssertion(Bob Man)

PropertyAssertion(hasChild Bob Alice)

Roughly speaking, these axioms say something about these classes, properties, and individuals, and this meaning is fixed through their semantics, which allows us to distinguish interpretations/structures/worlds/… that satisfy these axioms from those that don’t. For example, a structure where every Man is a Person would satisfy the first axiom, whereas one where we have a Man who is not a Person would not satisfy the first axiom. Rather confusingly for modelers in general, we call those interpretations/structures/worlds/… that satisfy all axioms of an ontology a model of this ontology. It is worth pointing out that one ontology can have many many models, of varying size and even infinite ones. And here we can even have a sneak preview at reasoning or inferencing: assume the axioms in our ontology are such that in all its models, it happens that every GrandParent is a Parent. Then we call this an entailment or a consequence of our ontology, and we expect a reasoner to find this out and let us know (if you are familiar with Protégé, then you might have seen an inferred class hierarchy, which is basically this).

In more detail, this semantics works as follows: first, fix a set — any set of things will do, finite or infinite, as long as it is not empty. Then, take each class name (such as Man) and interpret it as a set — any set is fine, it can even be empty. Then, take each property name (such as hasChild) and interpret it as a relation on your set (basically by drawing edges between your elements) — again, you are free to choose whatever relation you like. Then, take each individual name (such as Bob) and interpret it as one of your elements. Altogether, you now have an interpretation (but remember that one ontology can have many, many interpretations). Now, to check whether your interpretation satisfies your ontology, you can go through it axiom by axiom and check whether your interpretation satisfies each axiom. For example, in order for your interpretation to satisfy

  • the first axiom, SubClassOf( Man Person ), the set that interprets Man has to be a subset of the set that interprets  Person. Since this kind of sentence will soon become horribly contrived, we rather say ‘every instance of Man is also an instance of Person’.
  • the second axiom, SubClassOf(Person (hasChild only Person)), every instance of Person must be related, via the property hasChild, to instances of Person only. I.e., for an instance of Person, if it has an out-going hasChild edge, then this edge must link it to an instance of Person.
  • the third axiom, ClassAssertion(Bob Man), the element that interprets Bob must be an instance of Man (see, now it becomes quite easy?).
  • the fourth axiom, PropertyAssertion(hasChild Bob Alice), the element that interprets Bob must be related, via the hasChild property, to the element that interprets Alice.
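The axiom-by-axiom check just described can be carried out mechanically on a small, hand-built interpretation. The following Python sketch (the lower-case names are hypothetical stand-ins for the domain elements and the interpreted classes and property) confirms that this particular interpretation satisfies all four axioms, i.e., that it is a model of the ontology:

```python
# A hand-built interpretation for the four example axioms, checked
# mechanically (an illustrative sketch, not a reasoner).
domain = {"bob", "alice"}
man = {"bob"}
person = {"bob", "alice"}
has_child = {("bob", "alice")}

# SubClassOf( Man Person ): every instance of Man is an instance of Person.
axiom1 = man <= person
# SubClassOf( Person (hasChild only Person) ): out-going hasChild edges
# from a Person may lead only to instances of Person.
axiom2 = all(y in person for (x, y) in has_child if x in person)
# ClassAssertion( Bob Man ): the element interpreting Bob is in Man.
axiom3 = "bob" in man
# PropertyAssertion( hasChild Bob Alice ): Bob is hasChild-related to Alice.
axiom4 = ("bob", "alice") in has_child

print(all([axiom1, axiom2, axiom3, axiom4]))  # True: this is a model
```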

So, in this case, we could, in principle, construct or invent interpretations and test whether they satisfy our ontology, i.e., whether they are models of it. This would, however, hardly enable us to say something about what holds in all models of our ontology because, as mentioned earlier, there can be loads of those, even infinitely many… so we rather leave this to tools called reasoners (which do this in a more clever way). This whole exercise should, however, help us understand the above mentioned entailment. Consider the following two axioms:

EquivalentClasses(Parent (Person and (isParentOf some Person)))

EquivalentClasses(GrandParent (Person and (isParentOf some (Person and (isParentOf some Person)))))

The first axiom says that the instances of Parent are exactly those persons who are related, via isParentOf, to some instance of Person. The second axiom says that the instances of GrandParent are exactly those persons who are related, via isParentOf, to some instance of Person who is, in turn, related, via isParentOf, to an instance of Person. Please note that the GrandParent axiom does not mention Parent. Now you can try to construct an interpretation that satisfies both axioms and in which some instance of GrandParent is not a Parent… and it will be impossible… then you can think some more and come to the conclusion that these two axioms entail that every GrandParent is a Parent, i.e., that GrandParent is a subclass of Parent!
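For these two axioms, the impossibility can even be checked exhaustively on small domains. The following Python sketch (not a reasoner, just brute force over a hypothetical three-element domain) enumerates every interpretation of Person and isParentOf, computes Parent and GrandParent directly from the equivalence axioms, reading the grandparent’s intermediate child as itself a Person (without which the entailment would fail), and finds no counterexample:

```python
# Brute-force sketch: enumerate all interpretations over a three-element
# domain and check GrandParent <= Parent for the classes defined by the
# two equivalence axioms. No counterexample exists.
from itertools import chain, combinations

domain = ("a", "b", "c")
pairs = [(x, y) for x in domain for y in domain]

def subsets(items):
    return chain.from_iterable(
        combinations(items, n) for n in range(len(items) + 1))

def some(relation, filler):
    # { x | x is related, via the relation, to some member of filler }
    return {x for (x, y) in relation if y in filler}

checked = 0
for person_tuple in subsets(domain):
    person = set(person_tuple)
    for rel in map(set, subsets(pairs)):
        parent = person & some(rel, person)
        grandparent = person & some(rel, person & some(rel, person))
        assert grandparent <= parent  # never fails: the entailment holds
        checked += 1

print(checked)  # 2**3 * 2**9 = 4096 interpretations examined
```

Of course, exhausting a three-element domain proves nothing about larger or infinite models; that is exactly the gap a reasoner’s proof procedures close.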

Coming back to Protégé: if you look at the inferred class hierarchy in Protege, then you see both the ‘told’ plus these entailed subclass relationships. In OWL, we also have two special classes, thing and nothing, and they are interesting for the following reasons:

  • if thing is a subclass of a user-defined class, say X, then every element in every interpretation is always an instance of X. This is often regarded as problematic, e.g., for reuse reasons.
  • if your class, say Y, is a subclass of nothing, then Y can never have any instances at all, because nothing is, according to the OWL specification, always interpreted as the empty set. In many cases, this indicates a modelling error and requires some repair.

Finally, we can also ask our reasoner to answer a query, e.g. to give us all instances of Person. If you look again at the four axioms above, we are only told that Bob is an instance of Man, so we might be tempted not to return Bob for this query. On the other hand, we also have the axiom that says that every instance of Man is also an instance of Person, so we should return Bob, because our ontology entails that Bob is a Person. Reasoners can be used to answer such queries, and these are not restricted to class names: for example, we could also query for all instances of (Person and (hasChild some Person)). Now, from the four axioms we have, we cannot infer that Bob should be returned for this query: although we know that Bob is a Person and is hasChild-related to Alice, we don’t know anything about her, and thus we don’t know whether she is a Person or not. Hence Bob can’t be returned for this query. Similarly, if we query for all instances of (Person and (hasChild at most 1)), we cannot expect Bob to be in the answer: although we know that Bob is a Person and is hasChild-related to Alice, we don’t know whether he has other children, unbeknownst to us. This kind of behaviour is referred to as OWL’s open world assumption.

It is quite common to distinguish class-level ontologies (which only have axioms about classes, but don’t mention individuals) from instance-level ontologies (i.e., assertions about the types of, and relations between, individuals). We find ontologies that are purely class-level, such as SNOMED CT and the NCIt, where reasoning is used purely to make sure that the things said about classes, and the resulting entailed class hierarchy, are correct, and that nothing contradictory has been said that would lead to subclasses of nothing or to the whole ontology being contradictory. One interesting option then is, e.g., to export the resulting class hierarchy as a SKOS vocabulary to be used for navigation. We also find ontologies with both class- and instance-level axioms, which are used with the above query answering mechanism as a flexible, powerful means of accessing data.

Finally, if you want to use OWL for your application, you will first have to clarify whether this involves a purely class-level ontology, or whether you want to use OWL for accessing data. In the latter case, you have two options: you can leave the data in the databases, files, or formats in which it currently resides, and use existing approaches (e.g., Quonto, OWLGres or Requiem) to map this data to your class-level ontology and thus query it through the OWL ontology. Or you can extract it and load it into an instance-level ontology and go from there. Both clearly have advantages and disadvantages, whose discussion goes beyond the scope of this article (as do many other aspects).

So, where to go next if you want to learn more about OWL? First, you could download an OWL editor such as Protégé 4 and follow a tutorial on how to build an OWL ontology (see below for more links). You could also read the substantial OWL Primer (it has a cool feature that lets you decide which syntaxes to show and which to hide!) and take it from there. Or you could read some of the papers on experiences with OWL in modelling biology. Regardless of what you do, building your own OWL ontology and asking reasoners to make its entailments salient always seems to be a good plan.

Helpful links:

PS: I need to point out that (i) OWL is heavily influenced by classical first-order predicate logic and by research in description logics (fragments of first-order logic that have been developed in knowledge representation and reasoning since the late 1980s), and that (ii) OWL is much more than what is mentioned here: e.g., we can annotate axioms and classes, import other ontologies, etc., and in addition to the OWL constructors such as 'and', 'some' and 'only' used here, there are numerous others, far too many to mention.

Semantic Integration in the Life Sciences http://ontogenesis.knowledgeblog.org/126 http://ontogenesis.knowledgeblog.org/126#comments Thu, 21 Jan 2010 15:20:03 +0000 http://ontogenesis.knowledgeblog.org/?p=126

There are a number of limitations in data integration: data sets are often noisy, incomplete, of varying levels of granularity and highly changeable. Every time one of the underlying databases changes, the integrated database needs to be updated, and if there are any format changes, the parsers that convert to the unified format need to be modified as well. This "database churn" was identified by Stein as a major limiting factor in establishing a successful data warehouse (Stein 2003).

Ruttenberg et al. see the Semantic Web, of which both OWL and RDF are components, as having the potential to aid translational and systems biology research; indeed, any life science field where there are large amounts of data in distributed, disparate formats should benefit from Semantic Web technologies (Ruttenberg et al. 2007).

Semantic Integration

Integrated data sources, whether distributed or centralised, allow querying of multiple data sources in a single search. Traditional methods of data integration map at least two data models to a single, unified, model. Such methods tend to resolve syntactic differences between models, but do not address possible inconsistencies in the concepts defined in those models. Semantic integration resolves the syntactic heterogeneity present in multiple data models as well as the semantic heterogeneity among similar concepts across those data models. Often, ontologies or other semantic web tools such as RDF are used to perform the integration.

Addressing Semantic Heterogeneity

Semantic heterogeneity describes the difference in meaning of data among different data sources. A high level of semantic heterogeneity makes direct mapping difficult, often requiring further information to ensure a successful mapping. Such heterogeneity is not resolved in more traditional syntactic data integration methods. For instance, in data warehousing or data federation, multiple source schemas (e.g. database schemas) are converted to a single target schema. In data warehousing, the data stored in the source models is copied to the target, while in federated databases the data remains in the source models and is queried remotely via the target schema.
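The warehousing/federation contrast just described can be sketched with a toy target schema. Every schema, field and record below is invented for illustration; the point is only that warehousing copies translated data up front, while federation translates at query time and leaves the data at its sources.

```python
# Two toy "source schemas" with different layouts:
source1 = [{"prot_name": "p53", "organism": "human"}]    # dict records
source2 = [("BRCA1", "human")]                           # tuple records

def to_target(record):
    """Map either source format onto the single target schema."""
    if isinstance(record, dict):
        return {"name": record["prot_name"], "species": record["organism"]}
    name, species = record
    return {"name": name, "species": species}

# Data warehousing: copy everything into the target store up front.
warehouse = [to_target(r) for r in source1 + source2]

# Federation: leave data at the sources; translate at query time.
def federated_query(species):
    for r in source1 + source2:
        t = to_target(r)
        if t["species"] == species:
            yield t

assert warehouse == list(federated_query("human"))
```

Note that both strategies here reconcile only the *syntax* of the records; nothing checks whether the two sources mean the same thing by a "protein", which is exactly the gap semantic integration addresses.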

However, the schema reconciliation in non-semantic approaches tends to be hard-coded for the task at hand, and is not easily reused for other projects. Often, data is aligned by linking structural units such as XSD components or table and row names. Further, concepts between the source and target schema are often linked based on syntactic similarity, which does not necessarily account for possible differences in the meanings of those concepts. For instance, a protein in BioPAX is strictly defined as having only one polypeptide chain, while a protein in UniProtKB (The UniProt Consortium 2008) can consist of multiple chains. Semantic data integration is intended to resolve both syntactic and semantic heterogeneity and can allow a richer description of the domain of interest than is possible with syntactic methods. By using ontologies, kinds of entities, including relations, can be integrated across domains based on their meaning. However, application of such techniques in bioinformatics is difficult, partly due to the bespoke nature of the majority of available tools.

The protein example can be further extended to illustrate the practical differences between traditional data integration and semantic integration. In traditional data integration methods, two database schemas may contain a "Protein" table, but if what the developers mean by "Protein" differs, there is little way of determining this difference programmatically. An integration project using these two schemas as data sources may erroneously mark them as equivalent tables. In semantic integration, if the two data sources had modelled Protein correctly, the differences in meaning would be clear both programmatically and to a human looking at the axioms for Protein in the two data sources' ontologies. Once the semantic differences are identified, they can be resolved. One possibility would be for the person creating the integrated ontology and data set to create a Protein superclass that describes a Protein in a generic way; the two source definitions could then be modelled as children of that Protein superclass.
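As a loose illustration of that superclass resolution, here is a sketch using Python subclassing as a stand-in for OWL subsumption. All class names are hypothetical; the one-chain/many-chain distinction follows the BioPAX/UniProtKB example above.

```python
# A generic Protein superclass reconciling two source definitions.
# Python classes stand in for OWL classes; purely illustrative.

class Protein:
    """Generic superclass: a protein, with no commitment on chain count."""

class BioPAXProtein(Protein):
    """BioPAX sense: exactly one polypeptide chain."""
    n_chains = 1

class UniProtProtein(Protein):
    """UniProtKB sense: one or more polypeptide chains."""
    def __init__(self, n_chains):
        self.n_chains = n_chains

# An integrated query over 'Protein' now ranges over both senses,
# while the distinction between them remains explicit:
entries = [BioPAXProtein(), UniProtProtein(n_chains=3)]
assert all(isinstance(e, Protein) for e in entries)
```

The key point is that neither source definition is forced into the other's mould: both remain queryable under the shared superclass, and their differing commitments stay visible.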

Ontology-based Integration

Integration methods based on ontologies can be more generic, reusable and independent of the integrative applications they were created for, when compared with traditional approaches that resolve only syntactic heterogeneity (Cheung et al. 2007). Mappings between schemas in non-semantic approaches are specific to those schemas and cannot be applied to other data sources; by contrast, mappings between ontologies (and therefore to the data sources that utilise those ontologies) can be used by any resource making use of those ontologies, not just the originally intended data sources. Two concepts may have different names, but if they reference the same ontology term, then it may be sensible to mark them as semantically equivalent. However, this method brings its own challenges, as described in the Ontogenesis article Ontologies for Sharing, Ontologies for Use:

"The alternative approach of defining equivalences between terms in different ontologies suffers from some of the same problems, since use of owl:EquivalentClass is logically strict. Strict equivalence is inappropriate if the definitions of the classes within the two ontologies differ significantly. […] An alternative is just to indicate that some sort of relationship exists between classes between two ontologies by use of skos:related (http://www.w3.org/TR/skos-primer/)."

Ontology mapping, also known as class rewriting, is a well-studied methodology that allows the mapping of a source class to a target class from different ontologies. As primitive classes are used in DL to characterise defined classes (Baader et al. 2003, p. 52), such rewriting also allows the linking of relationships (also known as properties) between the two ontologies. Mapping can be used to automatically generate queries over the data source ontologies via a core ontology, using views over the data source ontologies. Additionally, mapping can be applied more generally to rewrite the required features of data source ontologies as a function of a core ontology, as described by Rousset et al. for two existing data integration systems, PICSEL and Xyleme (Rousset et al. 2004).

In the life sciences, the most common formats for ontologies are OWL and OBO. More complex semantic integration tasks can involve more than two ontologies, and these often employ a mediator, or core, ontology used in concert with two or more source ontologies.

Mapping Strategies

Often, the data sources to be integrated cover very different domains, and one or even two ontologies are not sufficient to describe all of the sources under study. In such cases, there are a variety of methodologies for mapping more than two ontologies together. Most ontology integration techniques involving more than two ontologies can be classified according to two broad mapping strategies: global-as-view, where the core ontology is created as a view of the source ontologies, and local-as-view, where the reverse is true. Global-as-view mapping defines the core ontology as a function of the source ontologies rather than as a semantically rich description of the research domain in its own right, though the level of dependence of the core ontology can vary (Wache et al. 2001, Rousset et al. 2004, Gu et al. 2008). With local-as-view, the core ontology is independent of the source ontologies, and the source ontologies themselves are described as views of the core ontology.
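The two strategies can be caricatured with plain functions standing in for views. Every ontology, term and entry below is made up for illustration; the contrast to notice is which side of the mapping is defined in terms of the other.

```python
# Source ontologies/schemas expose their own terms:
source_a = {"Polypeptide": ["p53"]}
source_b = {"GeneProduct": ["BRCA1"]}

# Global-as-view: the core term is *defined as* a view (a function)
# over the source terms, so it changes whenever the sources change.
def core_protein_gav():
    return source_a["Polypeptide"] + source_b["GeneProduct"]

# Local-as-view: the core term exists independently; each *source*
# term is described as a view of the core (here, a filter over it).
core_protein = ["p53", "BRCA1", "insulin"]
def source_a_lav():
    return [x for x in core_protein if x in {"p53"}]

assert core_protein_gav() == ["p53", "BRCA1"]
assert source_a_lav() == ["p53"]
```

In the local-as-view sketch a new source can be added by writing one new view over `core_protein`, leaving the core untouched, which is the maintainability advantage the hybrid approaches below also aim for.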

Hybrid approaches (Lister et al. 2009, Xu et al. 2004) also generate mappings between source ontologies and the core ontology. However, unlike traditional approaches, the core ontology is completely independent of any of the source ontologies. Such approaches allow both the straightforward addition of new source ontologies as well as the maintenance of the core ontology as an independent entity.

Current Semantic Integration Efforts

RDF databases are generally accessed and queried via SPARQL. Life science RDF databases include the Data Web projects such as OpenFlyData (Miles et al., submitted); Neurocommons (Ruttenberg et al. 2009), BioGateway (Antezana et al. 2009) and S3DB (Deus et al. 2008). Many others are listed in Table 1 of Antezana (Antezana et al. 2009). Some databases only use RDF, while others make use of OWL.

Databases such as RDF triple stores provide data sets in a syntactically similar way, but the semantic heterogeneity is not necessarily resolved. For instance, while Bio2RDF stores millions of RDF triples, queries must still trace a path against existing resources rather than have those resources linked via a shared ontology or ontologies (Belleau et al. 2008). Shared vocabularies (e.g. OBO Foundry ontologies) can be used to build connections between RDF data files, which would provide existing connections among data sets that could be leveraged by integration projects.

Semantic integration projects can make use of expressive logic-based ontologies to aid integration. Work on ontology mapping and other semantic data integration methodologies in the life sciences includes the RDF approaches mentioned above as well as the TAMBIS ontology-based query system (Stevens et al. 2000); mapping the Gene Ontology to UMLS (Lomax et al. 2004); the integration of Entrez Gene/HomoloGene with BioPAX via the EKoM (Sahoo et al. 2008); the database integration system OntoFusion (Alonso-Calvo et al. 2007); the SWRL mappings used in rule-based mediation to annotate systems biology models (Lister et al. 2009); and the pharmacogenomics of depression project (Dumontier and Villanueva-Rosales, 2009).

Even with improved methods in data integration, problems of data churn remain. Some projects, such as that by Zhao et al., have proposed the use of Named Graphs to track provenance and churn of bioinformatics data, such as gene name changes (Zhao et al. 2009). Ultimately, it is not just the syntax and semantics of the data sources which must be resolved, but also the challenges associated with ensuring that data is up to date, complete and correctly traced and labelled.

Acknowledgements

This paper is an open access work distributed under the terms of the Creative Commons Attribution License 3.0 (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided that the original author and source are attributed.

The paper and its publication environment form part of the work of the Ontogenesis Network, supported by EPSRC grant EP/E021352/1.

Automatic maintenance of multiple inheritance ontologies http://ontogenesis.knowledgeblog.org/49 http://ontogenesis.knowledgeblog.org/49#comments Thu, 21 Jan 2010 16:12:45 +0000 http://ontogenesis.knowledgeblog.org/?p=49
Mikel Egaña Aranguren <mikel.egana.aranguren@gmail.com>
(Technical University of Madrid, Spain)

Introduction

Ontologies with multiple inheritance are difficult to maintain manually. However, provided the correct set of axioms, an automated reasoner can be used to maintain such ontologies. The effort is considerable, requiring a richer axiomatisation, but worthwhile, as the automated reasoner is able to maintain the whole structure, avoiding human errors. The more expressive axiomatisation also enables richer queries, among other advantages.

Multiple inheritance ontologies

In a multiple inheritance ontology, there are classes with more than one superclass, forming a "polyhierarchy". For example, in the Cell Type Ontology, a cell can be a subclass of several cell types at the same time: a phagocyte is a defensive cell, a motile cell, a stuff accumulating cell and an animal cell.

The manual maintenance of such a structure requires the ontologist to assert all the necessary subsumptions (class–superclass relations). The difficulty of manually maintaining polyhierarchies results from the fact that, when adding a new class, all the appropriate subsumptions must be added, and it is likely that the ontologist will miss some. Another problem with a manually maintained polyhierarchy is that the asserted subsumptions are completely opaque to the reasoner: the reasoner does not "know" why such subsumptions have been asserted.

What is Normalisation?

Normalisation is an ontology building technique that relies on an automated reasoner (e.g. Pellet) to maintain the polyhierarchy, instead of maintaining it manually. The reasoner infers all the necessary subsumptions from the class descriptions, building an inferred polyhierarchy rather than a manually asserted one. However, adequate and precise class descriptions are needed for the reasoner to be able to infer the desired polyhierarchy.

Languages such as OWL provide the necessary expressivity to write class expressions that are rich enough for the reasoner to infer the polyhierarchy: universal restriction (only), existential restriction (some), number restriction (min, max, exactly), boolean operators (or, and, not), etc. Such constructs can be combined to build rich expressions like part_of some (nucleus and (has_function only photosynthesis)) (part of at least one nucleus that, if it has any function, has photosynthesis as that function). More importantly from the perspective of Normalisation, defined or primitive classes can be created using OWL. A defined class has at least one necessary and sufficient condition (e.g. nucleus equivalentTo has_part some nucleolus): that is, having a nucleolus as part is enough to infer that an organelle is a nucleus (the nucleus is the only organelle with a nucleolus as part). A primitive class has only necessary conditions (e.g. nucleus subClassOf part_of some cell): that is, all nuclei are part of a cell, but other organelles are also part of a cell, so if we find an entity that is part of a cell we cannot infer that it is a nucleus.
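The asymmetry between defined and primitive classes can be checked with a toy classifier. This is not a real reasoner; conditions are encoded as simple relation lists, using the nucleus example above.

```python
# Toy classifier: sufficient conditions classify, necessary ones don't.

def classify(entity):
    """Infer class names entailed by an entity's asserted relations."""
    inferred = set()
    # Defined class: nucleus equivalentTo has_part some nucleolus.
    # The condition is *sufficient*, so it licenses an inference:
    if "nucleolus" in entity.get("has_part", []):
        inferred.add("nucleus")
    # Primitive class: nucleus subClassOf part_of some cell.
    # The condition is only *necessary*, so nothing is inferred from
    # merely being part of a cell.
    return inferred

assert classify({"has_part": ["nucleolus"]}) == {"nucleus"}
assert classify({"part_of": ["cell"]}) == set()
```

A real reasoner performs exactly this asymmetric use of equivalentTo versus subClassOf axioms, only over arbitrarily nested class expressions rather than flat relation lists.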

In order to use OWL’s capabilities, a normalised ontology should be divided into two parts: the primitive axis and the modules. The primitive axis is formed by primitive classes that are pairwise disjoint and have only one superclass each; it has several levels and contains the bulk of the classes. The modules are defined classes with no asserted superclasses (apart from owl:Thing, the root class) and are not disjoint from one another.

When reasoning is performed, the reasoner will infer that each module has several subclasses from the primitive axis, creating a polyhierarchy. The key to such inference is that each class in the primitive axis has several necessary conditions, and each of these conditions is also present in one of the modules. When adding a new class, the maintainer adds conditions to it that, when inference is performed, lead the reasoner to add the needed subsumptions, instead of the maintainer asserting those subsumptions manually.

Normalisation fits some ontologies better than others. For example, the Cell Type Ontology (CL) presents a polyhierarchy to which the Normalisation structure can be neatly applied, as the classification of cells according to different criteria (ploidy, function, development stage, lineage, nucleation, etc.) can be codified as modules: i.e., in a Normalised CL (a version of CL built using Normalisation) there would be a module Haploid Cell (equivalentTo has_ploidy some haploid) that would be inferred as a superclass of all the haploid cells (primitive classes with the condition subClassOf has_ploidy some haploid; e.g. ovum, spermatozoon, etc.).
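A minimal sketch of this inference over a made-up fragment of a Normalised CL (conditions encoded as simple pairs, not real OWL; the cell names and conditions are illustrative):

```python
# Primitive axis: each class with its asserted necessary conditions.
primitives = {
    "ovum":         {("has_ploidy", "haploid"), ("has_function", "reproduction")},
    "spermatozoon": {("has_ploidy", "haploid")},
    "hepatocyte":   {("has_ploidy", "diploid")},
}

# Modules: defined classes, each with one necessary-and-sufficient
# condition, e.g. haploid_cell equivalentTo has_ploidy some haploid.
modules = {
    "haploid_cell": ("has_ploidy", "haploid"),
}

def infer_subclasses(module):
    """Primitive classes whose conditions satisfy the module's definition."""
    condition = modules[module]
    return {cls for cls, conds in primitives.items() if condition in conds}

assert sorted(infer_subclasses("haploid_cell")) == ["ovum", "spermatozoon"]
```

Adding a new haploid cell type to the primitive axis with the `("has_ploidy", "haploid")` condition makes it fall under the Haploid Cell module automatically, with no subsumption asserted by hand, which is precisely the maintenance benefit the next section describes.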

Why use Normalisation?

The use of Normalisation has several advantages. The main one is the maintenance process: the reasoner infers all the entailed subsumptions, without missing any. That is especially important in big ontologies like the Gene Ontology, as demonstrated in the GONG project, and in ontologies with a high subsumption-per-class ratio.

In a Normalised ontology, there is a set of agreed object properties, and, when adding a new class, the ontologist need only explore those object properties and add the due restrictions to the new class. The process resembles describing an object by filling in a form. The modelling is therefore principled, as every developer "fills in the same form", which allows the work to be split between many developers. This modelling process also results in a modular ontology: to extend the ontology with a new module, it is only necessary to add a new defined class. Defined classes, or modules, can be regarded as different "views" on the same collection of objects (e.g. cell by function, cell by organism, cell by ploidy, cell by nuclear number, etc.).

To enable the inference of the polyhierarchy by the reasoner, many axioms need to be added. Such a rich axiomatisation is beneficial because it makes the modelling explicit: the reasoner and other users know why a class is a subclass of another class, as the relation results from the two classes sharing a common condition, rather than from a manual assertion of the subsumption. For example, if we simply assert that leptomeningeal cell is a secretory cell, other users and, most importantly, the reasoner do not know why it is a secretory cell (a biologist may deduce the reason from the term names, but term names are completely useless for reasoners). However, if we assert that leptomeningeal cell has_function some ECM_secretion, it is clear why it has been classified as a subclass of secretory cell (which is equivalentTo has_function some secretion, where ECM_secretion is a subClassOf secretion).

A richer axiomatisation also allows more complex queries to be executed against the ontology, and makes automatic debugging possible (e.g. by using explanations). Having explicit wrong axioms is preferable to having implicit wrong ideas, as the reasoner will suggest a possible path to a solution.

Conclusion

Reasoning can be used to maintain ontologies in different ways. One such way is Normalisation, an ontology building technique that enables the automatic maintenance of polyhierarchies. Normalisation requires the addition of precise axioms for the reasoner to infer the correct subsumptions. It could be argued that Normalisation requires as much work as manual maintenance, or even more. However, using Normalisation, the same amount of work yields several advantages (automatic maintenance and a rich axiomatisation), and, in the long term, manual maintenance requires more work, e.g. to fix missing subsumptions.

Related topics

Normalisation is an Ontology Design Pattern (ODP), and thus a best practice for building efficient and rich bio-ontologies. There are several repositories of ODPs: http://odps.sf.net/, http://ontologydesignpatterns.org.

Some OWL tutorials show how to use Normalisation in practical terms.
