on January 22, 2010 by dosumis in Under Review

OBO Format

David Osumi-Sutherland
FlyBase, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK

Abstract

OBO format is a popular ontology file format. It can express a subset of the description logic language OWL-DL 2.0 and, in addition, has standard syntax for representing important classes of meta-data, including synonyms, references to publications and deprecated IDs. It is designed to be human readable and editable, easy to parse, easy to extend and to have minimal redundancy. Here, I provide an overview of OBO 1.2 syntax and semantics, focusing on its relation to OWL and on aspects of its design relevant to how it is commonly used.

Disclaimer: This article is meant as an informal guide to OBO format 1.2. It is not meant as a definitive guide to OBO syntax or semantics. For this, please see the official OBO format specification and Golbreich and Horrocks’ BNF specification of OBO 1.2 syntax. For obvious reasons, the guide draws heavily on these sources.

Syntax – Anatomy of an OBO file.

Each OBO file consists of a header, followed by a series of stanzas. Stanzas come in three flavors – Term, Typedef or Instance – indicated in square brackets in the first line of the stanza (see example below). Header and stanzas are populated by fields, one per line (although multi-line values are possible with the use of escape characters), that take the form <tag>: <value>

e.g.-
    [Term]
    id: blah:1234567
    name: skull
    def: "The part of the skeleton containing the brain."
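The header-plus-stanzas layout described above can be read with very little code. The following is a minimal sketch in Python, not a conforming parser: it assumes one tag-value pair per line and ignores escape characters, multi-line values and trailing `!` comments. The function name `parse_obo` and the sample text are illustrative only.

```python
def parse_obo(text):
    """Split OBO text into a header dict and a list of (stanza_type, tags) pairs.

    Each tags dict maps a tag name to the list of values seen for it,
    because a tag may occur more than once in a stanza.
    """
    header = {}
    stanzas = []
    current = header  # lines before the first stanza belong to the header
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and whole-line comments
        if line.startswith("[") and line.endswith("]"):
            # A stanza opener such as [Term], [Typedef] or [Instance]
            current = {}
            stanzas.append((line[1:-1], current))
            continue
        tag, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue  # not a tag-value line; a real parser would report this
        current.setdefault(tag.strip(), []).append(value.strip())
    return header, stanzas


EXAMPLE = """\
format-version: 1.2

[Term]
id: blah:1234567
name: skull
"""

header, stanzas = parse_obo(EXAMPLE)
```

Splitting at the first colon only matters here, because IDs such as blah:1234567 contain a colon themselves.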

Values themselves may contain multiple subvalues. For example, the values of the synonym and definition fields typically end with a database cross-reference (dbxref) subvalue: a comma-delimited list, enclosed in square brackets, of zero or more database cross references (typically identifiers for literature sources for the definition or synonym).

e.g. -
    name: antennal basiconic sensillum TB
    def: "Olfactory basiconic sensillum of antennal segment 3 with longitudinal rows of pores." [FlyBase:FBrf0128642, PMID:18535862]
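As an illustration of how such a subvalue might be pulled apart, here is a small Python sketch that separates the quoted text of a def value from its trailing dbxref list. It assumes the simple form shown in the example and does not handle escaped commas or descriptions attached to individual cross references. The name `split_def` is hypothetical.

```python
import re

# Quoted text (allowing backslash escapes), then a bracketed list at the end.
DEF_RE = re.compile(r'^"((?:[^"\\]|\\.)*)"\s*\[(.*)\]\s*$')


def split_def(value):
    """Return (definition_text, list_of_dbxrefs) for a def tag value."""
    match = DEF_RE.match(value)
    if match is None:
        raise ValueError("not a quoted string followed by a dbxref list")
    text, xref_part = match.groups()
    # An empty list "[]" yields no cross references.
    xrefs = [x.strip() for x in xref_part.split(",") if x.strip()]
    return text, xrefs


text, xrefs = split_def(
    '"Olfactory basiconic sensillum of antennal segment 3 with '
    'longitudinal rows of pores." [FlyBase:FBrf0128642, PMID:18535862]'
)
```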

The header must contain a format-version tag. All other tags are optional.

All stanzas must contain an id tag, and either a name tag or the tag ‘is_anonymous’. All other tag-value pairs are optional.

The value of the id tag declares the object (typedef, term, instance) to which the rest of the tags in the stanza refer. A file, or a collection of files intended to be loaded together, may contain multiple stanzas that describe different aspects of a single object. A required tag must be specified at least once for each object in a given set of files. This makes it possible for optional information to be stored in a separate file and only loaded when necessary.
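A loader that follows this rule might combine stanzas by id before checking for required tags. The sketch below assumes each stanza has already been parsed into a dict mapping tag names to lists of values; the function names and the comment text are illustrative, not part of any OBO tool.

```python
from collections import defaultdict


def merge_stanzas(stanzas):
    """Combine stanzas that share an id into one tag -> values mapping per object."""
    objects = defaultdict(lambda: defaultdict(list))
    for tags in stanzas:
        obj = objects[tags["id"][0]]
        for tag, values in tags.items():
            if tag == "id":
                continue
            for value in values:
                if value not in obj[tag]:  # do not duplicate repeated statements
                    obj[tag].append(value)
    return {obj_id: dict(tags) for obj_id, tags in objects.items()}


def lacks_required_tag(tags):
    """True if the merged object has neither a name nor an is_anonymous tag."""
    return "name" not in tags and "is_anonymous" not in tags


# Two stanzas for the same object, as if loaded from two separate files.
core = {"id": ["blah:1234567"], "name": ["skull"]}
extra = {"id": ["blah:1234567"], "comment": ["stored in a separate file"]}
merged = merge_stanzas([core, extra])
```

Checking required tags only after merging is what allows the second stanza to omit the name.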

IDs and their scope

ID syntax

The OBO syntax specification does not specify an ID format, although by tradition, the following ID syntax is used: <ID-Space>:<Local-ID>

Where <ID-Space> is a letter string and <Local-ID> is a number string of fixed length – low numbers are preceded by zeros to achieve the appropriate length.
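The traditional pattern is easy to generate and check. The Python sketch below assumes a Local-ID width of 7 digits, which is only a default chosen for illustration; the width and the helper names are not taken from any specification.

```python
import re

# Letters, a colon, then digits: the traditional <ID-Space>:<Local-ID> shape.
TRADITIONAL_ID = re.compile(r"^([A-Za-z]+):(\d+)$")


def make_id(id_space, number, width=7):
    """Build an ID, padding the number with leading zeros to a fixed width."""
    return f"{id_space}:{number:0{width}d}"


def split_id(identifier):
    """Return (id_space, local_id), or None if the ID is not in the traditional form."""
    match = TRADITIONAL_ID.match(identifier)
    return (match.group(1), match.group(2)) if match else None
```

Because the format itself does not impose this pattern, a checker like `split_id` is best treated as a style check rather than a validity check.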

In the context of the OBO Foundry, the combination of namespace, ID-Space and Local-ID should give a unique identifier within the Foundry. Namespace is specified using the ‘namespace’ tag in a stanza or, if that is absent, by a ‘default-namespace’ tag in the header. A standard for mapping of these IDs to URIs can be found here: [waiting for link]

Scope

Non-anonymous ids have global scope. An object has the same id in every file, and in every namespace. However, the id of an anonymous object is not fixed; if the ontology is parsed and then reserialized, the id may change. Anonymous ids have local scope; they are only valid in the file from which they were loaded. The same anonymous id in two different files refers to a different object in each file.

IDs and annotation

OBO terms are extensively used for annotation of data-types including wild-type gene function (Gene Ontology), gene expression and phenotype (various anatomy ontologies). Databases storing these annotations can contain hundreds of thousands of individual annotations. In order that these databases be able to update their ontology versions without losing track of existing annotations, term IDs need to be tracked reliably.

The ID tracking capabilities of OBO provide an effective system for this. In a well maintained OBO ontology, IDs must never be lost. When two terms are merged, OBO format allows the id of one to be retained using an alt_id tag. Terms may be obsoleted rather than destroyed, allowing ID management software to avoid re-using IDs. When terms are obsoleted, replacements can be suggested using the tags ‘consider’ and ‘replaced_by’, with additional guidance added using the ‘comment’ tag.
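A database updating to a new ontology version could use these tags to map stored IDs onto current terms. The following Python sketch is one possible approach under stated assumptions: terms are held as a dict of id to tag lists, obsolete terms carry the OBO tag is_obsolete: true, and only a single replaced_by value is followed automatically (consider suggestions are left for a curator). The IDs are invented for the example.

```python
def build_resolver(terms):
    """Return a function mapping a stored ID to a current term ID, or None."""
    # Index secondary IDs retained after merges.
    primary_for = {}
    for term_id, tags in terms.items():
        for alt in tags.get("alt_id", []):
            primary_for[alt] = term_id

    def resolve(identifier):
        seen = set()
        current = primary_for.get(identifier, identifier)
        while current not in seen:
            seen.add(current)
            tags = terms.get(current)
            if tags is None:
                return None  # unknown ID
            if tags.get("is_obsolete") != ["true"]:
                return current  # a live term
            replacements = tags.get("replaced_by", [])
            if len(replacements) != 1:
                return None  # no unambiguous replacement; needs a curator
            current = primary_for.get(replacements[0], replacements[0])
        return None  # replaced_by chain loops back on itself

    return resolve


# Hypothetical terms for illustration.
terms = {
    "X:0000001": {"name": ["kept term"], "alt_id": ["X:0000002"]},
    "X:0000003": {"name": ["old term"], "is_obsolete": ["true"],
                  "replaced_by": ["X:0000001"]},
    "X:0000004": {"name": ["older term"], "is_obsolete": ["true"],
                  "consider": ["X:0000001"]},
}
resolve = build_resolver(terms)
```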

Semantics – the meaning of an OBO file

Following the terminology of Smith et al., 2005, Term stanzas refer to types (classes in OWL) and Typedef stanzas refer to type-level relations. Type-level relations are defined using instance-level relations (OWL properties). These definitions are generally stored in a separate file (e.g.- ro.obo). Instance stanzas refer to instances (individuals in OWL terminology). Instance stanzas are rarely used in OBO ontologies and support for them (through OBO-Edit) is limited.

Type level relations

Unlike OWL DL, where instance level relations (properties) are used directly to relate classes (types) using explicitly stated quantification patterns, OBO uses pre-defined type-level relations. The definition of these type-level relations typically follows the pattern:

C part_of C1 = [definition] for all c, t, if Cct then there is some c1 such that C1c1t and c part_of c1 at t. (Smith et al., 2005; instance level relations are in bold, type level in italic. Instances are referred to by lower case letters, types by UPPER CASE. t=time).

This specifies that in order for the type level relation to apply, all instances of C must be part_of some instance of C1 at all times. As you can see from this example, it is common to use punning to relate instance and type level relations.

Ignoring the specification of time, this is equivalent to the OWL-DL statement: C subclass_of part_of some C1

Type level relations can be declared to be transitive, symmetric, asymmetric, reflexive or irreflexive using tags named for these properties (for example, is_transitive), whose value must be either ‘true’ or ‘false’. It should be noted that these apply to the type level relation. If a symmetric instance level relation is used to define a type level relation using the pattern shown above, the resulting type level relation is NOT symmetric.
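The last point can be checked on a tiny example. The Python sketch below ignores time and evaluates the all-some pattern over a hand-made set of instances; the relation name adjacent_to and the instance names are hypothetical.

```python
def all_some(sources, targets, relation):
    """True if every source instance is related to at least one target instance."""
    return all(any((s, t) in relation for t in targets) for s in sources)


# Hypothetical instances: one instance of C, two instances of C1.
C = {"c"}
C1 = {"d1", "d2"}

# A symmetric instance-level relation: each pair is stored in both directions.
adjacent_to = {("c", "d1"), ("d1", "c")}

forward = all_some(C, C1, adjacent_to)   # C adjacent_to C1 at the type level
backward = all_some(C1, C, adjacent_to)  # C1 adjacent_to C at the type level
print(forward, backward)  # prints: True False
```

The type-level relation holds from C to C1 but not in reverse, because the instance d2 is not adjacent to any instance of C, even though the instance-level relation is symmetric.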

As in OWL, relations may have sub-relations and domain and range constraints, recorded using the is_a, domain and range tags respectively.

Examples:
    name: releases_neurotransmitter
    domain: FBbt_root:00000000 ! anatomical entity
    range: CHEBI:24431 ! molecular structure

    name: develops_directly_from
    is_a: develops_from ! develops_from

    name: develops_from
    is_transitive: true
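To show what these declarations allow a reasoner to conclude, here is a deliberately naive Python sketch that applies two rules until nothing new appears: a statement made with a sub-relation also holds for its parent relation, and statements made with a transitive relation can be chained. The term names A, B and C are placeholders, and the function is an illustration rather than a description of how any existing OBO reasoner works.

```python
def infer(assertions, parent_relation, transitive):
    """Close a set of (subject, relation, object) triples under two rules.

    parent_relation maps a relation to the relation it is_a;
    transitive is the set of relations declared is_transitive: true.
    """
    facts = set(assertions)
    while True:
        new = set()
        for s, r, o in facts:
            # Rule 1: propagate to the parent relation, if there is one.
            if r in parent_relation:
                new.add((s, parent_relation[r], o))
            # Rule 2: chain statements that use the same transitive relation.
            if r in transitive:
                for s2, r2, o2 in facts:
                    if r2 == r and s2 == o:
                        new.add((s, r, o2))
        if new <= facts:
            return facts  # fixed point reached
        facts |= new


facts = infer(
    {("A", "develops_directly_from", "B"), ("B", "develops_directly_from", "C")},
    parent_relation={"develops_directly_from": "develops_from"},
    transitive={"develops_from"},
)
```

Here A develops_from C is inferred, while A develops_directly_from C is not, since only develops_from is declared transitive.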

Logically defining classes

Subclasses are expressed with the is_a relation:

    name: thoracic segment
    is_a: ID:1234567 ! segment

In OWL: 'thoracic segment' SubClassOf 'segment'*

*OWL examples are expressed in Manchester syntax [link] and by necessity omit the time component of the OBO definition.

Necessary conditions for class membership are specified using the tag ‘relationship’, whose value is a relation followed by the related term:

e.g.-

    name: prothoracic segment
    relationship: part_of thorax

In OWL: 'prothoracic segment' SubClassOf (part_of some 'thorax').

Necessary and sufficient conditions for class membership are specified using the intersection_of tag:

    name: prothoracic leg
    intersection_of: leg
    intersection_of: part_of prothoracic segment

In OWL: 'prothoracic leg' EquivalentTo (leg that part_of some 'prothoracic segment')

Classes can be declared to be disjoint using the tag disjoint_from:

    name: continuant
    disjoint_from: occurrent

In OWL: 'continuant' DisjointWith 'occurrent'
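The four correspondences in this section can be summarised as a small translation routine. The Python sketch below is illustrative only: it takes already-parsed values keyed by term name (real OBO files refer to terms by ID), treats relationship values as (relation, target) pairs, writes intersections with the Manchester syntax keyword `and`, and omits time as the OWL examples here do.

```python
def quote(name):
    """Wrap a class name in single quotes, as Manchester syntax does for labels."""
    return f"'{name}'"


def restriction(relation, target):
    """Render an existential restriction such as (part_of some 'thorax')."""
    return f"({relation} some {quote(target)})"


def to_manchester(name, tags):
    """Translate a term's logical tags into Manchester-style axiom strings."""
    subject = quote(name)
    axioms = [f"{subject} SubClassOf {quote(p)}" for p in tags.get("is_a", [])]
    for relation, target in tags.get("relationship", []):
        axioms.append(f"{subject} SubClassOf {restriction(relation, target)}")
    parts = [
        quote(part) if isinstance(part, str) else restriction(*part)
        for part in tags.get("intersection_of", [])
    ]
    if parts:
        # All intersection_of lines together form one equivalence axiom.
        axioms.append(f"{subject} EquivalentTo ({' and '.join(parts)})")
    for other in tags.get("disjoint_from", []):
        axioms.append(f"{subject} DisjointWith {quote(other)}")
    return axioms


leg_axioms = to_manchester(
    "prothoracic leg",
    {"intersection_of": ["leg", ("part_of", "prothoracic segment")]},
)
```

Note that the intersection_of lines are combined into a single axiom, whereas each is_a, relationship or disjoint_from line yields an axiom of its own.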

Meta-data

One of the advantages of OBO format over OWL is its standard support for recording particular types of meta-data. In particular, it provides a standard syntax for associating references with free text definitions and synonyms, and for recording synonyms with a variety of types (e.g.- plural) and scopes (e.g.- broad, narrow).

e.g.-
    name: abdominal lateral bipolar neuron lbd
    namespace: fly_anatomy.ontology
    def: "A bipolar sensory neuron of the lateral complex of larval abdominal segments that emits two long dendritic branches along muscle 3 (Williams and Shepherd, 1999)." [FlyBase:FBrf0089570, FlyBase:FBrf0108300, http://www.normalesup.org/~vorgogoz/FlyPNS/PNSdescription.html#lbd]
    synonym: "intersegmental bipolar neuron" EXACT []
    synonym: "isbp" EXACT [FlyBase:FBrf008957
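A synonym value carries more structure than a definition, since a scope and a synonym type may sit between the quoted text and the reference list. The Python sketch below is a simplified reading of that layout: it recognises the scope keywords EXACT, BROAD, NARROW and RELATED, treats any other single token before the brackets as a type name, and requires a complete bracketed list. The function name and the cross reference DB:ref1 are hypothetical.

```python
import re

SYNONYM_RE = re.compile(
    r'^"(?P<text>(?:[^"\\]|\\.)*)"'                  # quoted synonym text
    r"(?:\s+(?P<scope>EXACT|BROAD|NARROW|RELATED))?"  # optional scope
    r"(?:\s+(?P<type>[^\s\[]+))?"                     # optional synonym type
    r"\s*\[(?P<xrefs>.*)\]\s*$"                       # bracketed dbxref list
)


def parse_synonym(value):
    """Return a dict with the text, scope, type and dbxrefs of a synonym value."""
    match = SYNONYM_RE.match(value)
    if match is None:
        raise ValueError("unrecognised synonym value")
    xrefs = [x.strip() for x in match.group("xrefs").split(",") if x.strip()]
    return {
        "text": match.group("text"),
        "scope": match.group("scope"),  # None when no scope is given
        "type": match.group("type"),    # None when no type is given
        "xrefs": xrefs,
    }


parsed = parse_synonym('"intersegmental bipolar neuron" EXACT []')
```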

Acknowledgements

This paper is an open access work distributed under the terms of the Creative Commons Attribution License 2.5 (http://creativecommons.org/licenses/by/2.5/), which permits unrestricted use, distribution, and reproduction in any medium, provided that the original author and source are attributed.

The paper and its publication environment form part of the work of the Ontogenesis Network, supported by EPSRC grant EP/E021352/1.
