BiologicalNetworks


Database

Data Integration

  1. Source Definition.
  2. Schema Mapping.
  3. Sample Schema Mappings.

While the logical model of information in BiologicalNetworks is graph based, in reality a variety of information sources provide different components of the integrated graph. A source may contribute only node properties. For example, the Yeast GFP Fusion Localization Database (Huh et al. (2003), Ghaemmaghami et al. (2003)) provides, for each gene or ORF, a set of locations where the gene is expressed. The location is considered a multi-valued property of the gene. A second kind of source contributes graphs. Protein-protein interaction sources such as Gavin et al. (2002), for example, may contribute an interaction graph from co-immunoprecipitation experiments using a matrix model. In this case, the type of the edge is physical interaction and the label on the edge includes the type of experiment (co-IP). A third category of sources may explicitly provide a set of attributes qualifying the nature of the relationship. The protein-DNA interaction data provided by Lee et al. (2002) records a probability value for every interaction between a protein and a DNA region, both for genes and for intergenic regions. Note that this graph is directed while the protein-protein interaction graphs are not. Given this heterogeneity in the type of information that a source may provide, BiologicalNetworks uses a very generic internal model to accommodate different kinds of sources, such that a new source providing a new set of nodes, edges, or node/edge properties can be dynamically incorporated into an existing integrated database.
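The three kinds of contribution described above (node properties only, plain graphs, and edges with qualifying attributes) can be illustrated with a minimal Python sketch of a generic graph model. This is not the actual BiologicalNetworks implementation: the class names `Node`, `Edge`, and `IntegratedGraph`, the gene identifiers, and the probability value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    node_type: str
    # Each property key maps to a set of values, so multi-valued
    # properties (e.g. several localizations) need no special handling.
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    edge_type: str
    directed: bool = False
    attributes: dict = field(default_factory=dict)


class IntegratedGraph:
    """Hypothetical container for nodes and edges from several sources."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node_property(self, node_id, node_type, key, value):
        # Create the node on first sight, then add the value to the set.
        node = self.nodes.setdefault(node_id, Node(node_id, node_type))
        node.properties.setdefault(key, set()).add(value)
        return node

    def add_edge(self, source, target, edge_type, directed=False, **attributes):
        # Endpoints not seen before are registered with an unknown type.
        for node_id in (source, target):
            self.nodes.setdefault(node_id, Node(node_id, "unknown"))
        edge = Edge(source, target, edge_type, directed, attributes)
        self.edges.append(edge)
        return edge


graph = IntegratedGraph()

# Property-only source: localization as a multi-valued node property.
graph.add_node_property("GENE_A", "gene", "localization", "nucleus")
graph.add_node_property("GENE_A", "gene", "localization", "cytoplasm")

# Graph source: undirected physical interaction labelled with the experiment.
graph.add_edge("GENE_A", "GENE_B", "physical interaction", experiment="co-IP")

# Attributed source: directed protein-DNA edge with an illustrative probability.
graph.add_edge("GENE_C", "GENE_A", "protein-DNA interaction",
               directed=True, probability=0.001)
```

Keeping properties and edge attributes as open dictionaries is what lets a new source add node or edge properties without changing the model itself.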

Source Definition.

Currently, the external data source can be:
  • a relational database schema
  • a tree-structured XML document
  • an RDF-styled triplet that describes an edge set of a graph
  • a DAG-structured OWL (http://www.w3.org/TR/owl-features) document
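As an illustration of the third source kind, the following Python sketch shows one way a list of RDF-styled (subject, predicate, object) triples could be read as an edge set, grouped by predicate. The function name `triples_to_edge_set`, the predicate names, and the gene identifiers are hypothetical and not taken from BiologicalNetworks.

```python
from collections import defaultdict


def triples_to_edge_set(triples):
    """Group (subject, predicate, object) triples into edge sets per predicate."""
    edges = defaultdict(set)
    for subject, predicate, obj in triples:
        # The predicate acts as the edge type; a set removes duplicate edges.
        edges[predicate].add((subject, obj))
    return dict(edges)


# Illustrative triples only.
triples = [
    ("geneA", "interacts_with", "geneB"),
    ("geneB", "interacts_with", "geneC"),
    ("geneA", "regulates", "geneC"),
    ("geneA", "interacts_with", "geneB"),  # duplicate, collapsed by the set
]

edge_set = triples_to_edge_set(triples)
```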

Typically, a new ontology or a node/attribute type hierarchy, such as the phenotype classification tree from MIPS:

Schema Mapping.

Schema mapping specifies how an element of the imported source should be interpreted as an element of the internal schema of the database. We consider tree-structured and relational data sources. In the first case, we would like the OWL schema to populate the node type hierarchy. The mapping declarations are:

IMPORT NODE TYPE FROM yeast phenotype (
Class as name,
) GRAPH phenotype tree
IMPORT RELATIONSHIP FROM yeast phenotype (
subClassOf as child of
) GRAPH phenotype tree
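To make the effect of these two declarations concrete, here is a rough Python sketch of how class records could populate a node type hierarchy: each `Class` value becomes a node type name, and each `subClassOf` value becomes a child-of relationship. The record layout, the function names, and the class names are assumptions for illustration, not the actual import mechanism.

```python
def import_type_hierarchy(records, graph_name):
    """Build node types and child-of links from Class/subClassOf records."""
    node_types = set()
    child_of = {}  # child type -> set of parent types (a DAG allows several)
    for record in records:
        name = record["Class"]              # "Class as name"
        node_types.add(name)
        parent = record.get("subClassOf")   # "subClassOf as child of"
        if parent is not None:
            node_types.add(parent)
            child_of.setdefault(name, set()).add(parent)
    return {"graph": graph_name, "node_types": node_types, "child_of": child_of}


def ancestors(hierarchy, name):
    """Return every type reachable from `name` by following child-of links."""
    seen = set()
    stack = [name]
    while stack:
        for parent in hierarchy["child_of"].get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Illustrative records only; these are not real MIPS phenotype classes.
records = [
    {"Class": "root"},
    {"Class": "class_a", "subClassOf": "root"},
    {"Class": "class_a1", "subClassOf": "class_a"},
    {"Class": "class_b", "subClassOf": "root"},
]

hierarchy = import_type_hierarchy(records, "phenotype tree")
```

With this structure, `ancestors(hierarchy, "class_a1")` walks up the hierarchy, which is the kind of lookup a node type hierarchy is typically used for.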

The second example of source integration imports a relational schema (a fragment of the MIPS database) into the graph elements of the internal model. Figure 2 shows the relational schema. In the MIPS complex relation, the attribute complex references the cid of the MIPS complex category relation. Notice that gene name is explicitly mapped to the pre-existing attribute name. The expression (orf1, gene1) AS SOURCE REFERENCE states that, for the edge, the source node is identified by using the attribute pair {orf1, gene1} as a foreign key. Also notice that the protein complexes and their members are defined explicitly as hypernodes and hypernode members.
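The two ideas in this paragraph, a composite foreign key identifying the source node of an edge and complexes represented as hypernodes with members, can be sketched in Python as follows. Only `orf1`, `gene1`, `complex`, and `cid` come from the text; the columns `orf2`, `gene2`, `orf`, and `name`, the function names, and all row values are hypothetical.

```python
from collections import defaultdict


def build_edges(interaction_rows):
    """Turn relational rows into edges keyed by composite node references."""
    edges = []
    for row in interaction_rows:
        # (orf1, gene1) acts as the source reference; the target pair
        # (orf2, gene2) is an assumed, symmetric counterpart.
        source = (row["orf1"], row["gene1"])
        target = (row["orf2"], row["gene2"])
        edges.append((source, target))
    return edges


def build_hypernodes(category_rows, complex_rows):
    """Group complex members under the category whose cid they reference."""
    names = {row["cid"]: row["name"] for row in category_rows}
    members = defaultdict(set)
    for row in complex_rows:
        cid = row["complex"]  # foreign key into the category relation
        if cid not in names:
            raise KeyError(f"unknown complex category: {cid}")
        members[cid].add(row["orf"])
    return {cid: {"name": names[cid], "members": set(members[cid])}
            for cid in names}


# Illustrative rows only.
interaction_rows = [
    {"orf1": "ORF_A", "gene1": "GENE_A", "orf2": "ORF_B", "gene2": "GENE_B"},
]
category_rows = [{"cid": 1, "name": "complex_x"}]
complex_rows = [
    {"complex": 1, "orf": "ORF_A"},
    {"complex": 1, "orf": "ORF_B"},
]

edges = build_edges(interaction_rows)
hypernodes = build_hypernodes(category_rows, complex_rows)
```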

Sample Source Schema: MIPS

Mapping directives for MIPS:

Sample Source Schema: Gene Ontology

Mapping directives for GO: