Beyond the data deluge: data integration and bio-ontologies.

Biology, Computational-Biology, Data-Interpretation-Statistical, Databases-Factual, Genome, Genomics, Humans, Medical-Informatics, Mice, Research-Support-N.I.H.-Extramural, Research-Support-Non-U.S.-Gov't, Terminology

JAX Source

J Biomed Inform 2006 Jun; 39(3):314-20.


Biomedical research is increasingly a data-driven science. New technologies support the generation of genome-scale data sets of sequences, sequence variants, transcripts, and proteins: the genetic elements that underpin our understanding of biomedicine and disease. Information systems designed to manage these data, and the functional insights (biological knowledge) that come from analyzing them, are critical to mining large, heterogeneous data sets for new biologically relevant patterns, to generating hypotheses for experimental validation, and, ultimately, to building models of how biological systems work. Bio-ontologies have an essential role in supporting two key approaches to the effective interpretation of genome-scale data sets: data integration and comparative genomics. To date, bio-ontologies such as the Gene Ontology (GO) have been used primarily in community genome databases as structured controlled terminologies and as data aggregators. In this paper we use GO and the Mouse Genome Informatics (MGI) database as use cases to illustrate the impact of bio-ontologies on data integration and comparative genomics. Despite the profound impact ontologies are having on the digital categorization of biological knowledge, new biomedical research and the expanding and changing nature of biological information have limited the development of bio-ontologies that support dynamic reasoning for knowledge discovery.