Biological databases in the age of generative artificial intelligence.

Authors

Mihai Pop
Teresa K Attwood
Judith A. Blake, The Jackson Laboratory
Philip E Bourne
Ana Conesa
Terry Gaasterland
Lawrence Hunter
Carl Kingsford
Oliver Kohlbacher
Thomas Lengauer
Scott Markel
Yves Moreau
William S Noble
Christine Orengo
B F Francis Ouellette
Laxmi Parida
Natasa Przulj
Teresa M Przytycka
Shoba Ranganathan
Russell Schwartz
Alfonso Valencia
Tandy Warnow

Document Type

Article

Publication Date

1-1-2025

Original Citation

Pop M, Attwood T, Blake JA, Bourne P, Conesa A, Gaasterland T, Hunter L, Kingsford C, Kohlbacher O, Lengauer T, Markel S, Moreau Y, Noble W, Orengo C, Ouellette B, Parida L, Przulj N, Przytycka T, Ranganathan S, Schwartz R, Valencia A, Warnow T. Biological databases in the age of generative artificial intelligence. Bioinform Adv. 2025;5(1):vbaf044.

Keywords

JMG

JAX Source

Bioinform Adv. 2025;5(1):vbaf044.

ISSN

2635-0041

PMID

40177265

DOI

https://doi.org/10.1093/bioadv/vbaf044

Abstract

SUMMARY: Modern biological research critically depends on public databases. The introduction and propagation of errors within and across databases can lead to wasted resources as scientists are led astray by bad data or have to conduct expensive validation experiments. The emergence of generative artificial intelligence systems threatens to compound this problem owing to the ease with which massive volumes of synthetic data can be generated. We provide an overview of several key issues that occur within the biological data ecosystem and make several recommendations aimed at reducing data errors and their propagation. We specifically highlight the critical importance of improved educational programs aimed at biologists and life scientists that emphasize best practices in data engineering. We also argue for increased theoretical and empirical research on data provenance, error propagation, and on understanding the impact of errors on analytic pipelines. Furthermore, we recommend enhanced funding for the stewardship and maintenance of public biological databases.

AVAILABILITY AND IMPLEMENTATION: Not applicable.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Faculty Research 2025