Document Type

Article

Publication Date

10-2-2025

Keywords

JGM, Humans, Software, Neoplasms, Mutation, Computational Biology, Isocitrate Dehydrogenase, Databases, Genetic

ISSN

1367-4811

PMID

41017641

DOI

https://doi.org/10.1093/bioinformatics/btaf546

Grant

This work was supported by the Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231 and National Institutes of Health awards 5R24OD011883-12, 5RM1HG010860-03, 5R01HD103805-04, and 5U24HG011449-04. P.N.R. was supported by the Alexander von Humboldt Foundation.

Abstract

SUMMARY: Lack of data integration remains a significant impediment to cancer research, and many analyses still require customized software to transform and prepare cancer data. We describe a software package to harmonize genetic and clinical cancer data into the GA4GH Phenopacket schema, an ISO standard for representing clinical case data. We integrated demographic, mutation, morphology, diagnosis, intervention, and survival data using case data from the National Cancer Institute for 12 cancer types. The Phenopacket standard provides a foundation for downstream use, including sophisticated statistical and AI/ML analyses. We demonstrate fitness for purpose by using the integrated data to recapitulate a known association between mutations in the gene encoding isocitrate dehydrogenase 1 and survival time in brain cancer patients.

AVAILABILITY AND IMPLEMENTATION: Source code is freely available at: https://github.com/monarch-initiative/oncopacket (archived at 10.5281/zenodo.15353125).

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
