Document Type
Article
Publication Date
10-2-2025
Original Citation
Sierk M, Danis D, Patil S, Kishor N, Mondal R, Jha A, Chen Q, Yan C, Munoz-Torres M, Meerzaman D, Robinson P, Reese J. Oncopacket: integration of cancer research data using GA4GH phenopackets. Bioinformatics (Oxford, England). 2025; 41(10):
Keywords
JGM, Humans, Software, Neoplasms, Mutation, Computational Biology, Isocitrate Dehydrogenase, Databases, Genetic
ISSN
1367-4811
PMID
41017641
DOI
https://doi.org/10.1093/bioinformatics/btaf546
Grant
This work was supported by the Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231 and National Institutes of Health awards 5R24OD011883-12, 5RM1HG010860-03, 5R01HD103805-04, and 5U24HG011449-04. P.N.R. was supported by the Alexander von Humboldt foundation.
Abstract
SUMMARY: Lack of data integration remains a significant impediment to cancer research, and many analyses still require customized software to transform and prepare cancer data. We describe a software package to harmonize genetic and clinical cancer data into the GA4GH Phenopacket schema, an ISO standard for representing clinical case data. We integrated demographic, mutation, morphology, diagnosis, intervention, and survival data using case data from the National Cancer Institute for 12 cancer types. The Phenopacket standard provides a foundation for downstream use, including sophisticated statistical and AI/ML analyses. We demonstrate fitness for purpose by using the integrated data to recapitulate a known association between mutations in the gene encoding isocitrate dehydrogenase 1 and survival time in brain cancer patients.
AVAILABILITY AND IMPLEMENTATION: Source code is freely available at: https://github.com/monarch-initiative/oncopacket (archived at 10.5281/zenodo.15353125).
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.