Faculty Research 2023

Predictive models of long COVID.

Blessy Antony
Hannah Blau, The Jackson Laboratory
Elena Casiraghi
Johanna J Loomba
Tiffany J Callahan
Bryan J Laraway
Kenneth J Wilkins
Corneliu C Antonescu
Giorgio Valentini
Andrew E Williams
Peter N Robinson, The Jackson Laboratory
Justin T Reese
T M Murali
on behalf of the N3C consortium.

Document Type

Article

Publication Date

9-4-2023

Original Citation

Antony B, Blau H, Casiraghi E, Loomba J, Callahan T, Laraway B, Wilkins K, Antonescu C, Valentini G, Williams A, Robinson P, Reese J, Murali T. Predictive models of long COVID. EBioMedicine. 2023;96:104777

Keywords

JGM

JAX Source

EBioMedicine. 2023;96:104777

ISSN

2352-3964

PMID

37672869

DOI

https://doi.org/10.1016/j.ebiom.2023.104777

Grant

NCATS U24 TR002306, NCATS UL1 TR003015, Axle Informatics Subcontract: NCATS-P00438-B, NIH/NIDDK/OD, PSR2015-1720GVALE_01, G43C22001320007, and Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231.

Abstract

BACKGROUND: The cause and symptoms of long COVID are poorly understood. It is challenging to predict whether a given COVID-19 patient will develop long COVID in the future.

METHODS: We used electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C) to predict the incidence of long COVID. We trained two machine learning (ML) models: logistic regression (LR) and random forest (RF). Features used to train the predictors included symptoms and drugs ordered during acute infection, measures of COVID-19 treatment, pre-COVID comorbidities, and demographic information. We assigned the 'long COVID' label to patients diagnosed with the U09.9 ICD10-CM code. The cohorts included patients with (a) EHRs reported from data partners using the U09.9 ICD10-CM code and (b) at least one EHR in each feature category. We analysed three cohorts: all patients (n = 2,190,579; diagnosed with long COVID = 17,036), inpatients (n = 149,319; diagnosed with long COVID = 3,295), and outpatients (n = 2,041,260; diagnosed with long COVID = 13,741).
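The cohort counts above can be checked with a few lines of arithmetic. The sketch below is illustrative and not part of the study's code: it confirms that the inpatient and outpatient cohorts add up to the full cohort and computes the share of patients carrying the long COVID label in each. The names `cohorts` and `prevalence` are hypothetical; the numbers are those given in the abstract.

```python
# Cohort sizes and long COVID counts as reported in the abstract.
# Each entry is (number of patients, number diagnosed with long COVID).
cohorts = {
    "all": (2_190_579, 17_036),
    "inpatients": (149_319, 3_295),
    "outpatients": (2_041_260, 13_741),
}


def prevalence(n_patients, n_long_covid):
    """Fraction of a cohort that carries the long COVID (U09.9) label."""
    return n_long_covid / n_patients


# Inpatients and outpatients should partition the full cohort.
total_patients = cohorts["inpatients"][0] + cohorts["outpatients"][0]
total_cases = cohorts["inpatients"][1] + cohorts["outpatients"][1]
counts_consistent = (total_patients, total_cases) == cohorts["all"]

# Percentage of labelled patients per cohort, rounded to two decimals.
prevalence_pct = {
    name: round(100 * prevalence(n, cases), 2)
    for name, (n, cases) in cohorts.items()
}

if __name__ == "__main__":
    print("counts consistent:", counts_consistent)
    for name, pct in prevalence_pct.items():
        print(f"{name}: {pct}% labelled with long COVID")
```

The label is rare in every cohort (under 3% of patients). With positive labels this rare, raw accuracy would say little about a classifier, so a ranking metric such as AUROC is a natural choice.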

FINDINGS: LR and RF models yielded a median AUROC of 0.76 and 0.75, respectively. An ablation study revealed that drugs had the highest influence on the prediction task. The SHAP method identified age, gender, cough, fatigue, albuterol, obesity, diabetes, and chronic lung disease as explanatory features. Models trained on data from one N3C partner and tested on data from the other partners had an average AUROC of 0.75.
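For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The following is a minimal rank-based sketch in plain Python, not the study's evaluation code; the function name `auroc` and the toy labels and scores are assumptions made for illustration.

```python
def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney U statistic divided by n_pos * n_neg).

    labels: iterable of 0/1 values (1 = positive, e.g. long COVID diagnosis)
    scores: iterable of model scores, higher meaning "more likely positive"
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    # Sum the ranks of the positives, giving tied scores their average rank.
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        positives_in_group = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_group
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data: three of the four positive/negative pairs are ordered correctly.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(auroc(toy_labels, toy_scores))
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect separation, which places the reported values of 0.75 to 0.76 between the two.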

INTERPRETATION: ML-based classification using EHR information from the acute infection period is effective in predicting long COVID. SHAP methods identified important features for prediction. Cross-site analysis demonstrated the generalizability of the proposed methodology.

FUNDING: NCATS U24 TR002306, NCATS UL1 TR003015, Axle Informatics Subcontract: NCATS-P00438-B, NIH/NIDDK/OD, PSR2015-1720GVALE_01, G43C22001320007, and Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231.
