Machine Learning to Detect Cervical Spine Fractures Missed by Radiologists on CT: Analysis Using Seven Award-Winning Models from the RSNA 2022 Cervical Spine Fracture AI Challenge.
Document Type
Article
Publication Date
1-8-2025
Original Citation
Chen Y, Hu Z, Shek K, Wilson J, Alotaibi F, Witiw C, Lin H, Ball R, Patel M, Mathur S, Sejdić E, Colak E.
Machine Learning to Detect Cervical Spine Fractures Missed by Radiologists on CT: Analysis Using Seven Award-Winning Models from the RSNA 2022 Cervical Spine Fracture AI Challenge. AJR Am J Roentgenol. 2025;224(3):e2432076.
Keywords
JMG
JAX Source
AJR Am J Roentgenol. 2025;224(3):e2432076.
ISSN
1546-3141
PMID
39772578
DOI
https://doi.org/10.2214/ajr.24.32076
Abstract
BACKGROUND. Available data on radiologists’ missed cervical spine fractures are
based primarily on studies using human reviewers to identify errors on reevaluation;
such studies do not capture the full extent of missed fractures.
OBJECTIVE. The purpose of this study was to use machine learning (ML) models to
identify cervical spine fractures on CT missed by interpreting radiologists, characterize
the nature of these fractures, and assess their clinical significance.
METHODS. This retrospective study included all cervical spine CT examinations
performed in adult patients in the emergency department between January 1, 2018,
and December 31, 2022. Examinations reported as negative for cervical spine fracture
were processed by seven award-winning ML models from the 2022 Radiological Society
of North America Cervical Spine Fracture AI Challenge; examinations classified as
positive by at least four of the seven models were considered to have ML-detected fractures.
Two neuroradiologists independently reviewed examinations with ML-detected
fractures using ML-derived heat maps to identify those representing true missed fractures.
The neuroradiologists further assessed the fractures’ extent. Two spine surgeons
independently assessed whether missed fractures were clinically significant (i.e., warranting
at least one of surgical consultation, MRI, CTA, or collar immobilization).
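The four-of-seven rule described above can be illustrated with a minimal sketch. It assumes each model's output has already been reduced to a binary positive/negative call per examination; the abstract does not say how that was done, and the function names and examination IDs below are hypothetical.

```python
from typing import Dict, List, Sequence

N_MODELS = 7            # seven challenge models, per the abstract
MIN_POSITIVE_VOTES = 4  # "positive by at least four of the seven models"


def is_ml_detected(model_calls: Sequence[bool],
                   min_votes: int = MIN_POSITIVE_VOTES) -> bool:
    """Return True when enough models call the examination positive."""
    if len(model_calls) != N_MODELS:
        raise ValueError(f"expected {N_MODELS} model calls, got {len(model_calls)}")
    return sum(bool(call) for call in model_calls) >= min_votes


def flag_for_review(exams: Dict[str, Sequence[bool]]) -> List[str]:
    """List the IDs of radiologist-negative examinations to send for re-review."""
    return [exam_id for exam_id, calls in exams.items() if is_ml_detected(calls)]


if __name__ == "__main__":
    # Hypothetical per-model calls for three examinations (not study data).
    example = {
        "exam_a": [True, True, True, True, False, False, False],    # 4 of 7
        "exam_b": [True, True, True, False, False, False, False],   # 3 of 7
        "exam_c": [True] * 7,                                       # 7 of 7
    }
    print(flag_for_review(example))
```

In this sketch, flagged examinations are only candidates. In the study, two neuroradiologists then reviewed them to decide which represented true missed fractures.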
RESULTS. The study included 6671 patients (2414 women, 4257 men; mean age,
54.6 ± 22.1 [SD] years) who underwent a total of 6979 cervical spine CT examinations.
Interpreting radiologists reported 6378 examinations as negative for fracture.
Of these, 356 had ML-detected fractures (i.e., positive by at least four of seven models).
The neuroradiologists classified 40 of these examinations, in 39 unique patients,
as having true fractures. ML-detected missed true fractures involved 51 unique sites,
most commonly the C7 transverse process (n = 12), C5 spinous process (n = 12), and C6
spinous process (n = 8). The surgeons considered missed fractures clinically significant
in 15 of 40 examinations (MRI and collar immobilization [n = 7], MRI and surgical evaluation
[n = 1], CTA [n = 9]). Interobserver agreement, expressed as kappa, was 0.88 between
neuroradiologists for true fracture classification and 0.94 between surgeons for
clinical significance classification.
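For readers unfamiliar with the agreement statistic, the following is a minimal sketch of how a two-rater kappa can be computed. The abstract says only "kappa"; Cohen's kappa is assumed here because it is the usual choice for two raters assigning categorical labels. The ratings in the example are invented for illustration and are not the study data.

```python
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    a, b = list(rater_a), list(rater_b)
    if len(a) != len(b) or not a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(a)
    # Proportion of cases on which the two raters gave the same label.
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    labels = set(a) | set(b)
    p_expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label for every case
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Invented binary ratings (1 = true fracture, 0 = not) for eight cases.
    reader_1 = [1, 1, 1, 0, 0, 0, 0, 1]
    reader_2 = [1, 1, 0, 0, 0, 0, 0, 1]
    print(round(cohen_kappa(reader_1, reader_2), 3))
```

Kappa equals 1 for perfect agreement and 0 when agreement is no better than chance.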
CONCLUSION. ML models identified cervical spine fractures missed by radiologists.
These fractures were further characterized to systematically highlight radiologists’
common misses.
CLINICAL IMPACT. This ML-based framework can be applied in quality improvement
efforts, to help refine radiologists’ search patterns based on prone-to-miss findings.