JAX Source

EBioMedicine. 2023;94:104726.







J.H.C. acknowledges support from NCI grants R01CA230031 and P30CA034196. A.F. acknowledges support from a JAX Scholar award, Farmington, CT, USA.


BACKGROUND: Colorectal cancer is the fourth most commonly diagnosed cancer and the second leading cause of cancer deaths. Many clinical variables, pathological features, and genomic signatures are associated with patient risk, but reliable patient stratification in the clinic remains challenging. Here we assess how image, clinical, and genomic features can be combined to predict risk.

METHODS: We developed and evaluated integrative deep learning models combining formalin-fixed, paraffin-embedded (FFPE) whole slide images (WSIs), clinical variables, and mutation signatures to stratify colon adenocarcinoma (COAD) patients based on their risk of mortality. Our models were trained on a dataset of 108 patients from The Cancer Genome Atlas (TCGA) and were externally validated on a newly generated dataset of 123 COAD patients from Wayne State University (WSU) and on rectal adenocarcinoma (READ) patients in TCGA (N = 52).

FINDINGS: We first observe that deep learning models trained on FFPE WSIs of TCGA-COAD separate high-risk (overall survival [OS] < 3 years, N = 38) and low-risk (OS > 5 years, N = 25) patients (AUC = 0.81 ± 0.08, 5-year survival p < 0.0001, 5-year relative risk = 1.83 ± 0.04), though such models are less effective at predicting OS for moderate-risk (3 years < OS < 5 years, N = 45) patients (5-year survival p = 0.5, 5-year relative risk = 1.05 ± 0.09). We find that our integrative models combining WSIs, clinical variables, and mutation signatures can improve patient stratification for moderate-risk patients (5-year survival p < 0.0001, 5-year relative risk = 1.87 ± 0.07). Our integrative model combining image and clinical variables is also effective on an independent pathology dataset generated by our team (WSU-COAD, N = 123; 5-year survival p < 0.0001, 5-year relative risk = 1.52 ± 0.08) and on the TCGA-READ data (5-year survival p < 0.0001, 5-year relative risk = 1.18 ± 0.17). Our multicenter integrative image and clinical model, trained on combined TCGA-COAD and WSU-COAD data, is effective in predicting risk on TCGA-READ (5-year survival p < 0.0001, 5-year relative risk = 1.82 ± 0.13). Pathologist review of image-based heatmaps suggests that nuclear size pleomorphism, intense cellularity, and abnormal structures are associated with high risk, while low-risk regions have smaller and more regular cells. Quantitative analysis shows that high cellularity, high ratios of tumor cells, large tumor nuclei, and low immune infiltration are indicators of high-risk tiles.
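The risk groups and the relative-risk figures reported in the findings can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `risk_group` and `relative_risk` are hypothetical, the relative risk uses the textbook definition (ratio of event proportions between the predicted high-risk and predicted low-risk groups), and censored patients are ignored, which the study itself may handle differently.

```python
from typing import Optional


def risk_group(os_years: float) -> Optional[str]:
    """Label a patient using the overall-survival thresholds stated in the abstract."""
    if os_years < 3:
        return "high"
    if os_years > 5:
        return "low"
    if 3 < os_years < 5:
        return "moderate"
    # Exactly 3 or 5 years is not covered by the stated thresholds.
    return None


def relative_risk(deaths_pred_high: int, n_pred_high: int,
                  deaths_pred_low: int, n_pred_low: int) -> float:
    """Textbook relative risk: death proportion among patients predicted
    high-risk divided by death proportion among patients predicted low-risk.
    Assumption: no censoring adjustment is applied."""
    if n_pred_high <= 0 or n_pred_low <= 0:
        raise ValueError("group sizes must be positive")
    risk_high = deaths_pred_high / n_pred_high
    risk_low = deaths_pred_low / n_pred_low
    if risk_low == 0:
        raise ValueError("relative risk is undefined when the low-risk group has no deaths")
    return risk_high / risk_low


# Made-up counts for illustration only, not data from the study.
example_rr = relative_risk(20, 40, 10, 40)  # 0.5 / 0.25 = 2.0
```

Under this definition, a relative risk near 1 (as reported for the image-only model on moderate-risk patients) means the predicted groups have nearly the same 5-year death proportion, while values well above 1 indicate useful separation.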

INTERPRETATION: The improved stratification of colorectal cancer patients from our computational methods can be beneficial for treatment plans and enrollment of patients in clinical trials.

FUNDING: This study was supported by the National Cancer Institute (Grant Nos. R01CA230031 and P30CA034196). The funders had no role in study design, data collection and analysis, or preparation of the manuscript.


This is an open access article under the CC BY-NC-ND license (