PLoS One 2018 Oct 3; 13(10):e0204912
The Cancer Genome Atlas (TCGA) provides a genetic characterization of more than ten thousand tumors, enabling the discovery of novel driver mutations, molecular subtypes, and enticing drug targets across many histologies. Here we investigated why some mutations are common in particular cancer types but absent in others. As an example, we observed that the gene CCDC168 has no mutations in the stomach adenocarcinoma (STAD) cohort despite its common presence in other tumor types. Surprisingly, we found that the lack of called mutations was due to a systematic insufficiency in the number of sequencing reads in the STAD and other cohorts, as opposed to differential driver biology. Using strict filtering criteria, we found similar behavior in four other genes across TCGA cohorts, with each gene exhibiting systematic sequencing depth issues affecting the ability to call mutations. We identified the culprit as the choice of exome capture kit, as kit choice was highly associated with the set of genes that have insufficient reads to call a mutation. Overall, we found that thousands of samples across all cohorts are subject to some capture kit problems. For example, for the 6353 samples using the Broad Institute's Custom capture kit there are undercalling biases for at least 4833 genes. False negative mutation calls at these genes may obscure biological similarities between tumor types and other important cancer driver effects in TCGA datasets.
Wang, Victor G; Kim, Hyunsoo; and Chuang, Jeffrey, "Whole-exome sequencing capture kit biases yield false negative mutation calls in TCGA cohorts." (2018). Faculty Research 2018. 196.