Transfer learning of a multitask deep neural network to predict enhancer activity of regulatory variants in the HCT116 cell-line
In: Student Reports, Summer 2022, The Jackson Laboratory
Rodrigo Castro and Ryan Tewhey, Ph.D.
The massively parallel reporter assay (MPRA) is a high-throughput experimental system that provides functional expression data for cis-regulatory elements (CREs) and their variants. CREs are drivers of complex human traits and disease, and the ability to understand and predict their activity would bring patients closer to the goal of genomic medicine. In this work we expand upon MPRA-Cerberus, a sequence-to-activity multi-task deep convolutional neural network (CNN) that predicts the MPRA activity of CREs with high accuracy. We demonstrate that MPRA-Cerberus can accommodate a new prediction task: the MPRA activity, in the human colorectal cancer cell line HCT116, of expression quantitative trait loci (eQTLs) fine-mapped in the Genotype-Tissue Expression (GTEx) project. The extended model predicts this task accurately and captures transcription factor (TF) motif grammar. With a test-set prediction correlation of 0.86 in HCT116, we use the model to generate de novo cell-line-specific enhancers and to compute sequence contribution scores that reveal motif insights learned during training, including AP-1/ZEB1 interactions. We also evaluate the feasibility of future branch additions, presenting a comprehensive study of the use of MPRA-Cerberus in additional cell lines to investigate regulatory polymorphism.
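The idea of adding a new prediction task to a multi-task model can be illustrated with a minimal sketch. It is not the MPRA-Cerberus architecture or training procedure: the class `MultiTaskModel`, the helper `motif_filter`, the task name `existing_task`, the motif strings, and all sequences and target values below are hypothetical toy choices. The trunk here is a pair of fixed motif-scanning filters with max pooling, and each task is a linear head on the pooled features. The sketch assumes the shared trunk is frozen while the new head is fitted, which is one common transfer-learning setup and not necessarily the one used in the report.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 rows (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def motif_filter(motif):
    """Hypothetical helper: a filter whose response is the fraction of matching bases."""
    return [[v / len(motif) for v in row] for row in one_hot(motif)]

class MultiTaskModel:
    """Toy stand-in: a shared convolutional trunk plus one linear head per task."""

    def __init__(self, filters):
        self.filters = filters   # shared trunk: list of (width x 4) weight matrices
        self.heads = {}          # task name -> (weights, bias)

    def features(self, seq):
        """Scan each filter along the sequence and keep its maximum response."""
        x = one_hot(seq)
        feats = []
        for f in self.filters:
            width = len(f)
            scores = [
                sum(f[i][j] * x[pos + i][j] for i in range(width) for j in range(4))
                for pos in range(len(x) - width + 1)
            ]
            feats.append(max(scores))
        return feats

    def predict(self, seq, task):
        w, b = self.heads[task]
        return sum(wi * fi for wi, fi in zip(w, self.features(seq))) + b

    def mse(self, seqs, targets, task):
        errs = [(self.predict(s, task) - y) ** 2 for s, y in zip(seqs, targets)]
        return sum(errs) / len(errs)

    def add_head(self, task, seqs, targets, lr=0.1, epochs=500):
        """Fit a new head by gradient descent on squared error; the trunk is frozen."""
        feats = [self.features(s) for s in seqs]   # trunk outputs, computed once
        w, b = [0.0] * len(self.filters), 0.0
        n = len(seqs)
        for _ in range(epochs):
            grad_w, grad_b = [0.0] * len(w), 0.0
            for f, y in zip(feats, targets):
                err = sum(wi * fi for wi, fi in zip(w, f)) + b - y
                grad_b += err / n
                for k, fk in enumerate(f):
                    grad_w[k] += err * fk / n
            w = [wi - lr * g for wi, g in zip(w, grad_w)]
            b -= lr * grad_b
        self.heads[task] = (w, b)

# Toy trunk and one pre-existing head (all values are made up for illustration).
model = MultiTaskModel([motif_filter("TGAGTCA"), motif_filter("CAGGTG")])
model.heads["existing_task"] = ([1.5, -0.5], 0.1)

# Toy training data for the new task (not real MPRA measurements).
seqs = ["ACGTTGAGTCAACGT", "ACGTACGTACGTACG", "CAGGTGACGTACGTA", "TGAGTCACAGGTGAC"]
targets = [1.0, 0.0, -0.5, 0.4]

before = model.predict(seqs[0], "existing_task")
model.heads["HCT116"] = ([0.0, 0.0], 0.0)          # untrained head, for comparison
mse_untrained = model.mse(seqs, targets, "HCT116")
model.add_head("HCT116", seqs, targets)            # fit the new branch
mse_trained = model.mse(seqs, targets, "HCT116")
after = model.predict(seqs[0], "existing_task")
```

Because only the new head's weights are updated, predictions for `existing_task` are identical before and after the call to `add_head`. If the trunk were fine-tuned along with the new head, the existing tasks would have to be re-evaluated.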
Liu, Frank, "Transfer learning of a multitask deep neural network to predict enhancer activity of regulatory variants in the HCT116 cell-line" (2022). Summer and Academic Year Student Reports. 2702.