Document Type
Article
Publication Date
7-20-2023
Original Citation
Chiliński M, Lipiński J, Agarwal A, Ruan Y, Plewczynski D. Enhanced performance of gene expression predictive models with protein-mediated spatial chromatin interactions. Sci Rep. 2023;13(1):11693
Keywords
JGM, Chromatin, CCCTC-Binding Factor, Chromosomes, Cell Nucleus, Cell Cycle Proteins, Gene Expression
JAX Source
Sci Rep. 2023;13(1):11693
ISSN
2045-2322
PMID
37474564
DOI
https://doi.org/10.1038/s41598-023-38865-5
Grant
This work has been supported by the National Science Centre, Poland (2019/35/O/ST6/02484 and 2020/37/B/NZ2/03757). The work has been co-supported by Enhpathy, “Molecular Basis of Human enhanceropathies”, funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 860002, and by National Institute of Health USA 4DNucleome grant 1U54DK107967-01 "Nucleome Positioning System for Spatiotemporal Genome Organization and Regulation". Research was co-funded by the Warsaw University of Technology within the Excellence Initiative: Research University (IDUB) programme. Computations were performed thanks to the Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, using the Artificial Intelligence HPC platform financed by the Polish Ministry of Science and Higher Education (decision no. 7054/IA/SP/2020 of 2020-08-28).
Abstract
There have been multiple attempts to predict gene expression from sequence, epigenetics, and various other factors. To improve those predictions, we investigated adding protein-specific 3D interactions, which play a significant role in the condensation of chromatin structure in the cell nucleus. To achieve this, we used the architecture of one of the state-of-the-art algorithms, ExPecto, and investigated the changes in model metrics upon adding the spatially relevant data. We used ChIA-PET interactions mediated by cohesin (24 cell lines), CTCF (4 cell lines), and RNAPOL2 (4 cell lines). As the output of the study, we developed the Spatial Gene Expression (SpEx) algorithm, which shows statistically significant improvements in most cell lines. We compared our models with the baseline ExPecto model, which obtained a Spearman's rank correlation coefficient (SCC) of 0.82, and with the newer Enformer, for which 0.85 is reported; our models obtained an average correlation score of 0.83. However, in some cases (e.g. RNAPOL2 on GM12878) our improvement reached 0.04, and in others (e.g. RNAPOL2 on H1) we reached an SCC of 0.86.
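For readers unfamiliar with the metric, the SCC quoted in the abstract is the Pearson correlation computed on the ranks of the observed and predicted values. The following is a minimal sketch of that calculation in plain Python. The function names and the toy numbers are illustrative assumptions, not the authors' code or data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_scc(observed, predicted):
    """Spearman's rank correlation: Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(observed), average_ranks(predicted)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy values invented for illustration only (not data from the paper).
observed = [0.1, 2.3, 1.7, 5.0, 3.2]
predicted = [0.3, 1.9, 2.1, 4.4, 3.0]
print(round(spearman_scc(observed, predicted), 2))
```

Because only the ordering of values matters, a model can score a high SCC even when its predictions are on a different scale from the measured expression levels.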
Comments
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.