Chromatin interaction studies can reveal how the genome is organized into spatially confined sub-compartments in the nucleus. However, accurately identifying sub-compartments from chromatin interaction data remains a challenge in computational biology. Here, we present Sub-Compartment Identifier (SCI), an algorithm that uses graph embedding followed by unsupervised learning to predict sub-compartments from Hi-C chromatin interaction data. We find that the network topological centrality and clustering performance of SCI sub-compartment predictions are superior to those of hidden Markov model (HMM) sub-compartment predictions. Moreover, using orthogonal Chromatin Interaction Analysis by in situ Paired-End Tag Sequencing (ChIA-PET) data, we confirm that SCI sub-compartment prediction outperforms HMM. We show that SCI-predicted sub-compartments have distinct epigenetic marks, transcriptional activities, and transcription factor enrichment. In addition, we present a deep neural network that predicts sub-compartments from epigenome, replication timing, and sequence data. This neural network yields more accurate sub-compartment predictions when SCI-determined sub-compartments are used as training labels.
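The two-stage idea described above (embed the interaction graph, then cluster the embedded bins without labels) can be illustrated with a minimal sketch. This is not the SCI implementation: the abstract does not specify the embedding or clustering method, so a spectral embedding and a plain k-means are used here as stand-ins, and the toy contact matrix and the function names `spectral_embedding` and `kmeans` are assumptions made for illustration only.

```python
import numpy as np


def spectral_embedding(contacts, dim):
    """Stand-in graph embedding: top eigenvectors of the normalized contact graph."""
    a = np.asarray(contacts, dtype=float)
    deg = a.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # Symmetric normalization D^{-1/2} A D^{-1/2}
    norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(norm)  # eigenvalues come back in ascending order
    return vecs[:, -dim:]           # keep the `dim` leading eigenvectors


def kmeans(points, k, n_iter=20):
    """Stand-in unsupervised step: k-means with deterministic farthest-point init."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(d.argmax())])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels


# Hypothetical toy Hi-C matrix: 8 genomic bins forming two groups that
# interact strongly within a group (10) and weakly across groups (1).
toy = np.ones((8, 8))
toy[:4, :4] = 10.0
toy[4:, 4:] = 10.0

embedding = spectral_embedding(toy, dim=2)
labels = kmeans(embedding, k=2)
print(labels)
```

On this toy matrix the two groups of bins end up in separate clusters. Real Hi-C data would need normalization and a choice of cluster number that this sketch does not address.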
Bibliographical note
KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: We thank Drs. Sara Cassidy, Carmen Robinett, and Stephen Sampson from The Jackson Laboratory Research Program Development for editing this paper. We thank The Jackson Laboratory Computational Sciences and Research IT team for technical support and discussion. S.L. was supported by the Leukemia Research Foundation New Investigator Grant, The Jackson Laboratory Director’s Innovation Fund (JAX-DIF 19000-17-13), The Jackson Laboratory Cancer Center New Investigator Award, and the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM133562. Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA034196. Y.R. was supported by NIH ENCODE (UM1 HG009409), 4D Nucleome (U54 DK107967), and JAX Director’s Innovation Fund (JAX-DIF 19000-18-02).
This publication acknowledges KAUST support, but has no KAUST affiliated authors.