Struct2fun | Translational Genomics & Bioinformatics Lab

Mining 3D genome structure populations identifies major factors governing the stability of regulatory communities

Chao Dai, Wenyuan Li, Harianto Tjong, Shengli Hao, Yonggang Zhou, Qingjiao Li, Lin Chen, Bing Zhu, Frank Alber and Xianghong Jasmine Zhou

Abstract

Summary: Three-dimensional (3D) genome structures vary from cell to cell, even in an isogenic sample. Unlike protein structures, genome structures are highly plastic, which poses a significant challenge for structure-function mapping. Here we report an approach to comprehensively identify 3D chromatin clusters that each occur frequently across a population of genome structures, either deconvoluted from ensemble-averaged Hi-C data or derived from a collection of single-cell Hi-C data. Applying our method to a population of genome structures (at macrodomain resolution) of lymphoblastoid cells, we identify an atlas of stable inter-chromosomal chromatin clusters. A large number of these clusters are enriched for binding of specific regulatory factors and are therefore defined as "Regulatory Communities". We identify two major factors that significantly stabilize these communities: centromere clustering and transcription factor binding. Finally, we show that the regulatory communities differ substantially from cell to cell, indicating that expression variability could be impacted by genome structure.
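To make "a cluster that occurs frequently across a population of genome structures" concrete, the sketch below turns each 3D structure into a contact graph and reports the fraction of structures in which a candidate cluster is dense. It is an illustration under assumptions, not the published pipeline: the distance cutoff used to define contacts, the density threshold, the toy coordinates and the function names `contact_graph` and `cluster_frequency` are all hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float, float]  # 3D coordinates of one domain in one structure


def contact_graph(coords: Sequence[Point], cutoff: float) -> Set[Tuple[int, int]]:
    """Edges (i, j), i < j, between domains whose centres lie within `cutoff`.
    The distance-cutoff contact criterion is an assumption for illustration."""
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) <= cutoff
    }


def cluster_frequency(
    population: List[Sequence[Point]],
    cluster: Sequence[int],
    cutoff: float,
    min_density: float = 1.0,
) -> float:
    """Fraction of structures whose contact graph links at least `min_density`
    of the possible pairs inside `cluster` (1.0 means the cluster is a clique)."""
    pairs = list(combinations(sorted(set(cluster)), 2))
    if not pairs:
        raise ValueError("a cluster needs at least two domains")
    hits = 0
    for coords in population:
        edges = contact_graph(coords, cutoff)
        density = sum(p in edges for p in pairs) / len(pairs)
        if density >= min_density:
            hits += 1
    return hits / len(population)


# Hypothetical toy population: three structures with four domains each.
population = [
    [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10)],
    [(0, 0, 0), (1, 0, 0), (5, 5, 5), (0, 1, 0)],
    [(0, 0, 0), (0, 0, 1), (0, 1, 0), (3, 3, 3)],
]
# Domains 0, 1 and 2 are mutually within 1.5 units in the first and third structures only.
print(cluster_frequency(population, [0, 1, 2], cutoff=1.5))
```

Relaxing `min_density` below 1.0 lets a cluster count as present when most, but not all, of its pairs are in contact, which tolerates the cell-to-cell plasticity described above.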

Supplemental material

Download supplementary material

Downloads

We provide efficient implementations of two algorithmic steps of the method (Steps 3 and 4). The source code and README files for both can be downloaded below.

Step 3: Tensor-based frequent dense subgraph identification algorithm. This software discovers frequent dense subgraphs across multiple unweighted biological networks. By modeling the networks as a tensor, we formulate the discovery of frequent patterns as an optimization problem with a sparsity constraint and solve it with the multi-stage convex relaxation method. It can find frequent patterns across large collections of massive unweighted networks.

Tensor-based frequent dense subgraph algorithm manual

Download source code
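The idea of stacking networks into a tensor can be sketched as follows. The n×n adjacency matrices A_l of m networks form an n×n×m tensor, and one common way to score a node set (our assumption, not taken from the manual) is to maximize sum over l, i, j of A_l[i][j]·x_i·x_j·y_l, where x weights nodes and y weights networks, both nonnegative with unit norm, and x has at most k nonzero entries. The sketch solves this with a truncated power iteration (a hard top-k sparsity step), a simpler swapped-in technique and not the multi-stage convex relaxation used by the downloadable software. `truncated_tensor_power`, `adjacency` and the toy graphs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[int]]  # symmetric 0/1 adjacency matrix of one network


def adjacency(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a


def truncated_tensor_power(
    graphs: Sequence[Matrix], k: int, iters: int = 30
) -> Tuple[List[int], List[float]]:
    """Approximately maximise sum_{l,i,j} A[l][i][j] * x[i] * x[j] * y[l]
    with ||x||_2 = ||y||_2 = 1, x, y >= 0 and at most k nonzeros in x."""
    if iters < 1 or not 0 < k <= len(graphs[0]):
        raise ValueError("need iters >= 1 and 0 < k <= number of nodes")
    m, n = len(graphs), len(graphs[0])
    x = [1.0 / math.sqrt(n)] * n  # node memberships, start uniform
    y = [1.0 / math.sqrt(m)] * m  # network weights, start uniform
    for _ in range(iters):
        # x-step: g_i = sum_l y_l * (A_l x)_i, half the gradient of the objective in x_i
        g = [
            sum(y[l] * sum(a * xj for a, xj in zip(graphs[l][i], x)) for l in range(m))
            for i in range(n)
        ]
        # Truncation: keep the k largest scores, zero the rest, renormalise.
        support = sorted(range(n), key=lambda i: g[i], reverse=True)[:k]
        norm = math.sqrt(sum(g[i] ** 2 for i in support)) or 1.0
        x = [0.0] * n
        for i in support:
            x[i] = g[i] / norm
        # y-step: weight each network by x^T A_l x, i.e. how dense the node set is in it.
        h = [sum(graphs[l][i][j] * x[i] * x[j] for i in support for j in support) for l in range(m)]
        norm = math.sqrt(sum(v * v for v in h)) or 1.0
        y = [v / norm for v in h]
    return sorted(support), y


# Toy input: nodes 0-1-2 form a triangle in networks 0, 1 and 2 but not in network 3.
graphs = [
    adjacency(6, [(0, 1), (0, 2), (1, 2), (3, 4)]),
    adjacency(6, [(0, 1), (0, 2), (1, 2), (4, 5)]),
    adjacency(6, [(0, 1), (0, 2), (1, 2), (3, 5)]),
    adjacency(6, [(3, 4), (4, 5)]),
]
nodes, weights = truncated_tensor_power(graphs, k=3)
print(nodes, weights)
```

The nonzero entries of `weights` indicate the networks in which the selected node set is dense, so the pair (nodes, networks) plays the role of a frequent dense subgraph together with its supporting networks.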

Step 4: A counting algorithm for final pattern recovery. Starting from the contracted subgraphs obtained in Step 3, this software recovers the corresponding frequent dense subgraph (module) in the original chromatin interaction graphs.

Counting algorithm manual

Download source code
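The counting manual above documents the actual Step 4 procedure, including how contracted subgraphs are mapped back to the original graphs; that mapping is not modelled here. As a loose illustration of counting-based recovery only, the sketch assumes Step 3 has produced a candidate node set, repeatedly drops nodes that are connected to at least a fraction `min_frac` of the other module nodes in fewer than `min_support` original graphs, and then counts the graphs in which the remaining module is dense. `from_edges`, `node_support`, `recover_module`, both thresholds and the toy graphs are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Graph = List[Set[int]]  # adjacency sets of one chromatin interaction graph


def from_edges(n: int, edges: Sequence[Tuple[int, int]]) -> Graph:
    adj: Graph = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def node_support(graphs: Sequence[Graph], module: Set[int], v: int, min_frac: float) -> int:
    """Number of graphs in which v touches at least min_frac of the other module nodes."""
    others = module - {v}
    if not others:
        return 0
    return sum(len(g[v] & others) >= min_frac * len(others) for g in graphs)


def recover_module(
    graphs: Sequence[Graph],
    candidate: Sequence[int],
    min_frac: float = 0.5,
    min_support: int = 2,
) -> Tuple[List[int], int]:
    module = set(candidate)
    while True:
        keep = {v for v in module if node_support(graphs, module, v, min_frac) >= min_support}
        if keep == module:
            break
        module = keep  # drop weakly supported nodes and recount
    # Count graphs in which the module's induced edge density reaches min_frac.
    pairs = [(i, j) for i in module for j in module if i < j]
    support = sum(
        1
        for g in graphs
        if pairs and sum(j in g[i] for i, j in pairs) >= min_frac * len(pairs)
    )
    return sorted(module), support


# Toy input: node 3 was pulled into the candidate but is rarely linked to 0, 1 and 2.
graphs = [
    from_edges(5, [(0, 1), (0, 2), (1, 2), (0, 3)]),
    from_edges(5, [(0, 1), (0, 2), (1, 2)]),
    from_edges(5, [(0, 1), (0, 2), (1, 2), (3, 4)]),
    from_edges(5, [(2, 3)]),
]
module, support = recover_module(graphs, [0, 1, 2, 3])
print(module, support)
```

Pruning all weak nodes at once and then recounting keeps the loop simple and guarantees termination, since the module can only shrink.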

Authors

Chao Dai¹
Wenyuan Li¹
Harianto Tjong²
Shengli Hao¹
Yonggang Zhou¹
Qingjiao Li¹
Lin Chen¹
Bing Zhu²
Frank Alber¹
Xianghong Jasmine Zhou¹

¹Program in Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA

²Institute of Biophysics, Chinese Academy of Sciences, Beijing, China

Contact

Xianghong Jasmine Zhou

Professor
Pathology and Laboratory Medicine
University of California at Los Angeles

Office: Boyer Hall, Suite 520
Phone: 310-267-0363
Email: xjzhou@mednet.ucla.edu

Frank Alber

Professor
Program in Molecular and Computational Biology
University of Southern California

Phone: 213-740-0778
Fax: 213-740-2437
Email: alber@usc.edu