Abstract

Motivation: The rapid accumulation of biological network data has led to an urgent need in computational methods for graph pattern mining. One important problem is to iden-tify recurrent patterns across multiple networks to discover biological modules. However, existing algorithms for fre-quent pattern mining become very costly in time and space as the pattern sizes and network numbers increase. Cur-rently, no efficient algorithm is available for mining recur-rent patterns across large collections of genome-wide net-works.

Results: We developed a novel algorithm, CODENSE, to efficiently mine frequent coherent dense subgraphs across large numbers of massive graphs. Compared to previous methods, our approach is scalable in the number and size of the input graphs, flexible in mining either weighted or unweighted graphs, and adjustable in terms of exact or ap-proximate pattern mining. Applying CODENSE to 39 co-expression networks derived from microarray data sets, we demonstrated that our software discovered a large number of functionally homogenous clusters, and made functional predictions for 83 uncharacterized yeast genes.