OrthoCluster: A new tool for mining synteny blocks and applications in comparative genomics
By comparing the genomes of both closely and distantly related species, comparative genomics characterizes the structure and function of genomes in both conserved and divergent regions. Synteny blocks, which are conserved blocks of genes on the chromosomes of related species, play an important role in such analysis. Although a few tools have been designed to identify synteny blocks, most cannot handle several challenging application requirements, particularly the strandedness of genes, gene inversions, gene duplications, and the comparison of more than two genomes. We developed a data mining tool, OrthoCluster, that handles all of these challenges. It is publicly available at http://genome.sfu.ca/projects/orthocluster. OrthoCluster takes the annotated gene sets of the candidate genomes and their pairwise orthologous relationships as input and efficiently identifies the complete set of synteny blocks. In addition, OrthoCluster identifies four types of genome rearrangement events, namely inversion, transposition, insertion/deletion, and reciprocal translocation. To remain flexible across application scenarios, OrthoCluster provides a systematic set of parameters, such as the synteny block size, the number of mismatches allowed, whether strandedness is enforced, and whether gene order must be preserved. Furthermore, OrthoCluster can be used to identify segmental duplications within a genome. In this paper, we introduce the major technical ideas and present some interesting findings obtained using OrthoCluster. Copyright 2008 ACM.
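To make the input and parameters described above concrete, the following is a minimal sketch of finding perfect synteny blocks between two genomes. It is not OrthoCluster's algorithm: it assumes one-to-one orthologs, a single chromosome per genome, and no mismatches, and every name in it (`find_perfect_blocks`, `min_size`, `enforce_strand`) is hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Gene = Tuple[str, str]  # (gene_id, strand), strand is "+" or "-"


def _step(prev, cur, direction: int, enforce_strand: bool) -> int:
    """Return +1/-1 if cur extends the block after prev, else 0."""
    step = cur[0] - prev[0]
    if step not in (1, -1):
        return 0  # orthologs are not adjacent in genome B
    if direction and step != direction:
        return 0  # block would change direction
    if enforce_strand:
        # Forward block: strands agree; inverted block: strands differ.
        expected_same = step == 1
        if prev[1] != expected_same or cur[1] != expected_same:
            return 0
    return step


def find_perfect_blocks(
    genome_a: List[Gene],
    genome_b: List[Gene],
    orthologs: Dict[str, str],  # gene in A -> its ortholog in B (assumed 1:1)
    min_size: int = 2,
    enforce_strand: bool = True,
) -> List[List[str]]:
    """Maximal runs of genes in A whose orthologs are consecutive in B."""
    pos_b = {gid: (i, strand) for i, (gid, strand) in enumerate(genome_b)}

    # For each gene in A: (index of ortholog in B, same strand?) or None.
    mapped: List[Optional[Tuple[int, bool]]] = []
    for gid, strand in genome_a:
        partner = orthologs.get(gid)
        if partner is None or partner not in pos_b:
            mapped.append(None)
        else:
            j, strand_b = pos_b[partner]
            mapped.append((j, strand == strand_b))

    blocks: List[List[str]] = []
    i, n = 0, len(mapped)
    while i < n:
        if mapped[i] is None:
            i += 1
            continue
        start, direction = i, 0
        while i + 1 < n and mapped[i + 1] is not None:
            step = _step(mapped[i], mapped[i + 1], direction, enforce_strand)
            if step == 0:
                break
            direction = step
            i += 1
        if i - start + 1 >= min_size:
            blocks.append([gid for gid, _ in genome_a[start : i + 1]])
        i += 1
    return blocks
```

In this sketch an inverted block is a run whose orthologs appear in reverse order in the second genome; with `enforce_strand=True` the genes in such a run must also lie on opposite strands. Allowing mismatches, duplicated genes, or more than two genomes, as the tool itself does, would require a more elaborate search than this linear scan.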