An algorithm for finding substantially broken repeated sequences in newly sequenced genomes
Interspersed repeats occupy a significant fraction of many eukaryotic genomes. They result from the activity and accumulation of transposable elements, sequences that are able to replicate in virtually all organisms and that have been successfully maintained throughout evolution. With the increasing availability of higher eukaryotic genomes, the identification and annotation of repeats has become an important task in genome biology, and it has prompted a shift from the study of individual elements to the study of their genome-wide distributions. In this paper we present a new method for de novo identification of repetitive segments in a genome. It is particularly suited to identifying repeats that are present in large copy numbers but have diverged so much that they cannot be recognized by existing techniques, which generally rely on relatively high sequence similarity between the copies. © 2008 American Institute of Physics.