SGP-1: prediction and validation of homologous genes based on sequence alignments.

Journal Article

Conventional methods of gene prediction rely on recognizing DNA-sequence signals, on coding potential, or on comparing a genomic sequence with a cDNA, EST, or protein database. In many circumstances their accuracy is limited by species-specific training and by the incompleteness of reference databases. Recently, comparative genome analysis has attracted increasing attention, and several analysis tools based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful for validating gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server. The source code, written in ANSI C, is available on request from the authors.
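The abstract notes that SGP-1 reports comparisons between predicted and annotated gene structures. The sketch below is not SGP-1's code (which is written in ANSI C) and does not claim to reproduce its HTML report; it only illustrates one common way such a comparison can be quantified: nucleotide-level sensitivity (Sn = TP / (TP + FN)) and specificity (Sp = TP / (TP + FP)), plus a count of exons whose both boundaries match exactly. The representation of exons as half-open 0-based intervals on a single strand, ignoring reading frame, and the function names `coding_positions`, `nucleotide_accuracy`, and `exact_exon_matches` are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical representation: each exon is a half-open interval (start, end)
# of 0-based positions on one strand of the genomic sequence.
Exon = Tuple[int, int]


def coding_positions(exons: List[Exon]) -> Set[int]:
    """Return the set of nucleotide positions covered by any exon."""
    covered: Set[int] = set()
    for start, end in exons:
        covered.update(range(start, end))
    return covered


def nucleotide_accuracy(predicted: List[Exon], annotated: List[Exon]) -> Tuple[float, float]:
    """Nucleotide-level sensitivity and specificity of a prediction.

    Sn = TP / (TP + FN): fraction of annotated coding bases that were predicted.
    Sp = TP / (TP + FP): fraction of predicted coding bases that are annotated.
    Returns 0.0 for a measure whose denominator is zero.
    """
    pred = coding_positions(predicted)
    real = coding_positions(annotated)
    tp = len(pred & real)   # coding in both prediction and annotation
    fn = len(real - pred)   # annotated coding bases that were missed
    fp = len(pred - real)   # predicted coding bases not in the annotation
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


def exact_exon_matches(predicted: List[Exon], annotated: List[Exon]) -> int:
    """Count predicted exons whose start and end both agree with an annotated exon."""
    return len(set(predicted) & set(annotated))


if __name__ == "__main__":
    annotated = [(100, 200), (300, 400)]
    predicted = [(100, 200), (300, 450)]  # second exon overextends by 50 bases
    sn, sp = nucleotide_accuracy(predicted, annotated)
    print(f"Sn = {sn:.2f}, Sp = {sp:.2f}, exact exons = {exact_exon_matches(predicted, annotated)}")
```

In this toy example every annotated coding base is predicted (Sn = 1.00), but 50 of the 250 predicted bases fall outside the annotation (Sp = 0.80), and only the first exon matches exactly. Sets of positions keep the sketch short; for chromosome-scale sequences an interval-based overlap computation would use far less memory.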

Cited Authors

  • Wiehe, T; Gebauer-Jung, S; Mitchell-Olds, T; Guigó, R

Published Date

  • September 2001

Volume / Issue

  • 11 / 9

Start / End Page

  • 1574 - 1583

PubMed ID

  • 11544202

Pubmed Central ID

  • PMC311140

Electronic International Standard Serial Number (EISSN)

  • 1549-5469

International Standard Serial Number (ISSN)

  • 1088-9051

Digital Object Identifier (DOI)

  • 10.1101/gr.177401

Language

  • eng