Is it better to combine predictions?
We have compared the accuracy of the individual protein secondary structure prediction methods PHD, DSC, NNSSP and Predator against the accuracy obtained by combining the predictions of these methods. A range of ways of combining predictions were tested: voting, biased voting, linear discrimination, neural networks and decision trees. The combined methods that involve 'learning' (the non-voting methods) were trained using a set of 496 non-homologous domains; this dataset was biased, as some of the secondary structure prediction methods had used these domains for training. We used two independent test sets to compare predictions: the first consisted of 17 non-homologous domains from CASP3 (Third Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction); the second (referred to below as the EBI test set) consisted of 405 domains that were selected in the same way as the training set, and were non-homologous to each other and to the training set. On both test datasets the most accurate individual method was NNSSP, followed by PHD and DSC, and the least accurate was Predator; however, it was not possible to show conclusively a significant difference between the individual methods. Comparing the accuracy of the single methods with that obtained by combining predictions, we found that it was better to use a combination of predictions. On both test datasets it was possible to obtain an approximately 3% improvement in accuracy by combining predictions. In most cases the combined methods were statistically significantly better (at P = 0.05 on the CASP3 test set, and P = 0.01 on the EBI test set). On the CASP3 test dataset there was no significant difference in accuracy between any of the combined methods of prediction; on the EBI test dataset, linear discrimination and neural networks significantly outperformed the voting techniques. We conclude that it is better to combine predictions.
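The simplest of the combination schemes named above, voting, can be illustrated with a minimal sketch. It assumes each method outputs one state per residue as a character (here 'H', 'E' or 'C', an assumed three-state alphabet) and that ties are broken in favour of the method listed first; the abstract does not describe the actual voting or tie-breaking rules used in the study, so the function name `majority_vote`, its `tie_break_order` parameter and the example strings are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions, tie_break_order=None):
    """Combine per-residue predictions from several methods by simple voting.

    predictions: dict mapping method name -> string of per-residue states;
                 all strings must have the same length.
    tie_break_order: method names in order of preference; on a tie, the state
                 predicted by the earliest listed method among the tied states
                 wins (an assumption, not the published scheme).
    """
    names = list(predictions)
    if tie_break_order is None:
        tie_break_order = names

    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same, non-zero count of methods and equal length")
    n_residues = lengths.pop()

    combined = []
    for i in range(n_residues):
        # Count how many methods predict each state at residue i.
        votes = Counter(predictions[m][i] for m in names)
        top = max(votes.values())
        tied = {state for state, count in votes.items() if count == top}

        if len(tied) == 1:
            combined.append(tied.pop())
            continue

        # Tie: defer to the most preferred method whose state is among the tied ones.
        for m in tie_break_order:
            if m in predictions and predictions[m][i] in tied:
                combined.append(predictions[m][i])
                break
        else:
            # No listed method settles the tie; fall back to a deterministic choice.
            combined.append(sorted(tied)[0])

    return "".join(combined)


# Made-up example predictions for an 8-residue fragment (not real data).
example = {
    "NNSSP":    "HHHCCEEC",
    "PHD":      "HHCCCEEC",
    "DSC":      "HCCCEEEC",
    "Predator": "CHHCCECC",
}
consensus = majority_vote(example)
print(consensus)
```

A biased (weighted) vote could be sketched along the same lines by replacing the per-method count of one with a per-method weight; the weights would have to be chosen or fitted separately, which this example does not attempt.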
Duke Scholars
Published In
DOI
ISSN
Publication Date
Volume
Issue
Start / End Page
Location
Related Subject Headings
- Proteins
- Protein Structure, Secondary
- Models, Molecular
- Models, Chemical
- Biophysics
- Algorithms
- 3106 Industrial biotechnology
- 3101 Biochemistry and cell biology
- 10 Technology
- 06 Biological Sciences