Predicting cancer susceptibility from single-nucleotide polymorphism data: A case study in multiple myeloma
This paper asks whether susceptibility to early onset (diagnosis before age 40) of a particularly deadly form of cancer, Multiple Myeloma, can be predicted from single-nucleotide polymorphism (SNP) profiles with an accuracy greater than chance. Specifically, given SNP profiles for 80 Multiple Myeloma patients, of whom we believe 40 to have high susceptibility and 40 to have lower susceptibility, we train a support vector machine (SVM) to predict age at diagnosis. We chose SVMs for this task because they are well suited to deal with interactions among features and redundant features. The accuracy of the trained SVM estimated by leave-one-out cross-validation is 71%, significantly greater than random guessing. This result is particularly encouraging since only 3000 SNPs were used in profiling, whereas several million SNPs are known. © 2005 ACM.
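To give a rough sense of what "significantly greater than random guessing" means at this sample size, the sketch below computes a one-sided binomial tail probability. It is an illustration under stated assumptions, not necessarily the test used in the paper: the count of 57 correct predictions is inferred from the reported 71% of 80 patients, chance accuracy is taken as 0.5 because the two groups are balanced, and the 80 held-out predictions are treated as independent trials, which is a simplification for leave-one-out cross-validation. The function name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


n_patients = 80  # from the abstract: 40 high-susceptibility + 40 lower-susceptibility
n_correct = 57   # assumption: 57/80 = 71.25%, which rounds to the reported 71%

# Probability of getting at least n_correct right by guessing at 50% accuracy.
p_value = binomial_upper_tail(n_correct, n_patients, p=0.5)

print(f"accuracy = {n_correct / n_patients:.4f}")
print(f"one-sided binomial tail probability = {p_value:.2e}")
```

Under these assumptions the tail probability falls below 0.001, which is consistent with the claim that 71% on 80 balanced cases is unlikely to arise from guessing alone.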