Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia.

Publication, Journal Article
Struyf, J; Dobrin, S; Page, D
Published in: BMC Genomics
November 7, 2008

BACKGROUND: This paper presents a retrospective statistical study on the newly released data set by the Stanley Neuropathology Consortium on gene expression in bipolar disorder and schizophrenia. This data set contains gene expression data as well as limited demographic and clinical data for each subject. Previous studies using statistical classification or machine learning algorithms have focused on gene expression data only. The present paper investigates whether such techniques can benefit from including demographic and clinical data.

RESULTS: We compare six classification algorithms: support vector machines (SVMs), nearest shrunken centroids, decision trees, ensemble of voters, naïve Bayes, and nearest neighbor. SVMs outperform the other algorithms. Using expression data only, they yield an area under the ROC curve of 0.92 for bipolar disorder versus control, and 0.91 for schizophrenia versus control. By including demographic and clinical data, classification performance improves to 0.97 and 0.94, respectively.

CONCLUSION: This paper demonstrates that SVMs can distinguish bipolar disorder and schizophrenia from normal control at a very high rate. Moreover, it shows that classification performance improves by including demographic and clinical data. We also found that some variables in this data set, such as alcohol and drug use, are strongly associated with the diseases. These variables may affect gene expression and make it more difficult to identify genes that are directly associated with the diseases. Stratification can correct for such variables, but we show that this reduces the power of the statistical methods.

Published In

BMC Genomics

DOI

10.1186/1471-2164-9-531

EISSN

1471-2164

Publication Date

November 7, 2008

Volume

9

Start / End Page

531

Location

England

Related Subject Headings

  • Schizophrenia
  • Retrospective Studies
  • ROC Curve
  • Models, Statistical
  • Models, Biological
  • Middle Aged
  • Male
  • Humans
  • Gene Expression Profiling
  • Gene Expression
 

Citation

APA: Struyf, J., Dobrin, S., & Page, D. (2008). Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia. BMC Genomics, 9, 531. https://doi.org/10.1186/1471-2164-9-531
Chicago: Struyf, Jan, Seth Dobrin, and David Page. “Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia.” BMC Genomics 9 (November 7, 2008): 531. https://doi.org/10.1186/1471-2164-9-531.
MLA: Struyf, Jan, et al. “Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia.” BMC Genomics, vol. 9, Nov. 2008, p. 531. PubMed, doi:10.1186/1471-2164-9-531.