Feasibility of using clinical element models (CEM) to standardize phenotype variables in the database of genotypes and phenotypes (dbGaP)


Conference Paper

The database of Genotypes and Phenotypes (dbGaP) contains various types of data generated in genome-wide association studies (GWAS). These data can be used to facilitate novel scientific discovery and to reduce the cost and time of exploratory research. However, idiosyncrasies in variable names are a major barrier to reusing these data. We studied the problem of formalizing phenotype variable descriptions using Clinical Element Models (CEMs). Direct mapping of 379 phenotype names to existing CEMs yielded a low rate of exact matches (N=25). However, the flexible and expressive information models underlying CEMs provided a robust means of representing 115 phenotype variable descriptions, indicating that CEMs can be successfully applied to standardize a large portion of the clinical variables contained in dbGaP. © 2012 IEEE.

Cited Authors

  • Lin, KW; Tharp, M; Conway, M; Ross, M; Hsieh, A; Kim, HE

Published Date

  • December 1, 2012

Published In

  • Proceedings 2012 IEEE 2nd Conference on Healthcare Informatics, Imaging and Systems Biology, HISB 2012

Start / End Page

  • 123 -

International Standard Book Number 13 (ISBN-13)

  • 9780769549217

Digital Object Identifier (DOI)

  • 10.1109/HISB.2012.48

Citation Source

  • Scopus