Metadata discovery of heterogeneous biomedical datasets using token-based features

Publication, Conference
Wen, J; Gouripeddi, R; Facelli, JC
Published in: Lecture Notes in Electrical Engineering
January 1, 2017

Metadata discovery is the process of recognizing the semantics and descriptors of data elements and datasets. This study uses a machine-learning approach to classify biomedical dataset characteristics for metadata discovery. Four common types of biomedical data sources were included: genetic variants, protein structures, scientific publications, and a general English corpus. Decision tree classification models were built using token-based features derived from these data files. The models identify the four data sources with average F1 scores ranging from 0.935 to 1.000. This study demonstrates that biomedical data of different types have different distributions of token-based document structural features, and that such structural features can be leveraged for metadata discovery.
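
The abstract does not list the specific token-based features or the evaluation code, so the following is only a minimal sketch of the general idea, not the authors' method. The feature names (`numeric_frac`, `alpha_frac`, `upper_frac`, `mean_len`), the whitespace tokenizer, and the helper names `token_features` and `f1_score` are all assumptions made for illustration. The sketch shows how structural features could be derived from the tokens of a file and how a per-class F1 score is computed from predictions.

```python
import re

# Assumption: tokens are whitespace-delimited; the paper may tokenize differently.
TOKEN_RE = re.compile(r"\S+")
NUMBER_RE = re.compile(r"[-+]?\d+(\.\d+)?")


def token_features(text):
    """Return hypothetical token-based structural features for one document."""
    tokens = TOKEN_RE.findall(text)
    n = len(tokens)
    if n == 0:
        return {"numeric_frac": 0.0, "alpha_frac": 0.0,
                "upper_frac": 0.0, "mean_len": 0.0}
    numeric = sum(1 for t in tokens if NUMBER_RE.fullmatch(t))
    alpha = sum(1 for t in tokens if t.isalpha())
    upper = sum(1 for t in tokens if t.isalpha() and t.isupper())
    return {
        "numeric_frac": numeric / n,   # share of purely numeric tokens
        "alpha_frac": alpha / n,       # share of purely alphabetic tokens
        "upper_frac": upper / n,       # share of all-uppercase alphabetic tokens
        "mean_len": sum(len(t) for t in tokens) / n,
    }


def f1_score(y_true, y_pred, label):
    """Compute the F1 score for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# A made-up line in the style of a genetic variant file versus plain English.
variant_like = token_features("chr1 12345 . A G 50 PASS")
english_like = token_features("The protein folds into a compact structure")
```

Feature dictionaries like these could then be passed to a decision tree learner such as scikit-learn's `DecisionTreeClassifier`, with `f1_score` averaged over the four data source classes. Whether these particular features separate the classes as well as the reported scores suggest is not something this sketch establishes.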

Published In

Lecture Notes in Electrical Engineering

DOI

10.1007/978-981-10-6451-7_8

EISSN

1876-1119

ISSN

1876-1100

Publication Date

January 1, 2017

Volume

449

Start / End Page

60 / 67
 

Citation

APA: Wen, J., Gouripeddi, R., & Facelli, J. C. (2017). Metadata discovery of heterogeneous biomedical datasets using token-based features. In Lecture Notes in Electrical Engineering (Vol. 449, pp. 60–67). https://doi.org/10.1007/978-981-10-6451-7_8
Chicago: Wen, J., R. Gouripeddi, and J. C. Facelli. “Metadata discovery of heterogeneous biomedical datasets using token-based features.” In Lecture Notes in Electrical Engineering, 449:60–67, 2017. https://doi.org/10.1007/978-981-10-6451-7_8.
ICMJE: Wen J, Gouripeddi R, Facelli JC. Metadata discovery of heterogeneous biomedical datasets using token-based features. In: Lecture Notes in Electrical Engineering. 2017. p. 60–7.
MLA: Wen, J., et al. “Metadata discovery of heterogeneous biomedical datasets using token-based features.” Lecture Notes in Electrical Engineering, vol. 449, 2017, pp. 60–67. Scopus, doi:10.1007/978-981-10-6451-7_8.
NLM: Wen J, Gouripeddi R, Facelli JC. Metadata discovery of heterogeneous biomedical datasets using token-based features. Lecture Notes in Electrical Engineering. 2017. p. 60–67.
