Similarity between semantic description sets: addressing needs beyond data integration

Publication, Conference

Vision, T; Blake, J; Lapp, H; Mabee, P; Westerfield, M

Published in: Proceedings of the First International Workshop on Linked Science (LISC 2011)

2011

Descriptive information is easy to understand and communicate in natural language. Examples in the biological realm include the cellular functions of proteins and the phenotypes exhibited by organisms. Large latent stores of such descriptive data are stored in databases that can be mined, but even more still reside only in the scientific literature. Although such information has traditionally been opaque to computers, in recent years significant efforts have gone into exposing descriptive information to computation through the development of ontologies and associated tools. A host of software applications now employ simple reasoning over Gene Ontology annotated data to help interpret experimental findings in genomics in terms of protein function. In the domain of biological phenotypes, the combination of entity terms from taxon-specific anatomy ontologies with quality terms from generic ontologies such as PATO has been used to construct semantically precise and contextualized descriptions. It is natural for multiple semantic descriptions to pertain to single instances in the real world, as in the case of both protein functions and organismal phenotypes. However, applications for ontology-based annotations that go beyond simple knowledge organization, and that exploit sets of semantic descriptions, are puzzlingly rare. In particular, we argue that there is wide applicability, and a sore need, for tools that can satisfy the simple, common use case of identifying statistically improbable similarity between sets of semantic descriptions. Several metrics have been proposed for this task in the literature, but not yet fully evaluated, explored, and adopted. The requirements for semantic similarity tools tailored to sets of semantic descriptions would include speed, scalability to large numbers of sets, demonstrated statistical and biological validity, and ease of use.

Duke Scholars

Author Hilmar Lapp

Published In

Proceedings of the First International Workshop on Linked Science (LISC 2011)

Publication Date

2011

Volume

783

Publisher

CEUR Workshop Proceedings

Citation

Vision, T., Blake, J., Lapp, H., Mabee, P., & Westerfield, M. (2011). Similarity between semantic description sets: addressing needs beyond data integration. In T. Kauppinen, L. C. Pouchard, & C. Keßler (Eds.), Proceedings of the First International Workshop on Linked Science (LISC 2011) (Vol. 783). CEUR Workshop Proceedings.

Vision, T., Judith Blake, Hilmar Lapp, Paula Mabee, and Monte Westerfield. “Similarity between semantic description sets: addressing needs beyond data integration.” In Proceedings of the First International Workshop on Linked Science (LISC 2011), edited by Tomi Kauppinen, Line C. Pouchard, and Carsten Keßler, Vol. 783. CEUR Workshop Proceedings, 2011.

Vision T, Blake J, Lapp H, Mabee P, Westerfield M. Similarity between semantic description sets: addressing needs beyond data integration. In: Kauppinen T, Pouchard LC, Keßler C, editors. Proceedings of the First International Workshop on Linked Science (LISC 2011). CEUR Workshop Proceedings; 2011.

Vision, T., et al. “Similarity between semantic description sets: addressing needs beyond data integration.” Proceedings of the First International Workshop on Linked Science (LISC 2011), edited by Tomi Kauppinen et al., vol. 783, CEUR Workshop Proceedings, 2011.