Similarity between semantic description sets: addressing needs beyond data integration

Conference Paper

Descriptive information is easy to understand and communicate in natural language. Examples in the biological realm include the cellular functions of proteins and the phenotypes exhibited by organisms. Large stores of such descriptive data reside in databases that can be mined, but even more remain only in the scientific literature. Although such information has traditionally been opaque to computers, in recent years significant efforts have gone into exposing descriptive information to computation through the development of ontologies and associated tools. A host of software applications now employ simple reasoning over Gene Ontology-annotated data to help interpret experimental findings in genomics in terms of protein function. In the domain of biological phenotypes, the combination of entity terms from taxon-specific anatomy ontologies with quality terms from generic ontologies such as the Phenotype and Trait Ontology (PATO) has been used to construct semantically precise and contextualized descriptions. It is natural for multiple semantic descriptions to pertain to a single instance in the real world, as in the case of both protein functions and organismal phenotypes. However, applications for ontology-based annotations that go beyond simple knowledge organization, and that exploit sets of semantic descriptions, are puzzlingly rare. In particular, we argue that there is wide applicability, and a sore need, for tools that can satisfy the simple, common use case of identifying statistically improbable similarity between sets of semantic descriptions. Several metrics have been proposed for this task in the literature, but they have not yet been fully evaluated, explored, or adopted. The requirements for semantic similarity tools tailored to sets of semantic descriptions would include speed, scalability to large numbers of sets, demonstrated statistical and biological validity, and ease of use.
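To make the use case concrete, here is a minimal sketch of one way to score similarity between two sets of ontology annotations and ask whether that similarity is statistically improbable. It is not the authors' method. It assumes a hypothetical toy anatomy ontology (`PARENTS`), annotation propagation up the ontology, Resnik information content (IC) for term-to-term similarity, a symmetric best-match average to combine terms into a set-to-set score, and a simple Monte Carlo permutation test as the null model. All names (`information_content`, `resnik`, `best_match_average`, `permutation_p_value`) and terms are illustrative.

```python
import math
import random

# Hypothetical toy ontology: each term maps to its direct parents (a DAG rooted at "root").
PARENTS = {
    "root": [],
    "fin": ["root"],
    "pectoral_fin": ["fin"],
    "pelvic_fin": ["fin"],
    "bone": ["root"],
    "skull_bone": ["bone"],
    "jaw_bone": ["skull_bone"],
    "fin_ray": ["fin", "bone"],
}

def ancestors(term):
    """Return the term together with all of its ancestors (reflexive closure)."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def information_content(annotation_sets):
    """IC(t) = -log p(t), where p(t) is the fraction of annotated instances
    whose annotations, propagated up the ontology, include t."""
    n = len(annotation_sets)
    counts = {t: 0 for t in PARENTS}
    for terms in annotation_sets:
        closure = set().union(*(ancestors(t) for t in terms))
        for t in closure:
            counts[t] += 1
    # Terms never observed in the corpus get no IC entry.
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}

def resnik(a, b, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max((ic[t] for t in common if t in ic), default=0.0)

def best_match_average(set_a, set_b, ic):
    """Symmetric best-match average of term-to-term Resnik scores."""
    a_to_b = sum(max(resnik(a, b, ic) for b in set_b) for a in set_a) / len(set_a)
    b_to_a = sum(max(resnik(a, b, ic) for a in set_a) for b in set_b) / len(set_b)
    return (a_to_b + b_to_a) / 2

def permutation_p_value(set_a, set_b, ic, term_pool, n_iter=1000, seed=0):
    """Estimate how often a random set of the same size as set_b matches set_a
    at least as well as set_b does (a simple Monte Carlo null model)."""
    rng = random.Random(seed)
    observed = best_match_average(set_a, set_b, ic)
    hits = 0
    for _ in range(n_iter):
        random_b = rng.sample(term_pool, len(set_b))
        if best_match_average(set_a, random_b, ic) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_iter + 1)

# Hypothetical annotation corpus: one set of terms per annotated instance.
corpus = [
    {"pectoral_fin"}, {"pelvic_fin"}, {"jaw_bone"}, {"fin_ray"},
    {"skull_bone"}, {"pectoral_fin", "jaw_bone"}, {"bone"}, {"fin"},
]
ic = information_content(corpus)
term_pool = [t for t in PARENTS if t != "root"]
score, p = permutation_p_value(
    {"pectoral_fin", "fin_ray"}, {"pelvic_fin", "fin_ray"}, ic, term_pool
)
print(f"BMA similarity = {score:.3f}, permutation p = {p:.3f}")
```

The permutation null here draws random terms uniformly from the ontology; a real tool would more likely sample from the empirical annotation distribution and would need to be fast enough to score many set pairs, which reflects the speed and scalability requirements named above.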

Cited Authors

  • Vision, T; Blake, J; Lapp, H; Mabee, P; Westerfield, M

Cited Editors

  • Kauppinen, T; Pouchard, LC; Keßler, C

Published Date

  • 2011

Published In

  • Proceedings of the First International Workshop on Linked Science (LISC 2011)

Volume

  • 783

Published By