A new phylogenetic data standard for computable clade definitions: the Phyloreference Exchange Format (Phyx).

Journal Article

To be computationally reproducible and efficient, integration of disparate data depends on shared entities whose matching meaning (semantics) can be computationally assessed. For biodiversity data one of the most prevalent shared entities for linking data records is the associated taxon concept. Unlike Linnaean taxon names, the traditional way in which taxon concepts are provided, phylogenetic definitions are native to phylogenetic trees and offer well-defined semantics that can be transformed into formal, computationally evaluable logic expressions. These attributes make them highly suitable for phylogeny-driven comparative biology by allowing computationally verifiable and reproducible integration of taxon-linked data against Tree of Life-scale phylogenies. To achieve this, the first step is transforming phylogenetic definitions from the natural language text in which they are published to a structured interoperable data format that maintains strong ties to semantics and lends itself well to sharing, reuse, and long-term archival. To this end, we developed the Phyloreference Exchange Format (Phyx), a JSON-LD-based text format encompassing rich metadata for all elements of a phylogenetic definition, and we created a supporting software library, phyx.js, to streamline computational management of such files. Together they form a foundation layer for digitizing and computing with phylogenetic definitions of clades.
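To make concrete the idea of turning a phylogenetic definition into something computationally evaluable, here is a minimal Python sketch. It does not use the Phyx schema or the phyx.js API: the dictionary keys, the toy phylogeny, and the function names are all hypothetical. It assumes a minimum-clade style definition, which resolves to the most recent common ancestor of its internal specifiers, provided no external specifier falls inside that clade.

```python
# Hypothetical, simplified stand-in for a structured clade definition.
# The keys below are illustrative assumptions, not the real Phyx format.
definition = {
    "label": "Example clade",
    "internal_specifiers": ["Homo sapiens", "Pan troglodytes"],
    "external_specifiers": ["Pongo pygmaeus"],
}

# Toy phylogeny stored as child -> parent links (made up for illustration).
parent = {
    "Homo sapiens": "n1",
    "Pan troglodytes": "n1",
    "n1": "n2",
    "Gorilla gorilla": "n2",
    "n2": "root",
    "Pongo pygmaeus": "root",
}


def ancestors(node):
    """Return the node followed by all of its ancestors up to the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def mrca(tips):
    """Return the most recent common ancestor of the given tips."""
    common = set(ancestors(tips[0]))
    for tip in tips[1:]:
        common &= set(ancestors(tip))
    # Walking up from the first tip, the first shared node is the MRCA.
    return next(n for n in ancestors(tips[0]) if n in common)


def resolve(defn):
    """Resolve a definition to a node, or None if an external specifier
    would be included in the candidate clade."""
    node = mrca(defn["internal_specifiers"])
    for ext in defn["external_specifiers"]:
        if node in ancestors(ext):
            return None
    return node


resolved_node = resolve(definition)
```

Because the definition is data rather than prose, the same check can be rerun unchanged against a different phylogeny, which is the kind of reproducibility the abstract describes.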

Cited Authors

  • Vaidya, G; Cellinese, N; Lapp, H

Published Date

  • January 2022

Published In

  • PeerJ

Volume / Issue

  • 10

Start / End Page

  • e12618

PubMed ID

  • 35186448

Pubmed Central ID

  • PMC8855714

Electronic International Standard Serial Number (EISSN)

  • 2167-8359

International Standard Serial Number (ISSN)

  • 2167-8359

Digital Object Identifier (DOI)

  • 10.7717/peerj.12618


Language

  • eng