Making data-driven hypotheses for gene functions by integrating dependency, expression, and literature data
Abstract
Identifying the key functions of human genes is a major biomedical research goal. While some genes are well studied, little is known about most human genes. New tools in data science (a combination of computer programming, math and statistics, and topical expertise), combined with the rapid adoption of open science and data sharing, allow scientists to access publicly available datasets and interrogate these data before performing any experiments. We present here a new research tool called data-driven hypothesis (DDH) for predicting pathways and functions for thousands of genes across the human genome. Importantly, this method integrates gene essentiality, gene expression, and literature mining to identify candidate molecular functions or pathways of known and unknown genes. Beyond single-gene queries, DDH can uniquely handle queries of defined gene ontology pathways or custom gene lists containing multiple genes. The DDH project holds tremendous promise to generate hypotheses, data, and knowledge in order to provide a deep understanding of the dynamic properties of mammalian genes. We present this tool via an intuitive online interface, which provides the scientific community with a platform to query and prioritize experimental hypotheses to test in the lab.