Gene-metabolite annotation with shortest reactional distance enhances metabolite genome-wide association studies results.
Studies combining metabolomics and genetics, known as metabolite genome-wide association studies (mGWAS), have provided valuable insights into our understanding of the genetic control of metabolite levels. However, the biological interpretation of these associations remains challenging due to a lack of existing tools to annotate mGWAS gene-metabolite pairs beyond the use of conservative statistical significance threshold. Here, we computed the shortest reactional distance (SRD) based on the curated knowledge of the KEGG database to explore its utility in enhancing the biological interpretation of results from three independent mGWAS, including a case study on sickle cell disease patients. Results show that, in reported mGWAS pairs, there is an excess of small SRD values and that SRD values and p-values significantly correlate, even beyond the standard conservative thresholds. The added-value of SRD annotation is shown for identification of potential false negative hits, exemplified by the finding of gene-metabolite associations with SRD ≤1 that did not reach standard genome-wide significance cut-off. The wider use of this statistic as an mGWAS annotation would prevent the exclusion of biologically relevant associations and can also identify errors or gaps in current metabolic pathway databases. Our findings highlight the SRD metric as an objective, quantitative and easy-to-compute annotation for gene-metabolite pairs that can be used to integrate statistical evidence to biological networks.