Benchmarking association analyses of continuous exposures with RNA-seq in observational studies.

Journal Article (Journal Article;Multicenter Study)

Large RNA-seq datasets from observational studies, covering hundreds to thousands of individuals, are becoming available. Many popular software packages for analysis of RNA-seq data were constructed to study differences in expression signatures in an experimental design with well-defined conditions (exposures). In contrast, observational studies may have varying levels of confounding of transcript-exposure associations; further, exposure measures may vary from discrete (exposed, yes/no) to continuous (levels of exposure), with non-normal distributions of exposure. We compare popular software for gene expression (DESeq2, edgeR and limma) as well as linear regression-based analyses for studying the association of continuous exposures with RNA-seq. We developed a computational pipeline that includes transformation, filtering and generation of an empirical null distribution of association P-values, and we apply the pipeline to compute empirical P-values with multiple testing correction. We employ a resampling approach that allows for assessment of false positive detection across methods, power comparison and the computation of quantile empirical P-values. The results suggest that linear regression methods are substantially faster and control false detections better than the other methods, even when the resampling method is used to compute empirical P-values. We provide the proposed pipeline with fast algorithms in an R package, Olivia, and applied it to study the associations of measures of sleep-disordered breathing with RNA-seq in peripheral blood mononuclear cells in participants from the Multi-Ethnic Study of Atherosclerosis.
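The pipeline steps named in the abstract (filtering, transformation, a resampled null distribution of association P-values, and empirical P-values) can be illustrated with a minimal Python sketch. This is not the Olivia package or the authors' implementation: the function names, the median-count filter, the log2(count + 0.5) transform, the normal approximation to the test statistic, and the plain permutation of the exposure are all assumptions chosen for brevity.

```python
import numpy as np
from math import erfc, sqrt


def filter_transcripts(counts, min_median=5):
    """Keep transcripts (rows) whose median count reaches a hypothetical threshold."""
    return counts[np.median(counts, axis=1) >= min_median]


def assoc_pvalues(counts, exposure):
    """Two-sided P-values from a per-transcript simple linear regression of
    log2(count + 0.5) on a continuous exposure (normal approximation)."""
    y = np.log2(counts + 0.5)                      # transcripts x samples
    x = exposure - exposure.mean()                 # centered exposure
    yc = y - y.mean(axis=1, keepdims=True)         # centered expression
    sxx = np.sum(x ** 2)
    beta = yc @ x / sxx                            # slope for each transcript
    resid = yc - np.outer(beta, x)
    sigma2 = np.sum(resid ** 2, axis=1) / (x.size - 2)
    z = beta / np.sqrt(sigma2 / sxx)
    return np.array([erfc(abs(v) / sqrt(2)) for v in z])


def empirical_pvalues(counts, exposure, n_perm=100, seed=0):
    """Empirical P-values: rank each observed P-value within a null
    distribution pooled over transcripts and exposure permutations."""
    rng = np.random.default_rng(seed)
    observed = assoc_pvalues(counts, exposure)
    null = np.concatenate([
        assoc_pvalues(counts, rng.permutation(exposure))
        for _ in range(n_perm)
    ])
    null.sort()
    # Number of null P-values <= each observed P-value, with a +1 correction
    # so that an empirical P-value is never exactly zero.
    n_le = np.searchsorted(null, observed, side="right")
    return (n_le + 1) / (null.size + 1)
```

A multiple testing correction would then be applied to the returned empirical P-values. Permuting the raw exposure ignores covariates, so a confounder-adjusted analysis would need a different resampling scheme than the one assumed here.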

Cited Authors

  • Sofer, T; Kurniansyah, N; Aguet, F; Ardlie, K; Durda, P; Nickerson, DA; Smith, JD; Liu, Y; Gharib, SA; Redline, S; Rich, SS; Rotter, JI; Taylor, KD

Published Date

  • November 5, 2021

Published In

  • Briefings in Bioinformatics

Volume / Issue

  • 22 / 6

PubMed ID

  • 34015820

Pubmed Central ID

  • PMC8574950

Electronic International Standard Serial Number (EISSN)

  • 1477-4054

Digital Object Identifier (DOI)

  • 10.1093/bib/bbab194

Language

  • eng

Conference Location

  • England