Benchmarking association analyses of continuous exposures with RNA-seq in observational studies.
Journal Article (Journal Article; Multicenter Study)
Large RNA-seq datasets from observational studies, covering hundreds to thousands of individuals, are becoming available. Many popular software packages for analysis of RNA-seq data were constructed to study differences in expression signatures in an experimental design with well-defined conditions (exposures). In contrast, observational studies may have varying levels of confounding of transcript-exposure associations; further, exposure measures may vary from discrete (exposed, yes/no) to continuous (levels of exposure), with non-normal distributions of exposure. We compare popular software for gene expression (DESeq2, edgeR and limma) as well as linear regression-based analyses for studying the association of continuous exposures with RNA-seq. We developed a computational pipeline that includes transformation, filtering and generation of an empirical null distribution of association P-values, and we apply the pipeline to compute empirical P-values with multiple testing correction. We employ a resampling approach that allows for assessment of false positive detection across methods, power comparison and the computation of quantile empirical P-values. The results suggest that linear regression methods are substantially faster and control false detections better than the other methods, even when the resampling method is used to compute empirical P-values. We provide the proposed pipeline with fast algorithms in the R package Olivia, and applied it to study the associations of measures of sleep disordered breathing with RNA-seq in peripheral blood mononuclear cells in participants from the Multi-Ethnic Study of Atherosclerosis.
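The pipeline steps named in the abstract (transformation, filtering, per-transcript linear regression, an empirical null from resampling, and multiple testing correction) can be illustrated with a minimal Python sketch. This is not the Olivia implementation and makes several assumptions of its own: the log2(count + 0.5) transform, the zero-fraction filter threshold, a plain permutation of the exposure as the resampling scheme, a null pooled across transcripts, and Benjamini-Hochberg as the correction. All function names are hypothetical.

```python
import numpy as np

def log_transform(counts, offset=0.5):
    """log2(count + offset); the offset is an assumed choice, not the paper's."""
    return np.log2(counts + offset)

def filter_transcripts(counts, max_zero_frac=0.5):
    """Boolean mask keeping transcripts (columns) with few zero counts.
    The 0.5 threshold is a hypothetical illustration."""
    zero_frac = (counts == 0).mean(axis=0)
    return zero_frac <= max_zero_frac

def slope_t_stats(expr, exposure):
    """t-statistics for the slope of expr[:, j] ~ exposure, for all j at once."""
    n = expr.shape[0]
    x = exposure - exposure.mean()
    y = expr - expr.mean(axis=0)
    sxx = x @ x
    beta = (x @ y) / sxx
    resid = y - np.outer(x, beta)
    sigma2 = (resid ** 2).sum(axis=0) / (n - 2)
    return beta / np.sqrt(sigma2 / sxx)

def empirical_p_values(expr, exposure, n_perm=100, seed=0):
    """Empirical P-values from a null built by permuting the exposure.
    Null statistics are pooled over transcripts and permutations (an assumption)."""
    rng = np.random.default_rng(seed)
    observed = np.abs(slope_t_stats(expr, exposure))
    null = np.concatenate([
        np.abs(slope_t_stats(expr, rng.permutation(exposure)))
        for _ in range(n_perm)
    ])
    null.sort()
    # Count null statistics that are at least as large as each observed one.
    n_ge = null.size - np.searchsorted(null, observed, side="left")
    # The +1 terms keep every P-value strictly above zero.
    return (n_ge + 1) / (null.size + 1)

def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted P-values (false discovery rate control)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest P-value downwards.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

if __name__ == "__main__":
    # Simulated data: a skewed continuous exposure and Poisson counts,
    # with only the first transcript truly associated with the exposure.
    rng = np.random.default_rng(1)
    n, m = 200, 40
    exposure = rng.exponential(1.0, size=n)
    mu = np.full((n, m), np.exp(2.0))
    mu[:, 0] = np.exp(2.0 + 0.5 * exposure)
    counts = rng.poisson(mu)

    keep = filter_transcripts(counts)
    expr = log_transform(counts[:, keep])
    p_emp = empirical_p_values(expr, exposure)
    q = benjamini_hochberg(p_emp)
    print("transcripts kept:", int(keep.sum()))
    print("smallest adjusted P-value:", q.min())
```

Because the regression statistics for all transcripts are computed with a few matrix operations, each permutation is cheap, which is consistent with the abstract's point that regression-based analyses remain fast even when a resampled null is added. Covariate adjustment, which an observational analysis would normally need, is omitted from this sketch.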
Cited Authors
- Sofer, T; Kurniansyah, N; Aguet, F; Ardlie, K; Durda, P; Nickerson, DA; Smith, JD; Liu, Y; Gharib, SA; Redline, S; Rich, SS; Rotter, JI; Taylor, KD
Published Date
- November 5, 2021
Published In
- Briefings in Bioinformatics
Volume / Issue
- 22 / 6
PubMed ID
- 34015820
Pubmed Central ID
- PMC8574950
Electronic International Standard Serial Number (EISSN)
- 1477-4054
Digital Object Identifier (DOI)
- 10.1093/bib/bbab194
Language
- eng
Conference Location
- England