
SparkScore: Leveraging Apache Spark for distributed genomic inference

Publication, Conference
Bahmani, A; Sibley, AB; Parsian, M; Owzar, K; Mueller, F
Published in: Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
July 18, 2016

The method of the efficient score statistic is used extensively to conduct inference for high-throughput genomic data due to its computational efficiency and ability to accommodate simple and complex phenotypes. Inference based on these statistics can readily incorporate a priori knowledge from a vast collection of bioinformatics databases to further refine the analyses. The sampling distribution of the efficient score statistic is typically approximated using asymptotics. As this may be inappropriate in the context of small study size, or uncommon or rare variants, resampling methods are often used to approximate the exact sampling distribution. We propose SparkScore, a set of distributed computational algorithms implemented in Apache Spark, to leverage the embarrassingly parallel nature of genomic resampling inference on the basis of the efficient score statistics. We illustrate the application of this computational approach for the analysis of data from genome-wide analysis studies (GWAS). This computational approach also harnesses the fault-tolerant features of Spark and can be readily extended to analysis of DNA and RNA sequencing data, including expression quantitative trait loci (eQTL) and phenotype association studies.
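To make the resampling idea in the abstract concrete, the following is a minimal sketch and not the SparkScore implementation. It assumes a quantitative phenotype and the simple score statistic U = sum_i (y_i - mean(y)) * g_i for each variant, and it approximates the sampling distribution by permuting phenotypes. The function names (`score_statistic`, `permutation_p_value`, `scan_variants`) and the add-one p-value estimate are illustrative choices that do not come from the paper. No Spark calls are used; the list comprehension over variants stands in for the step a cluster would distribute.

```python
import random


def score_statistic(genotypes, phenotypes):
    """Score statistic U = sum_i (y_i - mean(y)) * g_i for a single variant."""
    mean_y = sum(phenotypes) / len(phenotypes)
    return sum((y - mean_y) * g for g, y in zip(genotypes, phenotypes))


def permutation_p_value(genotypes, phenotypes, n_perm=999, seed=0):
    """Two-sided permutation p-value for a single variant (illustrative only)."""
    rng = random.Random(seed)
    observed = abs(score_statistic(genotypes, phenotypes))
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        # Permuting phenotypes breaks any genotype-phenotype association.
        rng.shuffle(shuffled)
        if abs(score_statistic(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one estimate keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def scan_variants(genotype_matrix, phenotypes, n_perm=999, seed=0):
    """Compute one p-value per variant (one row of genotype_matrix each).

    Every variant is processed independently of the others, which is the
    embarrassingly parallel structure a distributed map could exploit.
    """
    return [
        permutation_p_value(row, phenotypes, n_perm, seed)
        for row in genotype_matrix
    ]
```

A call such as `scan_variants([[0, 1, 2, 1], [2, 2, 0, 0]], [1.2, 0.7, 3.1, 2.0], n_perm=199)` returns one p-value per variant row. Because no row depends on another, the same function could in principle be applied per partition of variants in a distributed setting.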

Published In

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

DOI

10.1109/IPDPSW.2016.6

ISBN

9781509021406

Publication Date

July 18, 2016

Start / End Page

435 / 442
 

Citation

APA: Bahmani, A., Sibley, A. B., Parsian, M., Owzar, K., & Mueller, F. (2016). SparkScore: Leveraging Apache Spark for distributed genomic inference. In Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016 (pp. 435–442). https://doi.org/10.1109/IPDPSW.2016.6

Chicago: Bahmani, A., A. B. Sibley, M. Parsian, K. Owzar, and F. Mueller. “SparkScore: Leveraging Apache Spark for distributed genomic inference.” In Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016, 435–42, 2016. https://doi.org/10.1109/IPDPSW.2016.6.

ICMJE: Bahmani A, Sibley AB, Parsian M, Owzar K, Mueller F. SparkScore: Leveraging Apache Spark for distributed genomic inference. In: Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016. 2016. p. 435–42.

MLA: Bahmani, A., et al. “SparkScore: Leveraging Apache Spark for distributed genomic inference.” Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016, 2016, pp. 435–42. Scopus, doi:10.1109/IPDPSW.2016.6.

NLM: Bahmani A, Sibley AB, Parsian M, Owzar K, Mueller F. SparkScore: Leveraging Apache Spark for distributed genomic inference. Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016. 2016. p. 435–442.
