Scholars@Duke publication: Naught all zeros in sequence count data are the same.

Publication, Journal Article

Silverman, JD; Roche, K; Mukherjee, S; David, LA

Published in: Comput Struct Biotechnol J

2020

Genomic studies feature multivariate count data from high-throughput DNA sequencing experiments, which often contain many zero values. These zeros can cause artifacts in statistical analyses, and multiple modeling approaches have been developed in response. Here, we apply different zero-handling models to gene-expression and microbiome datasets and show that models can disagree substantially in identifying the most differentially expressed sequences. Next, to rationally examine how different zero-handling models behave, we developed a conceptual framework outlining four types of processes that may give rise to zero values in sequence count data. Last, we performed simulations to test how zero-handling models behave in the presence of these different zero-generating processes. Our simulations showed that simple count models are sufficient across multiple processes, even when the true underlying process is unknown. On the other hand, a common zero-handling technique known as "zero-inflation" was only suitable under a zero-generating process associated with an unlikely set of biological and experimental conditions. Together, our work suggests several specific guidelines for developing and choosing state-of-the-art models for analyzing sparse sequence count data.
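The contrast between a simple count model and zero-inflation can be illustrated with a minimal sketch. This is not the authors' code or their models: it assumes a plain Poisson as a stand-in for a "simple count model", a zero-inflated Poisson (ZIP) as a stand-in for "zero-inflation", and hypothetical values for sequencing depth, relative abundance, and the inflation weight `pi`. It only shows that a low expected count already produces many zeros without any extra zero-generating component.

```python
import math


def poisson_zero_prob(rate: float) -> float:
    """P(count = 0) under a Poisson model with mean `rate`."""
    return math.exp(-rate)


def zip_zero_prob(rate: float, pi: float) -> float:
    """P(count = 0) under a zero-inflated Poisson.

    With probability `pi` the count is forced to zero; otherwise it is
    drawn from a Poisson with mean `rate`.
    """
    return pi + (1.0 - pi) * math.exp(-rate)


# Hypothetical values, chosen only for illustration.
depth = 5000               # total reads in one sample
proportion = 1e-4          # true relative abundance of one sequence
rate = depth * proportion  # expected count for that sequence

print(f"Poisson P(0) = {poisson_zero_prob(rate):.3f}")
print(f"ZIP     P(0) = {zip_zero_prob(rate, pi=0.2):.3f}")
```

With an expected count of 0.5, the plain Poisson already assigns a probability of about 0.61 to observing a zero, so a sparse table is not by itself evidence for a separate zero-generating component.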

Duke Scholars

Author: Lawrence Anthony David (Molecular Genetics and Microbiology)

Published In

Comput Struct Biotechnol J

DOI

10.1016/j.csbj.2020.09.014

ISSN

2001-0370

Publication Date

2020

Volume

18

Start / End Page

2789 / 2798

Location

Netherlands

Related Subject Headings

4601 Applied computing
3101 Biochemistry and cell biology
0802 Computation Theory and Mathematics
0103 Numerical and Computational Mathematics

Citation

APA

Silverman, J. D., Roche, K., Mukherjee, S., & David, L. A. (2020). Naught all zeros in sequence count data are the same. Comput Struct Biotechnol J, 18, 2789–2798. https://doi.org/10.1016/j.csbj.2020.09.014

Chicago

Silverman, Justin D., Kimberly Roche, Sayan Mukherjee, and Lawrence A. David. “Naught all zeros in sequence count data are the same.” Comput Struct Biotechnol J 18 (2020): 2789–98. https://doi.org/10.1016/j.csbj.2020.09.014.

ICMJE

Silverman JD, Roche K, Mukherjee S, David LA. Naught all zeros in sequence count data are the same. Comput Struct Biotechnol J. 2020;18:2789–98.

MLA

Silverman, Justin D., et al. “Naught all zeros in sequence count data are the same.” Comput Struct Biotechnol J, vol. 18, 2020, pp. 2789–98. Pubmed, doi:10.1016/j.csbj.2020.09.014.

NLM

Silverman JD, Roche K, Mukherjee S, David LA. Naught all zeros in sequence count data are the same. Comput Struct Biotechnol J. 2020;18:2789–2798.
