Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression.

Publication, Journal Article
Subramaniyam, S; DeJesus, MA; Zaveri, A; Smith, CM; Baker, RE; Ehrt, S; Schnappinger, D; Sassetti, CM; Ioerger, TR
Published in: BMC Bioinformatics
November 21, 2019

BACKGROUND: Deep sequencing of transposon mutant libraries (or TnSeq) is a powerful method for probing essentiality of genomic loci under different environmental conditions. Various analytical methods have been described for identifying conditionally essential genes whose tolerance for insertions varies between two conditions. However, for large-scale experiments involving many conditions, a method is needed for identifying genes that exhibit significant variability in insertions across multiple conditions.

RESULTS: In this paper, we introduce a novel statistical method for identifying genes with significant variability of insertion counts across multiple conditions based on Zero-Inflated Negative Binomial (ZINB) regression. Using likelihood ratio tests, we show that the ZINB distribution fits TnSeq data better than either ANOVA or a Negative Binomial (in a generalized linear model). We use ZINB regression to identify genes required for infection of M. tuberculosis H37Rv in C57BL/6 mice. We also use ZINB to perform an analysis of genes conditionally essential in H37Rv cultures exposed to multiple antibiotics.

CONCLUSIONS: Our results show that not only does ZINB generally identify most of the genes found by pairwise resampling (and vastly outperform ANOVA), but it also identifies additional genes where variability is detectable only when the magnitudes of insertion counts are treated separately from local differences in saturation, as in the ZINB model.
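The core idea in the abstract, testing whether a gene's insertion counts vary across conditions by comparing ZINB fits with a likelihood ratio test, can be illustrated with a small pure-Python sketch. It is not the authors' implementation. It assumes one gene's insertion counts grouped by condition, a fixed dispersion r shared by all conditions (the default 2.0 is an arbitrary choice), one zero-inflation probability pi and one negative binomial mean mu per condition, and no covariates or library normalization; each group is fitted by EM rather than by a general ZINB regression solver. The full model (separate pi and mu per condition) is compared against a reduced model (one pi and mu shared by all conditions). With K conditions the statistic is referred to a chi-square distribution with 2(K - 1) degrees of freedom, which is always even, so a closed-form survival function suffices. All function names and example counts are hypothetical.

```python
import math

EPS = 1e-10  # keeps probabilities and means away from the log(0) boundary


def nb_logpmf(y, mu, r):
    """Log-pmf of a negative binomial with mean mu and dispersion (size) r."""
    mu = max(mu, EPS)
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def zinb_logpmf(y, pi, mu, r):
    """Log-pmf of a zero-inflated NB: extra mass pi at zero, otherwise NB(mu, r)."""
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, r)))
    return math.log(1.0 - pi) + nb_logpmf(y, mu, r)


def fit_zinb(counts, r, iters=500):
    """EM estimate of (pi, mu) for one group of counts, with dispersion r held fixed."""
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    pi, mu = 0.5, max(total / n, EPS)
    for _ in range(iters):
        # E-step: posterior probability that an observed zero is a structural zero
        p0 = math.exp(nb_logpmf(0, mu, r))
        w = pi / (pi + (1.0 - pi) * p0)
        # M-step: closed-form updates of pi and of the NB mean (valid for fixed r)
        pi = min(max(n_zero * w / n, EPS), 1.0 - EPS)
        mu = max(total / (n - n_zero * w), EPS)
    return pi, mu


def log_likelihood(counts, pi, mu, r):
    return sum(zinb_logpmf(y, pi, mu, r) for y in counts)


def chi2_sf_even(x, df):
    """Chi-square survival function P(X > x); this closed form holds only for even df."""
    assert df > 0 and df % 2 == 0
    half = x / 2.0
    term, acc = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        acc += term
    return math.exp(-half) * acc


def zinb_variability_test(groups, r=2.0):
    """LRT: per-condition (pi, mu) versus a single (pi, mu) shared by all conditions."""
    assert len(groups) >= 2
    ll_full = 0.0
    for g in groups:
        pi, mu = fit_zinb(g, r)
        ll_full += log_likelihood(g, pi, mu, r)
    pooled = [y for g in groups for y in g]
    pi0, mu0 = fit_zinb(pooled, r)
    ll_reduced = log_likelihood(pooled, pi0, mu0, r)
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    df = 2 * (len(groups) - 1)  # one extra (pi, mu) pair per additional condition
    return stat, chi2_sf_even(stat, df)


# Hypothetical insertion counts at the sites of one gene under two conditions
counts_by_condition = [
    [0, 0, 50, 60, 40, 55, 0, 45, 52, 48],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
]
stat, p = zinb_variability_test(counts_by_condition)
print(f"LRT statistic = {stat:.2f}, p = {p:.3g}")
```

Keeping pi and mu as separate per-condition parameters is what lets this kind of test respond to a condition that changes the fraction of empty insertion sites without changing the counts at occupied ones (or vice versa), which mirrors the abstract's point about treating count magnitudes separately from saturation.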

Published In

BMC Bioinformatics

DOI

10.1186/s12859-019-3156-z

EISSN

1471-2105

Publication Date

November 21, 2019

Volume

20

Issue

1

Start / End Page

603

Location

England

Related Subject Headings

  • Mycobacterium tuberculosis
  • Models, Statistical
  • Mice, Inbred C57BL
  • Linear Models
  • Likelihood Functions
  • High-Throughput Nucleotide Sequencing
  • Genes, Essential
  • Databases, Genetic
  • DNA Transposable Elements
  • Bioinformatics

Citation

APA: Subramaniyam, S., DeJesus, M. A., Zaveri, A., Smith, C. M., Baker, R. E., Ehrt, S., … Ioerger, T. R. (2019). Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression. BMC Bioinformatics, 20(1), 603. https://doi.org/10.1186/s12859-019-3156-z
Chicago: Subramaniyam, Siddharth, Michael A. DeJesus, Anisha Zaveri, Clare M. Smith, Richard E. Baker, Sabine Ehrt, Dirk Schnappinger, Christopher M. Sassetti, and Thomas R. Ioerger. “Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression.” BMC Bioinformatics 20, no. 1 (November 21, 2019): 603. https://doi.org/10.1186/s12859-019-3156-z.
ICMJE: Subramaniyam S, DeJesus MA, Zaveri A, Smith CM, Baker RE, Ehrt S, et al. Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression. BMC Bioinformatics. 2019 Nov 21;20(1):603.
MLA: Subramaniyam, Siddharth, et al. “Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression.” BMC Bioinformatics, vol. 20, no. 1, Nov. 2019, p. 603. PubMed, doi:10.1186/s12859-019-3156-z.
NLM: Subramaniyam S, DeJesus MA, Zaveri A, Smith CM, Baker RE, Ehrt S, Schnappinger D, Sassetti CM, Ioerger TR. Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression. BMC Bioinformatics. 2019 Nov 21;20(1):603.