Modeling zero-modified count and semicontinuous data in health services research Part 1: background and overview.


Journal Article

Health services data often contain a high proportion of zeros. In studies examining patient hospitalization rates, for instance, many patients will have no hospitalizations, resulting in a count of zero. When the number of zeros is greater or less than expected under a standard count model, the data are said to be zero modified relative to the standard model. A similar phenomenon arises with semicontinuous data, which are characterized by a spike at zero followed by a continuous distribution with positive support. When analyzing zero-modified count and semicontinuous data, flexible mixture distributions are often needed to accommodate both the excess zeros and the typically skewed distribution of nonzero values. Various models have been introduced over the past three decades to accommodate such data, including hurdle models, zero-inflated models, and two-part semicontinuous models. This tutorial describes recent modeling strategies for zero-modified count and semicontinuous data and highlights their role in health services research studies. Part 1 of the tutorial, presented here, provides a general overview of the topic. Part 2, appearing as a companion piece in this issue of Statistics in Medicine, discusses three case studies illustrating applications of the methods to health services research. Copyright © 2016 John Wiley & Sons, Ltd.
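As a rough illustration of the ideas in the abstract, the sketch below writes out Poisson-based zero-inflated and hurdle probability mass functions and compares the observed share of zeros with the share a plain Poisson model would predict. It is a minimal sketch under stated assumptions, not the authors' implementation: the Poisson base distribution, the function names, and the `counts` data are all hypothetical choices made for the example.

```python
import math

def poisson_pmf(k, mu):
    """Standard Poisson probability of observing count k with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def zip_pmf(k, mu, pi):
    """Zero-inflated Poisson (assumed form): with probability pi the count is a
    structural zero; otherwise it is drawn from Poisson(mu)."""
    base = (1.0 - pi) * poisson_pmf(k, mu)
    return pi + base if k == 0 else base

def hurdle_pmf(k, mu, p_zero):
    """Poisson hurdle (assumed form): zero with probability p_zero; positive
    counts follow a zero-truncated Poisson(mu). Choosing p_zero below
    exp(-mu) gives zero deflation rather than inflation."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, mu) / (1.0 - poisson_pmf(0, mu))

def zero_modification(counts):
    """Return (observed zero share, zero share expected under a Poisson model
    whose mean equals the sample mean)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-mean)
    return observed, expected

# Hypothetical hospitalization counts for 100 patients (illustrative only).
counts = [0] * 70 + [1] * 10 + [2] * 10 + [3] * 5 + [5] * 5
observed, expected = zero_modification(counts)
print(f"observed zero share: {observed:.3f}, Poisson-expected: {expected:.3f}")
```

When the observed share exceeds the expected share, as in this made-up sample, the data are zero inflated relative to the Poisson model; the reverse indicates zero deflation, which the hurdle form can also represent.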

Cited Authors

  • Neelon, B; O'Malley, AJ; Smith, VA

Published Date

  • November 30, 2016

Published In

  • Statistics in Medicine

Volume / Issue

  • 35 / 27

Start / End Page

  • 5070 - 5093

PubMed ID

  • 27500945

PubMed Central ID

  • 27500945

Electronic International Standard Serial Number (EISSN)

  • 1097-0258

Digital Object Identifier (DOI)

  • 10.1002/sim.7050


Language

  • eng

Conference Location

  • England