Two parts are better than one: modeling marginal means of semicontinuous data
© 2017, Springer Science+Business Media New York (outside the USA).
In health services research, it is common to encounter semicontinuous data characterized by a point mass at zero followed by a continuous distribution with positive support. These are often analyzed using two-part mixtures that separately model the probability of use to account for the portion of the sample with zero values. Commonly, but not always, the second component models the continuous values conditional on them being positive. Prior work examining whether such two-part models are needed to appropriately draw inference from semicontinuous data compared to standard one-part regression models has found mixed results. However, prior studies have generally used only measures of model fit on a single dataset, leaving a definitive conclusion uncertain. This paper provides a detailed evaluation using simulations of the appropriateness of standard one-part generalized linear models (GLMs) compared to a recently developed marginalized two-part (MTP) model. The MTP model, unlike the one-part GLMs, explicitly accounts for the point mass at zero, yet takes the same form for the marginal mean as the commonly used GLM with log link, making the covariate effects directly comparable. We simulate data scenarios with varying sample sizes and percentages of zeros. One-part GLMs resulted in increased bias, lower than nominal coverage of confidence intervals, and inflated type I error rates, rendering them inappropriate for use with semicontinuous data. Even when distributional assumptions were violated, estimates of covariate effects and type I error rates under the MTP model remained robust.
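As a rough illustration of the data structure the abstract describes, the sketch below simulates semicontinuous outcomes with a point mass at zero and a log-link marginal mean. It is not the authors' simulation design: the log-normal positive part, the single binary covariate, the parameter values, and the function name `simulate_mtp_lognormal` are all assumptions made for this example. The positive part is centred so that the overall mean, zeros included, equals exp(b0 + b1*x), which is the form a log-link GLM also targets.

```python
import numpy as np


def simulate_mtp_lognormal(n, alpha, beta, sigma, rng):
    """Simulate semicontinuous data with a log-link marginal mean.

    Illustrative assumptions (not from the abstract): one binary
    covariate, a logistic model for Pr(Y > 0), and a log-normal
    distribution for the positive values.
    """
    x = rng.binomial(1, 0.5, size=n)
    # Part 1: probability of a positive value (logit link).
    pi = 1.0 / (1.0 + np.exp(-(alpha[0] + alpha[1] * x)))
    # Marginal mean over zeros and positives together (log link).
    nu = np.exp(beta[0] + beta[1] * x)
    # Log-scale location chosen so that
    # pi * exp(mu_log + sigma^2 / 2) = nu, i.e. E[Y | x] = nu.
    mu_log = np.log(nu) - np.log(pi) - sigma**2 / 2.0
    is_positive = rng.random(n) < pi
    y = np.where(is_positive, rng.lognormal(mean=mu_log, sigma=sigma), 0.0)
    return x, y


rng = np.random.default_rng(0)
alpha = (0.0, 0.5)   # hypothetical coefficients for the zero part
beta = (1.0, 0.3)    # hypothetical coefficients for the marginal mean
x, y = simulate_mtp_lognormal(200_000, alpha, beta, sigma=0.5, rng=rng)

# Empirical log ratio of overall group means; should be near beta[1].
log_mean_ratio = np.log(y[x == 1].mean() / y[x == 0].mean())
# Share of exact zeros in the reference group; should be near 0.5.
zero_share_x0 = np.mean(y[x == 0] == 0.0)
print(round(log_mean_ratio, 2), round(zero_share_x0, 2))
```

With a large sample, the log ratio of the two group means should sit close to the assumed `beta[1]` even though roughly half of the reference group is zero, which is the sense in which the covariate effect is interpreted on the marginal mean.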
Smith, VA; Neelon, B; Maciejewski, ML; Preisser, JS