Missing data in the 2 x 2 table: patterns and likelihood-based analysis for cross-sectional studies with supplemental sampling.
Standard measures of crude association in the context of a cross-sectional study are the risk difference, relative risk and odds ratio, as derived from a 2 x 2 table. Most such studies are subject to missing data on disease, exposure, or both, introducing bias into the usual complete-case analysis. We describe several scenarios distinguished by the manner in which missing data arise, and for each we adjust the natural multinomial likelihood to properly account for missing data. The situations presented allow for increasing levels of generality with regard to the missing data mechanism. The final case, quite conceivable in epidemiologic studies, assumes that the probability of missing exposure depends on true exposure and disease status, as well as on whether disease status is missing (and conversely for the probability of missing disease information). When parameters relating to the missing data process are inestimable without strong assumptions, we propose maximum likelihood analysis after collecting supplemental data in the spirit of a validation study. Analytical results give insight into the bias inherent in complete-case analysis for each scenario, and numerical results illustrate the performance of likelihood-based point and interval estimates in the most general case. Adjustment for potential confounders via stratified analysis is also discussed.
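The following is a minimal illustrative sketch, not the authors' likelihood method. It computes the three crude measures from a 2 x 2 table and shows one way complete-case analysis can be biased. The cell labels `a`, `b`, `c`, `d` and the helper `expected_complete_cases` are hypothetical names. The helper assumes each cell has its own probability of being fully observed, which is a simplification of the missing-data mechanisms described above.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts of a cross-sectional 2 x 2 table (exposure by disease)."""
    a: float  # exposed, diseased
    b: float  # exposed, not diseased
    c: float  # unexposed, diseased
    d: float  # unexposed, not diseased

    def risk_exposed(self) -> float:
        # P(disease | exposed)
        return self.a / (self.a + self.b)

    def risk_unexposed(self) -> float:
        # P(disease | unexposed)
        return self.c / (self.c + self.d)

    def risk_difference(self) -> float:
        return self.risk_exposed() - self.risk_unexposed()

    def relative_risk(self) -> float:
        return self.risk_exposed() / self.risk_unexposed()

    def odds_ratio(self) -> float:
        return (self.a * self.d) / (self.b * self.c)


def expected_complete_cases(table: TwoByTwo, keep: tuple) -> TwoByTwo:
    """Hypothetical helper: expected complete-case table when each cell
    (a, b, c, d) is fully observed with its own probability in `keep`.
    This is an illustrative assumption, not the paper's mechanism."""
    ka, kb, kc, kd = keep
    return TwoByTwo(table.a * ka, table.b * kb, table.c * kc, table.d * kd)


if __name__ == "__main__":
    true_table = TwoByTwo(a=30, b=70, c=20, d=80)
    # Assumed mechanism: completeness depends only on disease status
    # (diseased kept with prob. 0.9, non-diseased with prob. 0.6).
    cc = expected_complete_cases(true_table, (0.9, 0.6, 0.9, 0.6))
    for name, t in [("true", true_table), ("complete-case", cc)]:
        print(name, t.risk_difference(), t.relative_risk(), t.odds_ratio())
```

Under this per-cell model the complete-case odds ratio equals the true odds ratio multiplied by (ka * kd) / (kb * kc). With completeness depending only on disease status, that factor is 1, so the odds ratio is unbiased, yet the risks (and hence the risk difference and relative risk) are distorted. With completeness depending only on exposure, the reverse holds for the risks.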