Interpretable Almost-Exact Matching for Causal Inference
Matching methods are heavily used in the social and health sciences due to their interpretability. We aim to create the highest possible quality of treatment-control matches for categorical data in the potential outcomes framework. The method proposed in this work aims to match units on a weighted Hamming distance, taking into account the relative importance of the covariates; the algorithm aims to match units on as many relevant variables as possible. To do this, the algorithm creates a hierarchy of covariate combinations on which to match (similar to downward closure), in the process solving an optimization problem for each unit in order to construct the optimal matches. The algorithm uses a single dynamic program to solve all of the units' optimization problems simultaneously. Notable advantages of our method over existing matching procedures are its high-quality interpretable matches, versatility in handling different data distributions that may have irrelevant variables, and ability to handle missing data by matching on as many available covariates as possible.
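As a rough illustration of the per-unit optimization described above, the following minimal sketch finds, for one treated unit, the set of covariates with the largest total weight on which at least one control unit agrees exactly. Maximizing the matched weight is the same as minimizing the weighted Hamming distance. The function name `best_match`, the tuple encoding of units, and the use of `None` for a missing value are assumptions made for this example. The search is a brute-force enumeration of covariate subsets, exponential in the number of covariates; it is not the single dynamic program that the method uses to solve all units' problems simultaneously.

```python
from itertools import combinations


def best_match(unit, controls, weights):
    """Brute-force sketch (hypothetical helper, not the paper's algorithm).

    Returns (subset, group): the covariate indices with maximal total weight
    on which `unit` agrees exactly with at least one control, and the
    controls that agree on all of those covariates.
    """
    # Only covariates observed for this unit can be matched on.
    available = [j for j in range(len(weights)) if unit[j] is not None]

    # Fallback: match on no covariates, so every control qualifies.
    best_subset, best_score, best_group = (), 0.0, list(controls)

    # Enumerate subsets from largest to smallest.
    for k in range(len(available), 0, -1):
        for subset in combinations(available, k):
            score = sum(weights[j] for j in subset)
            if score <= best_score:
                continue  # cannot improve on the current best
            group = [c for c in controls
                     if all(c[j] == unit[j] for j in subset)]
            if group:
                best_subset, best_score, best_group = subset, score, group
    return best_subset, best_group


# Toy data: three categorical covariates with assumed importance weights.
weights = [3.0, 2.0, 1.0]
controls = [("a", "y", 1), ("b", "x", 1), ("a", "x", 0)]

subset, group = best_match(("a", "x", 1), controls, weights)
print(subset, group)
```

In this toy example no control agrees with the treated unit on all three covariates, so the sketch falls back to the heaviest pair of covariates that some control does match. A unit with a missing value is handled the same way, by restricting the search to the covariates it has observed.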