An imputation model for multilevel binary data
NEPS working paper 31
Bamberg, University of Bamberg, National Educational Panel Study (2013)
Missing data are a ubiquitous problem in almost all large-scale surveys, and the National Educational Panel Study (NEPS) is no exception. Analyzing survey data without accounting for missing data can lead to invalid statistical inference. This is especially true if the process that creates the missing data is not completely random. If the probability of an observation being missing depends only on observed measurements, multiple imputation provides a remedy for dealing adequately with such a situation. Its underlying idea is to replace each missing value several times with plausible values. The resulting data sets are then analyzed separately, and the results of the separate analyses are subsequently combined into an overall result. A technique that has proven its value in this context is multivariate imputation by chained equations. This technique requires a separate regression model to be specified for each incompletely observed variable. Missing values are then replaced by values predicted from these regression models. The main prerequisite for the feasibility of multivariate imputation by chained equations is that the regression models applied be consistent with the relationships prevalent in the data. The R package “mice” offers a comprehensive collection of relevant imputation models, for example, for continuous data. However, it currently lacks an imputation model for multilevel binary data. This paper presents a corresponding add-on function that extends the mice toolbox. The validity of this novel imputation function is demonstrated using Monte Carlo simulations.
Keywords: multiple imputation, multivariate imputation by chained equations, imputation model, multilevel binary data
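
The abstract describes analyzing each imputed data set separately and then combining the results into an overall result. A minimal sketch of that combining step, assuming the standard pooling rules commonly attributed to Rubin, is given below. The function name `pool_rubin` and the input numbers are hypothetical illustrations and are not taken from the paper or from the mice package.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one scalar parameter across m imputed data sets.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    # Pooled point estimate: mean of the per-data-set estimates.
    q_bar = sum(estimates) / m
    # Within-imputation variance: mean of the per-data-set variances.
    u_bar = sum(variances) / m
    # Between-imputation variance: sample variance of the estimates.
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance adds a finite-m correction to the between part.
    t = u_bar + (1.0 + 1.0 / m) * b
    return q_bar, t, math.sqrt(t)


if __name__ == "__main__":
    # Hypothetical estimates from three imputed data sets.
    est, total_var, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(est, total_var, se)
```

The between-imputation term reflects the extra uncertainty caused by the missing values: if all imputed data sets gave the same estimate, the total variance would reduce to the average within-imputation variance.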