6.7 Conclusion

Multiple imputation is not a quick automatic fix. Creating good imputations requires substantive knowledge about the data paired with some healthy statistical judgement. Impute close to the data. Real data are richer and more complex than the statistical models applied to them. Ideally, the imputed values should look like real data in every respect, especially if multiple models are to be fit on the imputed data. Keep the following points in mind:

  • Plan time to create the imputed datasets. As a rule of thumb, reserve about 5% of the time needed to collect the data for imputation.

  • Check the modeling choices in Section 6.1. Though the software defaults are often reasonable, they may not work for the particular data.

  • Use MAR as a starting point, following the strategy outlined in Section 6.2.

  • Choose the imputation methods and set the predictors using the strategies outlined in Section 6.3.

  • If the data contain derived variables that are not needed for imputation, impute the originals and calculate the derived variables afterward.

  • Use passive imputation if you need the derived variables during imputation. Carefully specify the predictor matrix to avoid feedback loops. See Section 6.4.

  • Monitor convergence of the MICE algorithm for aberrant patterns, especially if the rate of missing data is high or if there are dependencies in the data. See Sections 4.5.6 and 6.5.2.

  • Make liberal use of diagnostic graphs to compare the observed and the imputed data. Convince yourself that the imputed values could have been real data, had they not been missing. See Section 6.6.
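
As a numeric companion to the diagnostic graphs in the last point, the following minimal Python sketch summarizes the observed and the imputed values of a single variable side by side. It is an illustration only: the function name `compare_observed_imputed`, the use of `None` to mark missing entries, and the toy height data are assumptions made for this example and are not part of any imputation package. An imputed value outside the observed range is not necessarily wrong, but it is worth a closer look.

```python
from statistics import mean, stdev


def compare_observed_imputed(original, completed):
    """Summarize observed versus imputed values for one variable.

    original:  list in which None marks a missing entry (assumed layout).
    completed: list of the same length after imputation, without None.
    """
    if len(original) != len(completed):
        raise ValueError("original and completed must have the same length")

    observed = [x for x in original if x is not None]
    if not observed:
        raise ValueError("need at least one observed value")

    # Imputed values are the completed entries at originally missing positions.
    imputed = [c for o, c in zip(original, completed) if o is None]

    lo, hi = min(observed), max(observed)
    return {
        "n_observed": len(observed),
        "n_imputed": len(imputed),
        "mean_observed": mean(observed),
        "mean_imputed": mean(imputed) if imputed else None,
        "sd_observed": stdev(observed) if len(observed) > 1 else None,
        "sd_imputed": stdev(imputed) if len(imputed) > 1 else None,
        # Imputations outside the observed range deserve inspection.
        "outside_observed_range": [v for v in imputed if v < lo or v > hi],
    }


# Toy data (hypothetical): body height in cm, with two missing entries.
height = [150.0, None, 162.0, 171.0, None, 158.0]
height_completed = [150.0, 160.0, 162.0, 171.0, -5.0, 158.0]

summary = compare_observed_imputed(height, height_completed)
print(summary["outside_observed_range"])  # prints [-5.0]
```

A negative height could never have been real data, so the imputation model for this variable would need revision. Such a summary only supplements the plots discussed in Section 6.6. It does not replace them, because means and ranges can hide differences in distributional shape.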

Nguyen, Carlin, and Lee (2017) present a concise overview of methodologies to check the quality of the imputation model. In addition to the points mentioned above, the authors advise the application of posterior predictive checks, the evaluation of the effect of the target analysis when making judgements about model adequacy, and the application of a wide range of methodologies to check imputation models.