5.5 Parallel computation

Multiple imputation is an inherently parallel technique. If there are \(m\) processors available, it is possible to generate the \(m\) imputed datasets, estimate the \(m\) complete-data statistics and store the \(m\) results in \(m\) independent parallel streams. The overhead needed is minimal since each stream requires the same amount of processor time. If more than \(m\) processors are available, a better alternative is to subdivide each stream into several substreams. Huge savings in execution time can be obtained in this way (Beddo 2002).

Unfortunately, R is single-threaded, so exploiting the parallel nature of multiple imputation is not automatic and requires some additional work. There are currently three alternatives for running mice in parallel.

  1. Gordon (2014) presents a fully worked-out code example that builds upon the doParallel package and combines complete() and ibind(). With some programming, this example can be adapted to other datasets.

  2. The parlMICE() function is a wrapper around mice() that can divide the imputations over multiple cores or CPUs. Schouten and Vink (2017) show that substantial gains are already possible with three free cores, especially for a combination of a large number of imputations \(m\) and a large sample size \(n\).

  3. The mice.par() function in the micemd package (Audigier and Resche-Rigon 2018) takes the same arguments as the mice() function, plus two extra arguments related to the parallel calculations. Like parlMICE(), it builds on the parallel package.

The last two options are quite similar. These methods are especially beneficial for simulation studies, where the same model needs to be replicated a large number of times. Support for multi-core processing is likely to grow, so it is worth checking the package documentation for new developments.
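
To make the idea of independent parallel streams concrete, the following is a minimal sketch that uses only the parallel package together with mice() and ibind(). It is not the code of Gordon (2014), which builds on doParallel, nor the internals of the wrapper functions listed above. The number of streams, the number of imputations per stream, the seed and the analysis model are assumptions chosen for illustration, and the nhanes data that ship with mice serve as the example dataset.

```r
library(mice)      # mice(), ibind(), with(), pool() and the nhanes data
library(parallel)  # part of the standard R distribution

n_streams    <- 2  # number of worker processes (assumed value)
m_per_stream <- 3  # imputations generated per stream (assumed value)

cl <- makeCluster(n_streams)

# Give every worker its own random number stream so that the
# imputations are independent and reproducible (seed is arbitrary).
clusterSetRNGStream(cl, iseed = 123)

# Load mice on every worker.
clusterEvalQ(cl, library(mice))

# Each worker imputes the same incomplete data and returns a mids object.
# The data and the number of imputations are passed as explicit arguments.
imp_list <- parLapply(
  cl, seq_len(n_streams),
  function(i, data, m) mice(data, m = m, printFlag = FALSE),
  data = nhanes, m = m_per_stream
)

stopCluster(cl)

# ibind() merges mids objects that were created from the same data
# and the same imputation model into one object.
imp <- Reduce(ibind, imp_list)

# The combined object can be analysed like any other mids object.
fit <- with(imp, lm(bmi ~ age))
est <- pool(fit)
```

The combined object imp contains n_streams * m_per_stream imputations. Since starting a cluster takes time, a sketch like this is expected to pay off mainly when the sample size or the number of imputations is large.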