---
title: "Parallel computation of interpretation methods"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Parallel computation of interpretation methods}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
  
```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 7, fig.align = "center")
options(tibble.print_min = 4L, tibble.print_max = 4L)
```

The `iml` package can now handle bigger datasets. 
Earlier problems with exploding memory have been fixed for `FeatureEffect`, `FeatureImp` and `Interaction`.
`FeatureImp`, `Interaction` and `FeatureEffects` can now also be computed in parallel.
This document describes how.

First, we load some data, fit a random forest and create a `Predictor` object.

```{r}
set.seed(42)
library("iml")
library("randomForest")
data("Boston", package = "MASS")
# randomForest() names this argument `ntree`; `n.trees` would be silently ignored
rf <- randomForest(medv ~ ., data = Boston, ntree = 10)
X <- Boston[which(names(Boston) != "medv")]
predictor <- Predictor$new(rf, data = X, y = Boston$medv)
```

## Going parallel

Parallelization is supported via the `future` package.
All you need to do is choose a parallel backend via `future::plan()`.

```{r}
library("future")
library("future.callr")
# Runs futures in 2 background R sessions launched via callr
plan("callr", workers = 2)
```
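Before choosing the number of workers, it can help to check what the machine offers. The following is a minimal sketch using only functions exported by `future`. It temporarily switches to the `multisession` backend (an assumption made here so that the example does not depend on `future.callr`) and then restores whatever plan was active before.

```{r}
library("future")

# Number of cores that future considers available on this machine
n_cores <- availableCores()
n_cores

# plan() returns the previous plan, so it can be restored afterwards
old_plan <- plan(multisession, workers = 2)

# Number of workers of the currently active plan
n_workers <- nbrOfWorkers()
n_workers

# Restore the plan that was active before this chunk
plan(old_plan)
```

Requesting more workers than there are available cores usually does not help, since the workers then compete for the same CPUs.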

Now we can compute feature importance in parallel. The computation is split by feature and distributed among the 2 workers set up above.

```{r}
imp <- FeatureImp$new(predictor, loss = "mae")
library("ggplot2")
plot(imp)
```
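To make the idea of distributing the work by feature more concrete, here is a simplified sketch of a permutation importance computed with `future.apply::future_lapply()`. It is an illustration under stated assumptions, not the implementation used inside `iml`: the helper `perm_importance()` is hypothetical, a linear model stands in for the random forest, the `multisession` backend is used instead of `callr`, and each feature is permuted only once.

```{r}
library("future")
library("future.apply")

# Stand-in data and model (assumption: any model with a predict() method would do)
data("Boston", package = "MASS")
X_demo <- Boston[names(Boston) != "medv"]
y_demo <- Boston$medv
fit <- lm(medv ~ ., data = Boston)

# Hypothetical helper: importance of one feature, expressed as the ratio
# (MAE after permuting the feature) / (MAE on the original data)
perm_importance <- function(feature, model, X, y) {
  mae <- function(actual, predicted) mean(abs(actual - predicted))
  X_perm <- X
  X_perm[[feature]] <- sample(X_perm[[feature]])
  mae(y, predict(model, X_perm)) / mae(y, predict(model, X))
}

# One task per feature; future.seed = TRUE gives parallel-safe random numbers
old_plan <- plan(multisession, workers = 2)
imp_demo <- future_lapply(
  names(X_demo), perm_importance,
  model = fit, X = X_demo, y = y_demo,
  future.seed = TRUE
)
plan(old_plan)

imp_demo <- setNames(unlist(imp_demo), names(X_demo))
sort(imp_demo, decreasing = TRUE)
```

Because every feature is an independent task, the work scales with the number of features, and that is also where additional workers can help.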

A single run does not show how much we gain, so let's measure the speed-up we get from parallelization.

```{r}
bench::system_time({
  plan(sequential)
  FeatureImp$new(predictor, loss = "mae")
})
bench::system_time({
  plan("callr", workers = 2)
  FeatureImp$new(predictor, loss = "mae")
})
```

This is a small improvement, but not an impressive one.
Parallelization pays off more when the model uses many features or when the feature importance computation is repeated more often to get more stable results.

```{r}
bench::system_time({
  plan(sequential)
  FeatureImp$new(predictor, loss = "mae", n.repetitions = 10)
})

bench::system_time({
  plan("callr", workers = 2)
  FeatureImp$new(predictor, loss = "mae", n.repetitions = 10)
})
```

With 10 repetitions, the parallel computation of the feature importance is roughly twice as fast as the sequential one.

### Interaction

Parallelization also speeds up the computation of the interaction statistics:

```{r}
bench::system_time({
  plan(sequential)
  Interaction$new(predictor, grid.size = 15)
})
bench::system_time({
  plan("callr", workers = 2)
  Interaction$new(predictor, grid.size = 15)
})
```

### Feature Effects

The same holds for `FeatureEffects`:

```{r}
bench::system_time({
  plan(sequential)
  FeatureEffects$new(predictor)
})
bench::system_time({
  plan("callr", workers = 2)
  FeatureEffects$new(predictor)
})
```
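The comparisons above all follow the same pattern: time a computation under `plan(sequential)`, then under a parallel plan. As a minimal sketch of that pattern on a toy workload, the chunk below times a hypothetical `slow_task()` with base R's `system.time()` and derives a speed-up factor. The `multisession` backend and the sleep-based task are assumptions chosen to keep the example free of extra dependencies, and the actual numbers depend on the machine.

```{r}
library("future")
library("future.apply")

# Hypothetical stand-in for an expensive per-feature computation
slow_task <- function(i) {
  Sys.sleep(0.2)
  i^2
}

# Sequential baseline; keep the previous plan so it can be restored later
old_plan <- plan(sequential)
t_seq <- system.time(res_seq <- future_sapply(1:4, slow_task))[["elapsed"]]

# Same work on 2 workers; the plan is set outside the timed expression
plan(multisession, workers = 2)
t_par <- system.time(res_par <- future_sapply(1:4, slow_task))[["elapsed"]]

# Restore the plan that was active before this chunk
plan(old_plan)

speedup <- t_seq / t_par
c(sequential = t_seq, parallel = t_par, speedup = speedup)
```

Setting the plan outside the timed expression keeps the cost of starting the workers out of the measurement, which matters most when the computation itself is short.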