---
title: The iraceplot package
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The iraceplot package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r echo=FALSE, prompt=FALSE, message=FALSE, warning=FALSE}

library(iraceplot, quietly = TRUE)
library(plotly)
iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", "guide-example.Rdata", mustWork = TRUE))
```


## Introduction

The configuration process performed by irace will show at the end of the execution one
or more configurations that are the best performing configurations found. This package
provides a set of functions that allow to further assess the performance of these
configurations and provides support to obtain insights about the details of the configuration
process.

## Executing irace

To use the methods provided by this package you must have an irace data object, 
this object is saved as an Rdata file (irace.Rdata by default) after each irace 
execution. 

During the configuration procedure irace evaluates several candidate configurations 
(parameter settings) on different training insrances, creating an algorithm performance 
data set we call the *training data set*. This information is thus, the data
that irace had access to when configuring the algorithm. 

You can also enable the test evaluation option in irace, in which a set of elite 
configurations will be evaluated on a set of test instances after the execution of 
irace is finished. Nota that this option is not enabled by default and you
must provide the test instances in order to enable it. The performance obtained 
in this evalaution is called the *test data set*. This evaluation helps assess
the results of the configuration in a more "real" setup. For example, we can assess 
if the configuration process incurred in overtuning or if a type of instance
was underrepresented in the training set. We note that irace allows to perform
the test evaluations to the final elite configurations and to the elite configurations
of each iterations. For information about the irace setup we refer you to the irace 
package user guide. 

Note: Before executing irace, consider setting the test evaluation option of irace. 

Once irace is executed, you can load the irace log in the R console as previously 
shown.

# Function overview

## Visualizing configurations

The irace plot package provides several functions that display information about
configurations. For visualizing individual configurations the `parallel_coord`
shows each configuration as a line.

```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE}
parallel_coord(iraceResults)
```

The `plot_configurations()` function generates a similar parallel coordinates plot when
provided with an arbitrary set of configurations without the irace execution context. 
For example, to display all elite configurations:

```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE}
all_elite <- iraceResults$allConfigurations[unlist(iraceResults$allElites),]
plot_configurations(all_elite, iraceResults$scenario$parameters)
```

A similar display can be obtained using the `parallel_cat` function. For example to 
visualize the configurations of a selected set of parameters:

```{r fig.align="center", fig.width= 7, fig.height=6, message=FALSE, prompt=FALSE, eval=FALSE}
parallel_cat(irace_results = iraceResults, 
             param_names=c("algorithm", "localsearch", "dlb", "nnls"))
```

The `sampling_pie` function creates a plot that displays the values of all configurations
sampling during the configuration process. The size of each parameter
value in the plot is dependent of the number of configurations having that value in the
configurations.

```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE}
sampling_pie(irace_results = iraceResults, param_names=c("algorithm", "localsearch", "alpha", "beta", "rho"))
```

Note that for some of the previous plots, numerical parameters domains are discretized 
to be showm in the plot. Check the documentation of the functions and the [user guide](https://auto-optimization.github.io/iraceplot/)
to adjust this setting.

## Visualising sampled values and frequencies

The package provides several functions to visualize values sampled during the 
configuration procedure and their distributions. These plots can help identifying 
the areas in the parameter space where irace detected a high performance.

A general overview of the sampled parameters values can be obtained with the `sampling_frequency`
function which generates frequency and density plots for the sampled values:

```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE}
 sampling_frequency(iraceResults, param_names = c("beta"))
```


If you would like to visualize the distribution of a particular set of configurations,
you can pass directly a set of configurations 
and a parameters object in the irace format to the `sampling_frequency` function:
```{r fig.align="center", fig.width= 7, fig.height=7, message=FALSE, prompt=FALSE, eval=FALSE}
 sampling_frequency(iraceResults$allConfigurations, iraceResults$scenario$parameters, param_names = c("alpha"))
```

A detailed plot showing the sampling by iteration can be obtained with the
`sampling_frequency_iteration` function. This plot shows the convergence of the
configuration process reflected in the sampled parameter values.

```{r fig.align="center", fig.width= 7, fig.height=6, eval=FALSE}
sampling_frequency_iteration(iraceResults, param_name = "beta")
```

To visualize the joint sampling frequency of two parameters you can use the `sampling_heatmap`
function. 

```{r fig.align="center", fig.width= 7, fig.height=6, eval=FALSE}
sampling_heatmap(iraceResults, param_names = c("beta","alpha"))
```

The configurations can be provided directly to the `sampling_heatmap2` function. In both
functions, the parameter sizes can be used to adjust the number of intervals to be
displayed:

```{r fig.align="center", fig.width= 7, fig.height=6, eval=FALSE}
sampling_heatmap2(iraceResults$allConfigurations, iraceResults$scenario$parameters, 
                  param_names = c("localsearch","nnls"), sizes=c(0,5))
```

For more details of these functions, check the documentation of the functions and the [user guide](https://auto-optimization.github.io/iraceplot/).

## Visualizing sampling distance

You may like to have a general overview of the distance of the configurations sampled
across the configuration process. This can allow you to assess the convergence of the
configuration process. Use the `sampling_distance` function to display the mean distance
of the configurations across the iterations of the configuration process:

```{r fig.align="center", fig.width= 7, eval=FALSE}
sampling_distance(iraceResults, t=0.05)
```

Numerical parameter distance can be adjusted with a treshold (`t=0.05`), 
check the documentation of the function and the [user guide](https://auto-optimization.github.io/iraceplot/) 
for details.


## Visualizing  test performance 

The test performance of the best final configurations can be visualized using the `boxplot_test`
function. 

```{r fig.align="center", fig.width=7, eval=FALSE}
boxplot_test(iraceResults, type="best")
```

Note that the irace execution log includes test data (test is not enabled by default), check the 
irace package [user guide](https://CRAN.R-project.org/package=irace/vignettes/irace-package.pdf)
for details on how to use the test feature in irace.

To investigate the difference in the performance of two configurations the `scatter_test` function displays
the performance of both configurations paired by instance (each point represents an instance):

```{r fig.align="center", fig.width=7, eval=FALSE}
scatter_test(iraceResults, x_id = 808, y_id = 809, interactive=TRUE)
```


## Visualizing training performance 

Visualizing training performance might help to obtain insights about the reasoning
that followed irace when searching the parameter space, and thus it can be used
to understand why irace considers certain configurations as high or low  performing.

To visualize the performance of the final elites observed by irace, the `boxplot_training`
function plots the experiments performed on these configurations. Note that this data corresponds
to the performance generated during the configuration process thus, the number of instances on
which the configurations were evaluated might vary between elite configurations.

```{r fig.align="center", fig.width=7, eval=FALSE}
boxplot_training(iraceResults)
```

To observe the difference in the performance of two configurations you can also generate
a scatter plot using the `scatter_training` function: 

```{r fig.align="center", fig.width=7, eval=FALSE}
scatter_training(iraceResults, x_id = 808, y_id = 809, interactive=TRUE)
```

## Visualizing performance (general purpose)
To plot the performance of a selected set of configurations in an experiment matrix, 
you can use the `boxplot_performance` function. The configurations can be selected
in a vector or a list (allElites):

```{r fig.align="center", fig.width=7, eval=FALSE}
boxplot_performance(iraceResults$experiments, allElites=list(c(803,808), c(809,800)), first_is_best = TRUE)
```

In the same way, you can use the `scatter_perfomance` function to plot the difference
between configurations:

```{r fig.align="center", fig.width=7, eval=FALSE}
scatter_performance(iraceResults$experiments, x_id = 83, y_id = 809, interactive=TRUE)
```

Note that there these functions can be adjusted to display differently the configurations (i.e. include or not instancs).
Check the package [user guide](https://auto-optimization.github.io/iraceplot/) and the 
documentation of each function for details.

## Visualizing the configuration process

In some cases, it might be interesting have a general visualization for the configuration process.
This can be obtained with the `plot_experiments_matrix` function:

```{r fig.align="center", fig.width=7, eval=FALSE}
plot_experiments_matrix(iraceResults, interactive = TRUE)
```

The sampling distributions used by irace during the configuration process can be displayed using the 
`plot_model` function. For categorical parameters, this function displays the sampling probabilities associated to each parameter value by iteration (x axis top) in each elite configuration model (bars):

```{r fig.align="center", fig.width=7, eval=FALSE}
plot_model(iraceResults, param_name="algorithm")
```

For numerical parameters, this function shows the sampling distributions associated to each parameter.
These plots display the the density function of the truncated normal distribution associated to the 
models of each elite configuration in each instance:


```{r fig.align="center", fig.width=7, fig.height=6, message=FALSE, prompt=FALSE, results='hide', eval=FALSE}
plot_model(iraceResults, param_name="alpha")
```

# Report

The `report` function generates an HTML report with a summary of the configuration 
process executed by irace. The function will create an HTML file in the path
provided in the `filename` argument and appending the `".html"` extension to it.


```{r fig.align="center", eval=FALSE}
report(iraceResults, filename="report")
```