---
title: "Clarke Wright Performance Benchmarks"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Clarke Wright Performance Benchmarks}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%",
  dpi = 300
)
```

```{r setup, include = FALSE}
library(heumilkr)
library(ggplot2)
library(ggExtra)

result <- readRDS("perf.rds")

rho_mean <- round(100 * mean(result$clarke_wright_perf_rho), 1)
xi_mean <- round(100 * mean(result$clarke_wright_perf_xi), 1)

description <-
  data.frame(
    group = c("A", "B", "E", "F", "tai"),
    group_desc = c(
      "Augerat A, 1995", "Augerat B, 1995",
      "Christofides and Eilon, 1969", "Fisher, 1994",
      "Rochatand Taillard, 1995"
    )
  )
```

In this vignette we discuss performance benchmarks of the Clarke-Wright algorithm implemented in this package by comparing the Clarke-Wright solutions of a set of problem instances with their optimal value.

### The problem instance data

The problem instances we are basing our measurements on are provided courtesy of [CVRPLIB](http://vrp.atd-lab.inf.puc-rio.br/) and can be divided into five groups[^1] of different characteristics:

[^1]: The naming is directly taken from [CVRPLIB](http://vrp.atd-lab.inf.puc-rio.br/)

-   Augerat A, 1995 (27 instances)

-   Augerat B, 1995 (23 instances)

-   Christofides and Eilon, 1969 (13 instances)

-   Fisher, 1994 (3 instances)

-   Rochat and Taillard, 1995 (12 instances)

The data provides the problem instance as well as the optimal solution.

## Relation to optimal and trivial solution

The idea of this indicator is to measure where the Clarke-Wright solution sits in between the optimal solution and the trivial solution[^2]. Let $\gamma$ be the cost of the Clarke-Wright solution, $\omega$ the cost of the optimal solution, and $\alpha$ the cost of the trivial solution. The measure $\xi$ is defined to be $$\xi := 1 - \frac{\gamma - \omega}{\alpha - \omega}\,.$$ The measure can move between zero and 1. It is zero if the solution is the trivial solution, and it is one if the solution is the optimal solution.

[^2]: By trivial solution we mean the solution where each site is assigned exactly one route (which goes back and forth).

Evaluating this for CVRPLIB sample data yields the following graph.

```{r perf_scale_based_graph, echo=FALSE}
ggMarginal(
  merge(
    result,
    description,
    by = "group"
  ) |>
    ggplot(aes(x = n_site, y = clarke_wright_perf_xi,
               color = group_desc)) +
    geom_point(alpha = 0.6, size = 3) +
    scale_y_continuous(
      name = "Optimality vs. triviality",
      labels = scales::label_percent(),
      position = "left"
    ) +
    scale_x_continuous(
      name = "Number of demanding sites",
      position = "bottom"
    ) +
    scale_colour_discrete(name = "CVRPLIB data set") +
    theme_bw() +
    theme(legend.position = "bottom") +
    guides(color = guide_legend(nrow = 2)),
  type = "boxplot",
  margins = "y",
  list(alpha = 0.6),
  groupFill = TRUE
)

```

The mean value over all problem instances is $\overline{\xi}=`r xi_mean`\%$.

## Relative deviation of optimal solution

We can also simply measure the relative deviation of the Clarke-Wright cost to the optimal cost, $$\rho := \frac{\gamma - \omega}{\gamma}\,,$$ which measures how much savings we miss by using the Clarke-Wright solution instead of the optimal solution.

```{r perf_rel_graph, echo = FALSE}
ggMarginal(
  merge(
    result,
    description,
    by = "group"
  ) |>
    ggplot(aes(x = n_site, y = clarke_wright_perf_rho, color = group_desc)) +
    geom_point(alpha = 0.6, size = 3) +
    scale_y_continuous(
      name = "Relative loss of savings",
      labels = scales::label_percent()
    ) +
    scale_x_continuous(
      name = "Number of demanding sites",
      position = "bottom"
    ) +
    scale_colour_discrete(name = "CVRPLIB data set") +
    theme_bw() +
    theme(legend.position = "bottom") +
    guides(color = guide_legend(nrow = 2)),
  type = "boxplot",
  margins = "y",
  list(alpha = 0.6),
  groupFill = TRUE
)
```

The mean value over all problem instances is $\overline{\rho}=`r rho_mean`\%$, i.e. on average, we miss out on around $`r rho_mean`\%$ savings taking the Clarke-Wright solution compared to the optimal one.