---
title: "Precision-Recall and Receiver Operator Characteristics Curves"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{vignette_ROC-PRC}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  message  = FALSE,
  warning  = FALSE
)
```

```{r setup}
library(SLmetrics)
```

In this vignette a worked example on creating Precision-Recall and Receiver Operator Characteristics curves are provided. Throughout this vignette the [Banknote Authentication](https://archive.ics.uci.edu/dataset/267/banknote+authentication)-dataset is used. The `banknote`-dataset is a part of {SLmetrics} which is `list` of features and targets, and can be called as follows:

```{r}
# 1) load data into namespace
data(
    banknote,
    package = "SLmetrics"
)
```

The `banknote` dataset classification tasks achieves between 95\% and 99\% accuracy and therefore makes a bad case for demonstrating Precision-Recall and Receiver Operator Characteristics curves. To alleviate this, random noise will be injected to the original dataset as follows:

```{r}
# 1) set seed
set.seed(1903)

# 2) extract indices
# for shuffling
noise <- sample(
    x = 1:nrow(banknote$features),
    size = nrow(banknote$features) * 0.50
)

# 3) reshuffle
# features and target
noise <- cbind(
    banknote$features[sample(noise),],
    target = banknote$target[sample(noise)]
)
```

The `data.frame` is constructed as follows:

```{r}
# 1) convert to data.frame
# and head
head(
    banknote <- cbind(
        banknote$features,
        target = banknote$target
        )
)

# 2) introduce random
# noise to the data
# NOTE: wrapped in `try()` in case 
# noise is removed
try(
    expr = {
        banknote <- rbind(
        banknote,
        noise
    )
    },
    silent = TRUE
)

# 3) convert target to binary
# value
banknote$target <- as.numeric(
    banknote$target == "inauthentic"
)
```

## Authentic or inauthentic banknote

To predict whether the banknotes are authentic or inauthentic a logistic regression will be trained on a training sample, and evaluated on a the test sample.

### Training/Test split

To train and test test the model a training/test split with 80\% and 20\%.

```{r}
# 1) set seed
set.seed(1903)

# 2) generate indices
index <- sample(
    x = 1:nrow(banknote),
    size = nrow(banknote) * 0.80
)

# 3) split data
# 3.1) training
train <- banknote[index,]
test  <- banknote[-index,]
```

### Training the logistic regression

```{r}
# 1) train the logistic
# regression
model <- glm(
    formula = target ~ .,
    data    = train,
    family  = binomial(
        link = "logit"
    ) 
)
```

## Evaluate Performance

To evaluate the performance we will extract the response probabilities

```{r}
# 1) extract class
# probabilites
class_probabilities <- predict(
    object  = model,
    newdata = subset(test, select = -target),
    type    = "response"
)

# 2) calculate class
class_probabilities <- as.matrix(
    cbind(
        class_probabilities,
        1 - class_probabilities
    )
)
```

### Visualize Precision-Recall Curve

```{r}
# 1) create actual
# value
actual <- factor(
    x = test$target,
    levels = c(1, 0),
    labels = c("inauthentic", "authentic")
)
```


```{r}
# 1) construct precision-recall 
# object
print(
    precision_recall <- prROC(
        actual   = actual,
        response = class_probabilities
    )
)
```

The Precision-Recall object can be visualized by using `plot()`

```{r}
plot(
    precision_recall
)
```

```{r}
pr.auc(
    actual   = actual,
    response = class_probabilities
)
```


### Visualize Receiver Operator Characteristics Curve

```{r}
# 1) construct Receiver Operator Characteristics 
# object
print(
    receiver_operator_characteristics <- ROC(
        actual   = actual,
        response = class_probabilities
    )
)
```

The Receiver Operator Characteristics object can be visualized by using `plot()`

```{r}
plot(
    receiver_operator_characteristics
)
```

```{r}
roc.auc(
    actual   = actual,
    response = class_probabilities
)
```