---
title: "Package MKclass"
author: "Matthias Kohl"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{MKclass}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{utf8}
---


## Introduction  
Package MKclass includes a collection of functions that I found useful for 
statistical classification.

We first load the package.
```{r}
library(MKclass)
```


## OR, RR and Other Risk Measures
Given the incidence of the outcome of interest in the nonexposed (p0) and 
exposed (p1) group, several risk measures can be computed.
```{r}
## Example from Wikipedia
risks(p0 = 0.4, p1 = 0.1)
risks(p0 = 0.4, p1 = 0.5)
```

Given p0 or p1 and OR, we can compute the respective RR.
```{r}
or2rr(or = 1.5, p0 = 0.4)
or2rr(or = 1/6, p1 = 0.1)
```

There is also a function for computing an approximate confidence interval for 
the relative risk (RR).
```{r}
## Example from Wikipedia
rrCI(a = 15, b = 135, c = 100, d = 150)
rrCI(a = 75, b = 75, c = 100, d = 150)
```


## AUC
### Estimation
There are two functions that can be used to calculate and test AUC values. First
function AUC, which computes the area under the receiver operating characteristic 
curve (AUC under ROC curve) using the connection of AUC to the Wilcoxon rank sum 
test. We use some random data and groups to demonstrate the use of this function.
```{r}
x <- c(runif(50, max = 0.6), runif(50, min = 0.4))
g <- c(rep(0, 50), rep(1, 50))
AUC(x, group = g)
```

Sometimes the labels of the group should be switched to avoid an AUC smaller 
than 0.5, which represents a result worse than a pure random choice.
```{r}
g <- c(rep(1, 50), rep(0, 50))
AUC(x, group = g)
## no switching
AUC(x, group = g, switchAUC = FALSE)
```


### Testing
We can also perform statistical tests for AUC. First, the one-sample test which
corresponds to the Wilcoxon signed rank test.
```{r}
g <- c(rep(0, 50), rep(1, 50))
AUC.test(pred1 = x, lab1 = g)
```

We can also compare two AUC using the test of Hanley and McNeil (1982).
```{r}
x2 <- c(runif(50, max = 0.7), runif(50, min = 0.3))
g2 <- c(rep(0, 50), rep(1, 50))
AUC.test(pred1 = x, lab1 = g, pred2 = x2, lab2 = g2)
```


### Pairwise
There is also a function for pairwise comparison if there are more than two 
groups.
```{r}
x3 <- c(x, x2)
g3 <- c(g, c(rep(2, 50), rep(3, 50)))
pairwise.auc(x = x3, g = g3)
```


## PPV and NPV
In case of medical diagnostic tests, usually sensitivity and specificity of
the tests are known and there is also at least a rough estimate of the 
prevalence of the tested disease. In the practival application, the positive
predictive value (PPV) and the negative predictive value are of crucial 
importance.
```{r}
## Example: HIV test 
## 1. ELISA screening test (4th generation)
predValues(sens = 0.999, spec = 0.998, prev = 0.001)
## 2. Western-Plot confirmation test
predValues(sens = 0.998, spec = 0.999996, prev = 1/3)
```


## Performance Measures and Scores
In the development of diagnostic tests and more general in binary classification
a variety of performance measures and scores can be found in literature. Functions 
perfMeasures and prefScores compute several of them.
```{r}
## example from dataset infert
fit <- glm(case ~ spontaneous+induced, data = infert, family = binomial())
pred <- predict(fit, type = "response")

## with group numbers
perfMeasures(pred, truth = infert$case, namePos = 1)
perfScores(pred, truth = infert$case, namePos = 1)

## with group names
my.case <- factor(infert$case, labels = c("control", "case"))
perfMeasures(pred, truth = my.case, namePos = "case")
perfScores(pred, truth = my.case, namePos = "case")

## using weights
perfMeasures(pred, truth = infert$case, namePos = 1, weight = 0.3)
perfScores(pred, truth = infert$case, namePos = 1, wBS = 0.3)
```

## Optimal Cutoff
The function optCutoff computes the optimal cutoff for various performance 
measures for binary classification. More precisely, all performance measures 
that are implemented in function perfMeasures.
```{r}
## example from dataset infert
fit <- glm(case ~ spontaneous+induced, data = infert, family = binomial())
pred <- predict(fit, type = "response")
optCutoff(pred, truth = infert$case, namePos = 1)
```
The computation of an optimal cut-off doesn't make any sense for continuous 
scoring rules as their computation does not involve any cut-off 
(discretization/dichotomization).

## Hosmer-Lemeshow and le Cessie-van Houwelingen-Copas-Hosmer
These tests are used to investigate the goodness of fit in logistic regression.
```{r}
## Hosmer-Lemeshow goodness of fit tests for C and H statistic 
HLgof.test(fit = pred, obs = infert$case)
## e Cessie-van Houwelingen-Copas-Hosmer global goodness of fit test
HLgof.test(fit = pred, obs = infert$case, 
           X = model.matrix(case ~ spontaneous+induced, data = infert))
```


## sessionInfo
```{r}
sessionInfo()
```