---
title: "ROCit: An R Package for Performance Assessment of Binary Classifier with Visualization"
author: "Md Riaz Ahmed Khan"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
fig_caption: yes
bibliography: bibliography.bib
vignette: >
  %\VignetteIndexEntry{ROCit: An R Package for Performance Assessment of Binary Classifier with Visualization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```





# Introduction

Sensitivity (also called recall or true positive rate), false positive rate, specificity, precision (positive predictive value), negative predictive value, misclassification rate, accuracy, and F-score are popular metrics for assessing the performance of a binary classifier. All of these are computed at a specific threshold. The receiver operating characteristic (ROC) curve, in contrast, is a common tool for assessing the overall diagnostic ability of a binary classifier: rather than depending on a particular threshold, the area under the ROC curve (AUC) summarizes how well the classifier performs across all thresholds. The ROCit package makes it easy to evaluate the threshold-bound metrics, and to obtain the ROC curve and AUC by several methods: empirical, binormal, and non-parametric. ROCit encompasses a variety of methods for constructing confidence intervals of the ROC curve and of the AUC. It also features an empirical gains table, a handy tool in direct marketing. The package offers commonly used visualizations, such as the ROC curve, KS plot, and lift plot, with sensible default graphics settings that can be tweaked manually through function arguments. ROCit offers a wide range of features, yet it is easy to use. 

_______


# Binary Classifier

In statistics and machine learning, classification is the problem of labeling
an observation with one of a finite number of possible classes. Binary classification
is the special case in which there are exactly two possible labels. The
dependent variable represents one of two conceptually opposed values (often coded 0 and 1), for
example:

* the outcome of an experiment- pass (1) or fail (0)
* the response of a question- yes (1) or no (0)
* presence of some feature- absent (0) or present (1)

There are many algorithms that can be used to predict a binary response.
Widely used techniques include logistic regression,
discriminant analysis, naive Bayes, decision trees, random forests, neural networks, and support vector machines [@james2013introduction]. In general, these algorithms model the probability that one of the two events occurs given the covariates, which in mathematical terms can be expressed as $Pr(Y=1|X_1=x_1, X_2=x_2,\dots,X_n=x_n)$. A threshold can then be applied to convert the probabilities into class labels. 


# Binary Classifier Performance Metrics

## Hard Classification

When hard classifications are made (either returned directly by the algorithm or obtained by thresholding predicted probabilities), each observation falls into one of four cases:

1. The response is actually negative and the algorithm predicts it to be
negative. This is known as a true negative (TN).

2. The response is actually negative and the algorithm predicts it to be
positive. This is known as a false positive (FP).

3. The response is actually positive and the algorithm predicts it to be
positive. This is known as a true positive (TP).

4. The response is actually positive and the algorithm predicts it to be
negative. This is known as a false negative (FN).

All the observations fall into one of the four categories stated above and form a confusion matrix.

|                     | Predicted Negative (0) | Predicted Positive (1) |
|---------------------|------------------------|:----------------------:|
| **Actual Negative (0)** |   True Negative (TN)   |   False Positive (FP)  |
| **Actual Positive (1)** |   False Negative (FN)  |   True Positive (TP)   |


Following are some popular performance metrics, when observations are hard classified:

* **Misclassification:** 
Misclassification rate, or error rate, is the most common metric used to quantify a binary classifier's performance. It is the probability that the classifier makes a
wrong prediction, which can be expressed as:
$$
Misclassification\ rate=Pr(\hat{Y}\neq Y)=\frac{FN+FP}{TN+FN+TP+FP}
$$


* **Accuracy:** This simply accounts for the number of correct classifications made.
$$
Accuracy = Pr(\hat{Y}=Y)=1-Misclassification\ rate
$$



* **Sensitivity:** Sensitivity measures the proportion of the positive responses that are correctly identified as positive by the classifier [@altman1994diagnostic]. In other words, it is the
true positive rate and can be calculated directly from the entries of confusion matrix.
$$
Sensitivity=Pr(\hat{Y}=1|Y=1)=\frac{TP}{TP+FN}
$$
Other terms for the same metric are true positive rate (TPR) and
recall. The term sensitivity is popular in medical testing [@altman1994statistics], TPR is common in the credit world
[@siddiqi2012credit], and recall is the usual term in machine learning and
natural language processing [@nguyen2006training; @denecke2008using; @bermingham2011using; @huang2009analyzing].


* **Specificity:** Specificity measures the proportion of the negative responses that are correctly identified as negative by the classifier [@altman1994diagnostic]. In other words, it is the
true negative rate and can be calculated directly from the entries of confusion
matrix.
$$
Specificity=Pr(\hat{Y}=0|Y=0)=\frac{TN}{TN+FP}
$$

Specificity is also known as true negative rate (TNR).



* **Positive predictive value (PPV):** Positive predictive value (PPV) is the probability that an observation classified as positive is truly positive. It  can be calculated from the entries of confusion matrix:
$$
PPV=Pr(Y=1|\hat{Y}=1)=\frac{TP}{TP+FP}
$$


* **Negative predictive value (NPV):** Negative predictive value (NPV) is the probability that an observation classified as negative is truly negative. It can be calculated from the entries of confusion matrix.
$$
NPV=Pr(Y=0|\hat{Y}=0)=\frac{TN}{TN+FN}
$$

* **Diagnostic likelihood ratio (DLR):** The likelihood ratio is another accuracy measure for a binary classifier.
In the strict statistical sense it is a likelihood ratio,
but in the context of accuracy measures it is called the diagnostic likelihood
ratio (DLR) [@pepe2003statistical]. Two kinds of DLR are defined:

$$Positive\ DLR=\frac{TPR}{FPR}$$
$$Negative\ DLR=\frac{TNR}{FNR}$$


* **F-Score:** F-score (also known as F-measure or F1-score) is another metric used
to assess the performance of a binary classifier. It is often used in information retrieval to assess search, document classification, and query classification
performance [@beitzel2007temporal]. It is defined as the harmonic mean of precision (positive
predictive value, PPV) and recall (true positive rate, TPR).

$$
	F\text{-}Score=\frac{2}{\frac{1}{PPV} +\frac{1}{TPR}}=2\times \frac{PPV\times TPR}{PPV+TPR}
$$
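All of the metrics above reduce to simple arithmetic on the four cell counts; a base-R sketch with illustrative counts (not a ROCit function):

```{r metrics-sketch}
TN <- 50; FP <- 10; FN <- 5; TP <- 35   # illustrative cell counts
n  <- TN + FP + FN + TP

misclass <- (FN + FP) / n
accuracy <- (TN + TP) / n
sens     <- TP / (TP + FN)   # TPR / recall
spec     <- TN / (TN + FP)   # TNR
ppv      <- TP / (TP + FP)   # precision
npv      <- TN / (TN + FN)
f_score  <- 2 * ppv * sens / (ppv + sens)   # harmonic mean of PPV and TPR

c(misclass = misclass, accuracy = accuracy, sens = sens,
  spec = spec, ppv = ppv, npv = npv, F = f_score)
```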

## When Observations Are Scored

Rather than returning hard classifications, models often 
give probability scores, $Pr(Y=1)$. Using a cutoff or threshold value, we can dichotomize the scores and
calculate the metrics above. The same applies when some diagnostic variable is used to categorize the observations. For example, a hemoglobin A1c level below 6.5\% may be treated as no diabetes, and a level equal to or greater than 6.5\% as having the disease. Here the diagnostic measure is not bounded between 0 and 1 like a probability, yet all the metrics stated above can still be derived. But these metrics describe performance
only at a certain threshold. Other metrics measure the overall performance of the binary classifier,
considering all possible thresholds. Two such metrics are 

1. Area under receiver operating characteristic (ROC) curve
2. KS statistic



The receiver operating characteristic (ROC) curve [@lusted1971decision; @hanley1982meaning; @bewick2004statistics] is a simple yet
powerful tool for evaluating a binary classifier quantitatively. The most
common quantitative measure is the area under the curve [@hanley1982meaning]. The ROC curve is drawn by plotting sensitivity (TPR)
along the $Y$ axis against the corresponding 1-specificity (FPR) along the $X$ axis for all possible cutoff values. Mathematically, it is the set of all ordered pairs 
$(FPR(c), TPR(c))$, where $c\in \mathbb{R}$.


**Some Properties of ROC curve**

* The ROC curve is a monotonically increasing function, defined in the $(+,+)$ quadrant.


* The ROC curve is invariant under strictly increasing transformations of the diagnostic variable, when estimated empirically.


* The ROC curve always contains $(0,0)$ and $(1,1)$. These are the extreme points when the threshold is set to $+\infty$ and $-\infty$. 


If the diagnostic variable is unrelated to the binary outcome, the expected ROC curve is simply the $y=x$ line. If the diagnostic variable can perfectly separate the two
classes, the ROC curve consists of a vertical line ($x=0$) and a horizontal line ($y=1$). For practical data, the ROC curve usually lies between these two extremes. The figure below illustrates several types of ROC curves.
The red and green curves show the two extreme scenarios. The chance
line in red is the expected ROC curve when the diagnostic variable has no predictive power. When the observations are perfectly separable,
the ROC curve consists of one horizontal and one vertical line, as shown in
green. The other curves are typical of practical data; the further a
curve shifts to the north-west, the better the predictive power.

```{r ROC1, echo=FALSE,fig.width=6,fig.height=4,fig.cap="ROC curves example"}
library(ROCit)
class=c(rep(1,50), rep(0,50))
set.seed(1)
score=c(rnorm(50,50,10), rnorm(50,34,10))
r1=rocit(score, class, method = "bin")
set.seed(1)
score=c(rnorm(50,50,10), rnorm(50,39,10))
r2=rocit(score, class, method = "bin")
set.seed(1)
score=c(rnorm(50,50,10), rnorm(50,44,10))
r3=rocit(score, class, method = "bin")



plot(r1$TPR~r1$FPR, type = "l", xlab = "1 - Specificity (FPR)", lwd = 2,
     ylab = "Sensitivity (TPR)", col= "gold4")
grid()
lines(r2$TPR~r2$FPR, lwd = 2, col = "dodgerblue4")
lines(r3$TPR~r3$FPR, lwd = 2, col = "orange")
abline(0,1, col = 2, lwd = 2)
segments(0,0,0,1, col = "darkgreen", lwd = 2)
segments(1,1,0,1, col = "darkgreen", lwd = 2)
arrows( 0.3, 0.4, 0.13, 0.9, length = 0.25, angle = 30,
       code = 2, lwd = 2)
text(0.075, 0.88, "better")


legend("bottomright", c("Perfectly Separable", 
                        "ROC 1", "ROC 2", "ROC 3", "Chance Line"), 
       lwd = 2, col = c("darkgreen", "gold4", "dodgerblue4",
                        "orange", "red"), bty = "n")
```



For more details, see  @pepe2003statistical.

### Common approaches to estimate ROC curve

* **Empirical:** 
The empirical method constructs the ROC curve by applying the definitions of TPR and FPR directly to the observed
data. Figure 1 is an example of this approach. For every possible cutoff
value $c$, TPR and FPR are estimated by:

$$
\hat{TPR}(c)=\sum_{i=1}^{n_Y}I(D_{Y_i}\geq c)/n_Y
$$

$$
\hat{FPR}(c)=\sum_{j=1}^{n_{\bar{Y}}}I(D_{{\bar{Y}}_j}\geq c)/n_{\bar{Y}}
$$
where, $Y$ and
$\bar{Y}$ represent the positive and negative responses, $n_Y$
and $n_{\bar{Y}}$
are the total number of positive and negative responses, $D_Y$
and $D_{\bar{Y}}$
are
the distributions of the diagnostic variable in the positive and the negative
responses. The indicator function has the usual meaning. It evaluates 1 if
the expression is true, and 0 otherwise. The area under empirically estimated ROC curve is given by:

$$
\hat{AUC}=\frac{1}{n_Yn_{\bar{Y}}}
\sum_{i=1}^{n_Y}\sum_{j=1}^{n_{\bar{Y}}}
\Big(I(D_{Y_i}>D_{{\bar{Y}}_j})+
\frac{1}{2}I(D_{Y_i}=D_{{\bar{Y}}_j})\Big)
$$
The variance of AUC can be estimated as [@hanley1982meaning]:
$$
V(AUC)=\frac{1}{n_Yn_{\bar{Y}}}(
AUC(1-AUC) + (n_Y-1)(Q_1-AUC^2) + (n_{\bar{Y}}-1)(Q_2-AUC^2) 
)
$$
where, $Q_1=\frac{AUC}{2-AUC}$, and $Q_2=\frac{2\times AUC^2}{1+AUC}$.

An alternative formula was developed by @delong1988comparing, given in terms of survivor functions:
 $$
 V(AUC)=\frac{V(S_{D_{\bar{Y}}}(D_Y))}{n_Y}
 +\frac{V(S_{D_Y}(D_{\bar{Y}}))}{n_{\bar{Y}}}
 $$
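The empirical AUC and the Hanley-McNeil variance above can be computed directly; a minimal base-R sketch with simulated scores (not the package's implementation; variable names are illustrative):

```{r aucvar-sketch}
set.seed(10)
DY  <- rnorm(40, 1)   # scores, positive group (illustrative)
DYb <- rnorm(60, 0)   # scores, negative group
nY  <- length(DY); nYb <- length(DYb)

# empirical AUC: proportion of (positive, negative) pairs correctly ordered,
# with ties counted as 1/2
comp <- outer(DY, DYb, ">") + 0.5 * outer(DY, DYb, "==")
auc  <- mean(comp)

# Hanley-McNeil variance
Q1 <- auc / (2 - auc)
Q2 <- 2 * auc^2 / (1 + auc)
v  <- (auc * (1 - auc) + (nY - 1) * (Q1 - auc^2) +
       (nYb - 1) * (Q2 - auc^2)) / (nY * nYb)

c(AUC = auc, Var = v)
```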

A confidence interval can be computed using the usual normal approximation. For example, a $(1-\alpha)\times 100\%$ confidence interval can be constructed
using:

$$
AUC\pm\phi^{-1}(1-\alpha/2)\sqrt{V(AUC)}
$$

The above formula does not restrict the computed upper and lower bounds,
although the AUC is bounded
between 0 and 1. One systematic way to respect these bounds is the logit transformation [@pepe2003statistical]. Instead of
constructing the interval directly for the AUC, an interval is first constructed on the logit scale using:

$$
L_{AUC}\pm \phi^{-1}(1-\alpha/2)\frac{\sqrt{V(AUC)}}{AUC(1-AUC)}
$$

where $L_{AUC}=log(\frac{AUC}{1-AUC})$ is the logit of AUC. The logit-scale limits are then inverse-logit transformed to obtain the actual bounds for the AUC. 
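Assuming an AUC estimate and its variance are in hand, the logit-scale interval can be formed with `qlogis` and back-transformed with `plogis`; a sketch with illustrative numbers:

```{r logitci-sketch}
auc <- 0.87; v <- 0.0012           # illustrative AUC estimate and variance
alpha <- 0.05

L  <- qlogis(auc)                  # logit of AUC
se <- sqrt(v) / (auc * (1 - auc))  # delta-method SE on the logit scale

# bounds are guaranteed to stay in (0, 1)
plogis(L + c(-1, 1) * qnorm(1 - alpha/2) * se)
```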


*Confidence interval of ROC curve:* For large values of $n_Y$ and $n_{\bar{Y}}$, the distribution of $TPR(c)$ at  $FPR(c)$ can be approximated as a normal distribution with following mean and variance:

$$
\mu_{TPR(c)}=\sum_{i=1}^{n_Y}I(D_{Y_i}\geq c)/n_Y
$$


$$
V \Big( TPR(c) \Big)=	\frac{  TPR(c) \Big( 1-  TPR(c)\Big)  }{n_Y}
	+ \bigg( \frac{g(c^*)}{f(c^*) } \bigg)^2\times K
$$
where,
$$
K=\frac{ FPR(c) \Big(1-FPR(c)\Big)}{n_{\bar{Y}} }       
$$

$$
c^*=S^{-1}_{D_{\bar{ Y}}}\Big( FPR(c) \Big)
$$
and, $S$ is the survival function given by, 
$$
S(t)=P\Big(T>t\Big)=\int_t^{\infty}f_T(t)dt=1-F(t)
$$
For details, see @pepe2003statistical.


* **Binormal:** This is a parametric approach in which the diagnostic variable in the two groups is assumed to be normally distributed. 

$$
D_Y\sim N(\mu_{D_Y}, \sigma_{D_Y}^2)
$$


$$
D_{\bar{Y}}\sim N(\mu_{D_{\bar{Y}}}, \sigma_{D_{\bar{Y}}}^2)
$$

When such distributional  assumptions are made, ROC curve can be defined as:

$$
y(x)=1-G(F^{-1}(1-x)), \ \ 0\leq x\leq 1
$$
where $F$ and $G$ are the cumulative distribution functions of the diagnostic score in the negative
and positive groups respectively, and $f$ and $g$ are the corresponding probability density functions. Under the binormal assumptions, the ROC curve and the AUC are given by:

$$
ROC\ Curve: y= \phi(A+BZ_x)
$$

$$
AUC=\phi(\frac{A}{\sqrt{1+B^2}})
$$

where, $Z_x=\phi^{-1}(x(t))=\frac{\mu_{D_{\bar{Y}}}-t}{\sigma_{D_{\bar{Y}}}}$, $t$ being a cutoff; and $A=\frac{|\mu_{D_{{Y}}}-\mu_{D_{\bar{Y}}}|}{\sigma_{D_{{Y}}}}$, $B=\frac{\sigma_{D_{\bar{Y}}}}{\sigma_{D_{{Y}}}}$.
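Under the binormal model, $A$, $B$, the ROC curve, and the AUC follow directly from the group means and standard deviations; a sketch with illustrative parameters (the positive-group mean is larger here, so the absolute value is dropped):

```{r binorm-sketch}
muY <- 50; sdY <- 10    # positive-group parameters (illustrative)
muN <- 40; sdN <- 10    # negative-group parameters

A <- (muY - muN) / sdY  # A and B as defined above
B <- sdN / sdY
pnorm(A / sqrt(1 + B^2))           # binormal AUC

x <- seq(0.01, 0.99, 0.01)         # FPR grid
y <- pnorm(A + B * qnorm(x))       # binormal ROC: y = phi(A + B * Z_x)
head(cbind(FPR = x, TPR = y))
```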


*Confidence interval of ROC curve:* To get the confidence interval, variance of $A+BZ_x$ is derived using:

$$
V(A+B Z_x)=V(A)+Z_x^2V(B)+2Z_xCov(A, B)
$$
A $(1-\alpha)\times100\%$
level confidence limit for $A+Z_xB$ can be obtained as 

$$
(A+Z_xB)\pm \phi^{-1}(1-\alpha/2)\sqrt{V(A+Z_xB)}
$$
Point-wise confidence limits can be obtained by taking $\phi$ of the above expression.


* **Non-parametric:** 
Non-parametric estimates of $f$ and $g$ are used in this approach. @zou1997smooth presented one such approach using Kernel densities:

$$
	\hat{f}(x)=\frac{1}{n_{\bar{Y}}h_{\bar{ Y}}}\sum_{i=1}^{n_{\bar{ Y}}}
	K\big( \frac{x-D_{\bar{ Y}i} }{h_{\bar{ Y}}}  \big)
$$


$$
\hat{g}(x)=  \frac{1}{n_{{Y}}h_y}\sum_{i=1}^{n_{{ Y}}}
	K\big( \frac{x-D_{{ Y}i} }{h_Y}  \big)
$$

where $K$ is the kernel function and $h$ is the smoothing parameter (bandwidth).
@zou1997smooth suggested a biweight
kernel:


$$
	K\big(\frac{x-\alpha}{\beta}\big)=\begin{cases}
	\frac{15}{16}  \Big[  1-\big(\frac{x-\alpha}{\beta}\big)^2  \Big] 
	, &  x\in  (\alpha - \beta, \alpha + \beta)\\
	0, & \text{otherwise}
	\end{cases}
$$

with the bandwidth given by,
$$
h_{\bar{Y}}=0.9\times min\big(  \sigma_{\bar{ Y}}, \frac{IQR(D_{\bar{ Y}})}{1.34}   \big)/ (n_{\bar{ Y}} )^{\frac{1}{5}}
$$
$$
		h_{{Y}}=0.9\times min\big(  \sigma_{{ Y}}, \frac{IQR(D_{{ Y}})}{1.34}   \big)/ (n_{{ Y}} )^{\frac{1}{5}}
$$

Smoothed versions of TPR and FPR are obtained as the area to the right of the cutoff under the smoothed $g$ and $f$. That is,

$$
\hat{TPR}(t)=1-\int_{-\infty}^{t}\hat{g}(u)\,du=1-\hat{G}(t)
$$

$$
\hat{FPR}(t)=1-\int_{-\infty}^{t}\hat{f}(u)\,du=1-\hat{F}(t)
$$
Once discrete pairs of $(FPR, TPR)$ are obtained, the trapezoidal rule can be applied to calculate the AUC. 
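The smoothing steps can be sketched in base R; here a Gaussian kernel stands in for the biweight kernel of @zou1997smooth, with the bandwidth rule above (a rough illustration, not the package's implementation):

```{r smoothroc-sketch}
set.seed(3)
DY  <- rnorm(50, 1)   # scores, positive group (illustrative)
DYb <- rnorm(50, 0)   # scores, negative group

bw <- function(d)     # bandwidth rule given above
  0.9 * min(sd(d), IQR(d) / 1.34) / length(d)^(1/5)

# kernel-smoothed CDF estimates (Gaussian kernel stand-in)
Fhat <- function(t) mean(pnorm(t, DYb, bw(DYb)))
Ghat <- function(t) mean(pnorm(t, DY,  bw(DY)))

t <- seq(-4, 5, 0.05)           # cutoff grid
FPR <- 1 - sapply(t, Fhat)
TPR <- 1 - sapply(t, Ghat)

# trapezoidal rule over the (FPR, TPR) pairs (FPR descends as t ascends)
sum(-diff(FPR) * (head(TPR, -1) + tail(TPR, -1)) / 2)
```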



# Using Package ROCit

## 1/0 coding of response

A binary response can exist as a factor, character, or numeric variable with values other than 1 and 0. It is often desirable to have the response coded simply as 1/0, which makes many calculations easier. 

```{r ch1}
library(ROCit)
data("Loan")

# check the class variable
summary(Loan$Status)
class(Loan$Status)
```




So the response is a factor variable, with 131 cases of charged off and 769 cases of fully paid. In loan data, the probability of defaulting is often modeled, making the fully paid group the reference.

```{r ch2}
Simple_Y <- convertclass(x = Loan$Status, reference = "FP") 

# charged off rate
mean(Simple_Y)
```

If the reference is not specified, it is set alphabetically; here the charged off group becomes the reference.

```{r ch3}
mean(convertclass(x = Loan$Status))
```





## Performance metrics of binary classifier

Various cutoff-specific performance metrics for a binary classifier are available.
The following metrics can be requested via the `measure` argument:

* `ACC`: Overall accuracy of classification.
* `MIS`: Misclassification rate.
* `SENS`: Sensitivity.
* `SPEC`: Specificity.
* `PREC`: Precision. 
* `REC`: Recall. Same as sensitivity.
* `PPV`: Positive predictive value. 
* `NPV`: Negative predictive value.
* `TPR`: True positive rate. 
* `FPR`: False positive rate. 
* `TNR`: True negative rate.
* `FNR`: False negative rate.
* `pDLR`: Positive diagnostic likelihood ratio.
* `nDLR`: Negative diagnostic likelihood ratio.
* `FSCR`: F-score.




```{r ch4, fig.height=4, fig.width=6,fig.cap="Accuracy vs Cutoff"}
data("Diabetes")
logistic.model <- glm(as.factor(dtest)~chol+age+bmi,
                      data = Diabetes,family = "binomial")
class <- logistic.model$y
score <- logistic.model$fitted.values
# -------------------------------------------------------------
measure <- measureit(score = score, class = class,
                     measure = c("ACC", "SENS", "FSCR"))
names(measure)
plot(measure$ACC~measure$Cutoff, type = "l")
```




## ROC curve estimation

`rocit` is the main function of the ROCit package. With the diagnostic score and the class of each observation, it calculates the true positive rate (sensitivity) and false positive rate (1-specificity) at convenient cutoff values to construct the ROC curve. The function returns a `"rocit"` object, which can be passed to other S3 methods.


The `Diabetes` data contain information on 403 of the 1046 subjects who were interviewed in a study of the prevalence of obesity, diabetes, and other cardiovascular risk factors among African Americans in central Virginia. According to Dr John Hong, Diabetes Mellitus Type II (adult onset diabetes) is associated most strongly with obesity. The waist/hip ratio may be a predictor of diabetes and heart disease. DM II is also associated with hypertension; they may both be part of "Syndrome X". The 403 subjects were the ones who were actually screened for diabetes; glycosylated hemoglobin > 7.0 is usually taken as a positive diagnosis of diabetes.


In the data, the `dtest` variable indicates whether `glyhb` is greater than 7 or not. 

```{r ch5}
data("Diabetes")
summary(Diabetes$dtest)
summary(as.factor(Diabetes$dtest))
```
In the dataset the variable is of character type, with 60 positive instances, 330 negative instances, and 13 NAs.

Now let us use the total cholesterol as a diagnostic measure of having the disease.

```{r ch6}
roc_empirical <- rocit(score = Diabetes$chol, class = Diabetes$dtest,
                       negref = "-") 
```



The negative class was set as the reference group in the `rocit` call. Since no method was specified, the default `empirical` method was used.

```{r ch7}
class(roc_empirical)
methods(class="rocit")
```









The `summary` method is available for a `rocit` object. 

```{r ch8}
summary(roc_empirical)
# function returns
names(roc_empirical)
# -------
message("Number of positive responses used: ", roc_empirical$pos_count)
message("Number of negative responses used: ", roc_empirical$neg_count)
```
 
The cutoffs are in descending order, and TPR and FPR are in ascending order. The first cutoff is set to $+\infty$, and the last cutoff equals the lowest score used in the ROC curve estimation. A score greater than or equal to the cutoff is treated as positive. 



```{r ch11}
head(cbind(Cutoff=roc_empirical$Cutoff, 
                 TPR=roc_empirical$TPR, 
                 FPR=roc_empirical$FPR))

tail(cbind(Cutoff=roc_empirical$Cutoff, 
                 TPR=roc_empirical$TPR, 
                 FPR=roc_empirical$FPR))
``` 

***Other methods:***
```{r ch12}
roc_binormal <- rocit(score = Diabetes$chol, 
                      class = Diabetes$dtest,
                      negref = "-", 
                      method = "bin") 


roc_nonparametric <- rocit(score = Diabetes$chol, 
                           class = Diabetes$dtest,
                           negref = "-", 
                           method = "non") 

summary(roc_binormal)
summary(roc_nonparametric)
```


***Plotting:***


```{r ch13, fig.width=6,fig.height=4}
# Default plot
plot(roc_empirical, values = F)


# Changing color
plot(roc_binormal, YIndex = F, 
     values = F, col = c(2,4))


# Other options
plot(roc_nonparametric, YIndex = F, 
     values = F, legend = F)
```




***Trying a better model:***

```{r ch14, fig.width=6,fig.height=4}
## first, fit a logistic model
logistic.model <- glm(as.factor(dtest)~
                        chol+age+bmi,
                        data = Diabetes,
                        family = "binomial")

## make the score and class
class <- logistic.model$y
# score = log odds
score <- qlogis(logistic.model$fitted.values)

## rocit object
rocit_emp <- rocit(score = score, 
                   class = class, 
                   method = "emp")
rocit_bin <- rocit(score = score, 
                   class = class, 
                   method = "bin")
rocit_non <- rocit(score = score, 
                   class = class, 
                   method = "non")

summary(rocit_emp)
summary(rocit_bin)
summary(rocit_non)

## Plot ROC curve
plot(rocit_emp, col = c(1,"gray50"), 
     legend = FALSE, YIndex = FALSE)
lines(rocit_bin$TPR~rocit_bin$FPR, 
      col = 2, lwd = 2)
lines(rocit_non$TPR~rocit_non$FPR, 
      col = 4, lwd = 2)
legend("bottomright", col = c(1,2,4),
       c("Empirical ROC", "Binormal ROC",
         "Non-parametric ROC"), lwd = 2)
```


***Confidence interval of AUC:***
```{r ch15}
# Default 
ciAUC(rocit_emp)
ciAUC(rocit_emp, level = 0.9)

# DeLong method
ciAUC(rocit_bin, delong = TRUE)


# logit and inverse logit applied
ciAUC(rocit_bin, delong = TRUE,
      logit = TRUE)


# bootstrap method
set.seed(200)
ciAUC_boot <- ciAUC(rocit_non, 
                level = 0.9, nboot = 200)
print(ciAUC_boot)
```










***Confidence interval of ROC curve:***
```{r ch16, fig.width=6,fig.height=4,fig.cap="Empirical ROC curve with 90% CI"}
data("Loan")
score <- Loan$Score
class <- ifelse(Loan$Status == "CO", 1, 0)
rocit_emp <- rocit(score = score, 
                   class = class, 
                   method = "emp")
rocit_bin <- rocit(score = score, 
                   class = class, 
                   method = "bin")
# --------------------------
ciROC_emp90 <- ciROC(rocit_emp, 
                     level = 0.9)
set.seed(200)
ciROC_bin90 <- ciROC(rocit_bin, 
                     level = 0.9, nboot = 200)
plot(ciROC_emp90, col = 1, 
     legend = FALSE)
lines(ciROC_bin90$TPR~ciROC_bin90$FPR, 
      col = 2, lwd = 2)
lines(ciROC_bin90$LowerTPR~ciROC_bin90$FPR, 
      col = 2, lty = 2)
lines(ciROC_bin90$UpperTPR~ciROC_bin90$FPR, 
      col = 2, lty = 2)
legend("bottomright", c("Empirical ROC",
                        "Binormal ROC",
                        "90% CI (Empirical)", 
                        "90% CI (Binormal)"),
       lty = c(1,1,2,2), col = 
         c(1,2,1,2), lwd = c(2,2,1,1))
```




Various options are available for plotting the ROC curve with its confidence interval:

```{r ch17}
class(ciROC_emp90)
```






***KS plot:***
The KS plot shows the cumulative distribution functions $F(c)$ and $G(c)$ of the score in the negative and positive populations. 
If the positive population has higher scores, the
negative curve ($F(c)$) ramps up quickly. The KS statistic is the maximum
difference between $F(c)$ and $G(c)$.
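The KS statistic itself is simply the largest gap between the two empirical CDFs; a base-R sketch with simulated scores (not the `ksplot` implementation):

```{r ks-sketch}
set.seed(5)
DY  <- rnorm(50, 1)    # scores, positive group (illustrative)
DYb <- rnorm(50, 0)    # scores, negative group

cuts <- sort(c(DY, DYb))
gap  <- abs(ecdf(DYb)(cuts) - ecdf(DY)(cuts))   # |F(c) - G(c)|
c(KS = max(gap), cutoff = cuts[which.max(gap)])
```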


```{r ch18, fig.height=4, fig.width=6, fig.cap="KS plot"}
data("Diabetes")
logistic.model <- glm(as.factor(dtest)~
                      chol+age+bmi,
                      data = Diabetes,
                      family = "binomial")
class <- logistic.model$y
score <- logistic.model$fitted.values
# ------------
rocit <- rocit(score = score, 
               class = class) #default: empirical
kplot <- ksplot(rocit)
```




```{r ch19}
message("KS Stat (empirical) : ", 
        kplot$`KS stat`)
message("KS Stat (empirical) cutoff : ", 
        kplot$`KS Cutoff`)
```



## Gains table

A gains table is a useful tool in direct marketing. The observations are first rank-ordered by score and grouped into a certain number of buckets, and the gains table shows several statistics associated with the buckets. This package includes the
`gainstable` function, which creates a gains table containing `ngroup` groups or buckets. The algorithm first orders the observations with respect to the score variable. In case of ties, the class becomes the ordering variable, with positive responses first. The ending index of each bucket is calculated as $round((length(score) / ngroup) * (1:ngroup))$. Each bucket should have at least 5 observations.

If the buckets are to end at desired population depths, `breaks` should be specified instead. If specified, it overrides `ngroup`, which is then ignored. `breaks` by default always includes 100. If a whole number does not exist at a specified depth, the nearest integer index is used. The following statistics are computed:

* `Obs`: Number of observation in the group.
* `CObs`: Cumulative number of observations up to the group.
* `Depth`: Cumulative population depth up to the group.
* `Resp`: Number of (positive) responses in the group.
* `CResp`: Cumulative number of (positive) responses up to the group.
* `RespRate`: (Positive) response rate in the group.
* `CRespRate`: Cumulative (positive) response rate up to the group
* `CCapRate`: Cumulative overall capture rate of (positive) responses up to the group.
* `Lift`: Lift index in the group. Calculated as GroupResponseRate / OverallResponseRate.
* `CLift`: Cumulative lift index up to the group.
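The per-bucket lift reduces to a few grouped summaries once observations are sorted by score; a base-R sketch with simulated data (not the `gainstable` implementation):

```{r lift-sketch}
set.seed(7)
score <- runif(100)                     # illustrative scores
class <- rbinom(100, 1, score)          # illustrative 1/0 responses

ord <- order(-score)                    # rank by descending score
grp <- ceiling(seq_along(score) / 20)   # 5 equal buckets of 20

resp <- tapply(class[ord], grp, sum)    # positives per bucket
obs  <- tapply(class[ord], grp, length) # observations per bucket
lift <- (resp / obs) / mean(class)      # group response rate / overall rate
round(lift, 2)
```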



```{r ch20}
data("Loan")
class <- Loan$Status
score <- Loan$Score
# ----------------------------
gtable15 <- gainstable(score = score, 
                       class = class,
                       negref = "FP", 
                       ngroup = 15)

```


A `rocit` object can also be passed:

```{r ch21}
rocit_emp <- rocit(score = score, 
                   class = class, 
                   negref = "FP")
gtable_custom <- gainstable(rocit_emp, 
                    breaks = seq(1,100,15))
# ------------------------------
print(gtable15)
print(gtable_custom)
```

```{r ch22, fig.height=4, fig.width=6, fig.cap="Lift and Cum. Lift plot"}
plot(gtable15, type = 1)
```



# References