---
title: "Explain"
author: "Roland Krasser"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Explain}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The explore package offers a simplified way to use machine learning to understand and explain patterns in the data.

* `explain_tree()` creates a decision tree. The target can be binary, categorical or numerical.
* `explain_forest()` creates a random forest. The target can be binary, categorical or numerical.
* `explain_xgboost()` creates an XGBoost model. The target must be binary (0/1, FALSE/TRUE).
* `explain_logreg()` creates a logistic regression. The target must be binary.
* `balance_target()` balances a target.
* `weight_target()` creates weights for the decision tree.


We use synthetic data in this example:

```{r message=FALSE, warning=FALSE}
library(dplyr)
library(explore)

data <- create_data_buy(obs = 1000)
glimpse(data)
```

### Explain / Model

#### Decision Tree

```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4}
data %>% explain_tree(target = buy)
```

```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4}
data %>% explain_tree(target = mobiledata_prd)
```

```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4}
data %>% explain_tree(target = age)
```

#### Random Forest

```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4}
data %>% explain_forest(target = buy, ntree = 100)
```

To get the model itself as output, use the parameter `out = "model"`; with `out = "all"` you get everything (feature importance as plot and table, plus the trained model). To use the model for a prediction, you can use `predict_target()`.
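
A minimal sketch of this workflow, not run here; it assumes `predict_target()` takes the data and the trained model via a `model` argument:

```{r message=FALSE, warning=FALSE, eval=FALSE}
# sketch: keep the trained model instead of the importance plot
model <- data %>%
  explain_forest(target = buy, ntree = 100, out = "model")

# score the data with the trained model (assumed predict_target() interface)
data %>% predict_target(model = model)
```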

#### XGBoost

As XGBoost only accepts numeric variables, we use `drop_var_not_numeric()` to drop `mobiledata_prd`, as it is not a numeric variable. An alternative would be to convert the non-numeric variables into numeric ones (see the sketch after the next chunk).

```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4}
data %>%
  drop_var_not_numeric() %>%
  explain_xgboost(target = buy)
```
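
As a hedged sketch of the conversion route (not run; it recodes every character variable into integer codes, which is only one of several possible encodings):

```{r message=FALSE, warning=FALSE, eval=FALSE}
# sketch: recode character variables to numeric codes instead of dropping them
data %>%
  mutate(across(where(is.character), ~ as.numeric(as.factor(.x)))) %>%
  explain_xgboost(target = buy)
```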

Use the parameter `out = "all"` to get more details about the training.

```{r message=FALSE, warning=FALSE}
train <- data %>%
  drop_var_not_numeric() %>%
  explain_xgboost(target = buy, out = "all")
```

```{r message=FALSE, warning=FALSE}
train$importance
```

```{r message=FALSE, warning=FALSE}
train$tune_plot
```

```{r message=FALSE, warning=FALSE}
train$tune_data
```

To use the model for a prediction, you can use `predict_target()`.
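
A sketch, not run; it assumes the same `out = "model"` option and `predict_target()` interface as above:

```{r message=FALSE, warning=FALSE, eval=FALSE}
# sketch: refit keeping only the model, then score the numeric data
model <- data %>%
  drop_var_not_numeric() %>%
  explain_xgboost(target = buy, out = "model")

data %>%
  drop_var_not_numeric() %>%
  predict_target(model = model)
```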

#### Logistic Regression

```{r message=FALSE, warning=FALSE}
data %>% explain_logreg(target = buy)
```

### Balance Target

If you have a data set with a very unbalanced target (in this case only 5% of all observations have `buy == 1`), it may be difficult to create a decision tree.

```{r message=FALSE, warning=FALSE}
data <- create_data_buy(obs = 2000, target1_prob = 0.05)
data %>% describe(buy)
```

It may help to balance the target before growing the decision tree (or to use weights as an alternative; see the sketch at the end of this section). In this example we down-sample the data so that `buy == 1` makes up 10% of all observations.

```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4}
data %>%
  balance_target(target = buy, min_prop = 0.10) %>%
  explain_tree(target = buy)
```
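
The weighting alternative mentioned above might look like this (a sketch, not run; it assumes `weight_target()` returns a weight vector and that `explain_tree()` accepts it through a `weights` argument):

```{r message=FALSE, warning=FALSE, eval=FALSE}
# sketch: weight observations instead of down-sampling
# (weight_target() output and the weights argument are assumed here)
weights <- data %>% weight_target(target = buy)
data %>% explain_tree(target = buy, weights = weights)
```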