---
title: "Linear Regression"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Linear Regression}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(dplyr)
library(tidypredict)
library(parsnip)
```

## Highlights & Limitations

- **Supports prediction intervals**. It uses the `qr.solve()` function to compute the interval coefficient of each term.
- Supports categorical variables and interactions
- Only *treatment* contrasts (`contr.treatment`) are supported
- `offset` is supported
- In-line functions in the formulas are **not supported**:  
     - OK - `wt ~ mpg + am` 
     - OK - `mutate(mtcars, newam = paste0(am))` and then `wt ~ mpg + newam`
     - Not OK - `wt ~ mpg + as.factor(am)`
     - Not OK - `wt ~ mpg + as.character(am)`
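
As a minimal sketch of the supported workaround listed above, the conversion is done in the data before the model is fitted, so the formula only refers to plain column names. The names `prepped` and `ok_model` are hypothetical and used only for illustration.

```{r}
library(dplyr)
library(tidypredict)

# Hypothetical names for illustration: `prepped`, `ok_model`
# Convert `am` to a character column *before* fitting
prepped <- mtcars %>%
  mutate(newam = paste0(am))

# The formula references the new column instead of calling as.character()
ok_model <- lm(wt ~ mpg + newam, data = prepped)

tidypredict_fit(ok_model)
```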

## How it works

```{r}
library(dplyr)
library(tidypredict)

df <- mtcars %>%
  mutate(char_cyl = paste0("cyl", cyl)) %>%
  select(mpg, wt, char_cyl, am)

model <- lm(mpg ~ wt + char_cyl, offset = am, data = df)
```

`tidypredict_sql()` returns a SQL query that applies each coefficient (`model$coefficients`) to the matching variable or categorical variable value.  In most cases the resulting SQL is one short `CASE WHEN` statement per coefficient.  It appends the `offset` field or value, if one is provided.
```{r}
library(tidypredict)
tidypredict_sql(model, dbplyr::simulate_mssql())
```
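
To make the arithmetic behind the query easier to follow, here is a rough base R sketch of the same calculation: each coefficient is multiplied by its variable, categorical levels are switched on with `ifelse()`, and the offset is added at the end. This is only an illustration, not the code `tidypredict` runs internally; `example_df`, `example_model` and `manual_fit` are hypothetical names.

```{r}
# Hypothetical names for illustration: `example_df`, `example_model`, `manual_fit`
example_df <- mtcars
example_df$char_cyl <- paste0("cyl", example_df$cyl)

example_model <- lm(mpg ~ wt + char_cyl, offset = am, data = example_df)
b <- coef(example_model)

# Intercept + numeric term + one switch per non-reference level + offset
manual_fit <- with(
  example_df,
  b[["(Intercept)"]] +
    b[["wt"]] * wt +
    ifelse(char_cyl == "cyl6", b[["char_cylcyl6"]], 0) +
    ifelse(char_cyl == "cyl8", b[["char_cylcyl8"]], 0) +
    am
)

# Compare against the fitted values stored in the model
all.equal(manual_fit, unname(fitted(example_model)))
```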

Alternatively, use `tidypredict_to_column()` if the results are to be used or previewed in `dplyr`.

```{r}
df %>%
  tidypredict_to_column(model) %>%
  head(10)
```

## Prediction intervals

Use `tidypredict_sql_interval()` to get the SQL query that computes the prediction interval.  The `interval` argument defaults to 0.95.
```{r}
tidypredict_sql_interval(model, dbplyr::simulate_mssql())
```

Prediction intervals also work in `tidypredict_to_column()`; just set the `add_interval` argument to `TRUE`.
```{r}
df %>%
  tidypredict_to_column(model, add_interval = TRUE) %>%
  head(10)
```
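
For reference, the textbook prediction interval for a linear model can be reproduced in base R. The sketch below assumes the standard formula, in which the half-width is the t quantile times `sqrt(sigma^2 * (1 + leverage))`, and checks it against `predict()`. It is not taken from the `tidypredict` source; `pi_df`, `pi_model` and the other object names are hypothetical.

```{r}
# Hypothetical names for illustration: `pi_df`, `pi_model`, `half_width`
pi_df <- mtcars
pi_df$char_cyl <- paste0("cyl", pi_df$cyl)
pi_model <- lm(mpg ~ wt + char_cyl, offset = am, data = pi_df)

X <- model.matrix(pi_model)
xtx_inv <- solve(crossprod(X))           # (X'X)^-1
leverage <- rowSums((X %*% xtx_inv) * X) # x' (X'X)^-1 x for each row
sigma2 <- summary(pi_model)$sigma^2      # residual variance

level <- 0.95
t_crit <- qt(1 - (1 - level) / 2, df.residual(pi_model))
half_width <- t_crit * sqrt(sigma2 * (1 + leverage))

# Reference values from base R
reference <- predict(
  pi_model,
  newdata = pi_df,
  interval = "prediction",
  level = level
)

all.equal(unname(half_width), unname(reference[, "upr"] - reference[, "fit"]))
```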

## Under the hood

The parser reads several parts of the `lm` object to tabulate all of the needed variables.  One entry per coefficient is added to the final table. Each entry holds the results of `qr.solve()`, already computed and placed in the correct column under a `qr_` prefix.  There is one `qr_` column per coefficient.  

Other variables are added at the end. Some variables are not required for every parsed model.  For example, `offset` is listed because it is part of the formula (call) of this model. If a given model had no offset, that line would not exist.

```{r}
pm <- parse_model(model)
str(pm, 2)
```
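
The link between `qr.solve()` and the interval calculation can be sketched in base R. The assumption here is that the `qr_` entries come from inverting the `R` factor of the model's QR decomposition, which yields one column per coefficient; the exact internal steps in `tidypredict` may differ. `qr_df`, `qr_model` and `r_inv` are hypothetical names.

```{r}
# Hypothetical names for illustration: `qr_df`, `qr_model`, `r_inv`
qr_df <- mtcars
qr_df$char_cyl <- paste0("cyl", qr_df$cyl)
qr_model <- lm(mpg ~ wt + char_cyl, offset = am, data = qr_df)

# Inverse of the upper-triangular R factor stored in the lm object
r_inv <- qr.solve(qr.R(qr_model$qr))
dim(r_inv) # one row and one column per coefficient

# R^-1 %*% t(R^-1) reproduces (X'X)^-1, which drives the interval width
X <- model.matrix(qr_model)
all.equal(tcrossprod(r_inv), solve(crossprod(X)), check.attributes = FALSE)
```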

The output from `parse_model()` is transformed into a `dplyr` (a.k.a. Tidy Eval) formula.  All categorical variables are handled with `if_else()`.
```{r}
tidypredict_fit(model)
```

A function that builds the Tidy Eval interval formula is also available:
```{r}
tidypredict_interval(model)
```

From there, the Tidy Eval formula can be used anywhere it can be evaluated. `tidypredict` provides three paths:

  - Use it directly inside `dplyr`: `mutate(df, !! tidypredict_fit(model))`
  - Use `tidypredict_to_column(model)` in a piped command set
  - Use `tidypredict_sql(model, dbplyr::simulate_mssql())` to retrieve the SQL statement

The same applies to the prediction interval functions.
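
As a small illustration of the first path, the sketch below splices the formula into `mutate()` and compares the new column with `predict()`. The object names (`path_df`, `path_model`, `scored`) and the column name `fit` are hypothetical choices for this example.

```{r}
library(dplyr)
library(tidypredict)

# Hypothetical names for illustration: `path_df`, `path_model`, `scored`
path_df <- mtcars %>%
  mutate(char_cyl = paste0("cyl", cyl))

path_model <- lm(mpg ~ wt + char_cyl, offset = am, data = path_df)

# Splice the Tidy Eval formula into mutate() with `!!`
scored <- path_df %>%
  mutate(fit = !!tidypredict_fit(path_model))

# The new column should agree with base R predictions
all.equal(scored$fit, unname(predict(path_model, newdata = path_df)))
```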

## How it performs

Testing the `tidypredict` results is easy.  The `tidypredict_test()` function automatically uses the `lm` model object's data frame to compare the results of `tidypredict_fit()` and `tidypredict_interval()` with those given by `predict()`.

```{r}
tidypredict_test(model)
```

To include prediction intervals in the test, set the `include_intervals` argument to `TRUE`.

```{r}
tidypredict_test(model, include_intervals = TRUE)
```

## parsnip

`tidypredict` also supports `lm()` model objects fitted via the `parsnip` package.

```{r}
library(parsnip)

parsnip_model <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt + cyl, offset = am, data = mtcars)

tidypredict_fit(parsnip_model)
```