---
title: "lmhelprs"
author: "Shu Fai Cheung"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{lmhelprs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


# Introduction

At the time of writing,
[`lmhelprs`](https://sfcheung.github.io/lmhelprs/)
has two sets of functions:

- `hierarchical_lm()` for ordering
  linear regression models in a hierarchical
  way and use `anova()` to compare them.

- `test_highest()` to identify the
  highest order term in a linear
  regression model and compare the model
  to a model with this term removed.

# Hierarchical Regression Analysis

Users can use `anova()` manually to do
hierarchical regression. `hierarchical_lm()`
is written with these features:

- Check whether the models are really
  "hierarchical". If yes, order them
  automatically. Users do not need to
  put them in the correct order
  manually.

- Compute R-squared and R-squared
  change and add them to the output.

Let's fit three models for illustration:

```{r}
library(lmhelprs)
data(data_test1)
lm1a <- lm(y ~ x1 + x2, data_test1)
lm1b <- lm(y ~ x1 + x2 + x3 + x4, data_test1)
lm1c <- lm(y ~ x1 + x2 + x3 + x4 + cat2, data_test1)
```

They are can be passed to
`hierarchical_lm()` in any order.
Hierarchical regression analysis
will be conducted correctly, with
R-squared and R-squared change:

```{r}
hierarchical_lm(lm1b, lm1a, lm1c)
```

If the models are not in hierarchical
order, an error will be raised:

```{r error = TRUE}
lm2a <- lm(y ~ x1 + x2, data_test1)
lm2b <- lm(y ~ x1 + x3 + x4, data_test1)
hierarchical_lm(lm2a, lm2b)
```

Please refer to the help page of
`hierarchical_lm()` on how it works,
and the print method of its output
(`print.hierarchical_lm()`) on how
to customize the printout.

# Test The Highest Order Term

In a linear regression model with
a term of second or higher order,
the function `test_highest()` can be
used to identify the highest order
term, compare the model to a model
with this term removed, and estimate
and test the R-squared increase due
to adding this term.

For example:

```{r}
lm_mod <- lm(y ~ x1 + cat2 + cat1 + cat2:cat1, data_test1)
summary(lm_mod)
```

Although it has six product terms,
in terms of variables, it only has one
higher order term: `cat2:cat1`, the
interaction between `cat2` and `cat1`.

To automatically find this term, and
compare this model to a model without
this interaction, `test_highest()`
can be used:

```{r}
test_highest(lm_mod)
```

```{r echo = FALSE}
mod_test <- test_highest(lm_mod)
```

The R-squared change due to adding
this interaction (represented by
six product terms) to a model without
interaction is `r formatC(mod_test[2, "R.sq.change"], digits = 5, format = "f")`,
and the *p*-value of this change is `r formatC(mod_test[2, "Pr(>F)"], digits = 3, format = "f")`.

The function `test_highest()` can be
used for third and higher order term.
For example:

```{r}
lm_mod3 <- lm(y ~ x1 + x2 + x3*x4*cat2, data_test1)
summary(lm_mod3)
test_highest(lm_mod3)
```

The model with three-way interaction is
compared to a model with only two-way
interactions (`x3:x4`, `x3:cat2`, and
`x4:cat2`).

If a model has more than one term of
the highest order, an error will be
raised:

```{r error = TRUE}
lm_mod2 <- lm(y ~ x1 + x2*x3 + x2*x4, data_test1)
test_highest(lm_mod2)
```

Please refer to the help page of
`test_highest()` on how it works,

# Issues

If you have any suggestions and found any bugs, please feel
feel to open a GitHub issue. Thanks.