---
title: "DImodels"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{DImodels}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(DImodels)
```

# Getting Started with `DImodels`

The `DImodels` package is designed to make fitting Diversity-Interactions models easier. Diversity-Interactions (DI) models (Kirwan et al 2009) are a set of tools for analysing and interpreting data from experiments that explore the effects of species diversity (from a pool of *S* species) on community-level responses. Data suitable for DI models will include (at least) for each experimental unit: a response recorded at a point in time, and a set of proportions of *S* species  $p_1$, $p_2$, ..., $p_S$ from a point in time prior to the recording of the response. The proportions sum to 1 for each experimental unit.   

__Main changes in the package from version 1.3 to version 1.3.1__

- A `fortify` function method has been added to supplement the data fitted to a linear model with model fit statistics.
- A `describe_model` function is added which can be used to get a short text summary of any DI model.
- Meta-data about a DI model can be accessed via the `attributes` function. 

__Main changes in the package from version 1.2 to version 1.3__

- The `DI` and `autoDI` functions now have an additional parameter called `ID` which enables the user to group the species identity effects (see examples below).
- The `predict` function now has flexibility to calculate confidence and prediction intervals for the predicted values.

__Main changes in the package from version 1.1 to version 1.2__

- There are two new functions added to the package:
     - `predict`: Make predictions from a fitted DI model without having to worry about theta, and the interaction terms in the data.
     - `contrasts_DI`: Create contrasts for a DI model.
     
__Main changes in the package from version 1.0 to version 1.1__

- `DI_data_prepare` is now superseded by `DI_data` (see examples below)

## `DImodels` installation and load

The `DImodels` package is installed from CRAN and loaded in the typical way. 

```{r, eval = FALSE}
install.packages("DImodels")
library("DImodels")
```

## Accessing an introduction to Diversity-Introductions models

It is recommended that users unfamiliar with Diversity-Interactions (DI) models read the introduction to `DImodels`, before using the package. Run the following code to access the documentation. 

```{r, eval = FALSE}
?DImodels
```

## Datasets included in the DImodels package

There are seven example datasets included in the `DImodels` package: `Bell`, `sim1`, `sim2`, `sim3`, `sim4`, `sim5`, `Switzerland`. Details about each of these datasets is available in their associated help files, run this code, for example:  

```{r, eval = FALSE}
?sim3
```
In this vignette, we will describe the `sim3` dataset and show a worked analysis of it.

## The sim3 dataset

The `sim3` dataset was simulated from a functional group (FG) Diversity-Interactions model. There were nine species in the pool, and it was assumed that species 1 to 5 come from functional group 1, species 6 and 7 from functional group 2 and species 8 and 9 from functional group 3, where species in the same functional group are assumed to have similar traits. The following equation was used to simulate the data. 

$$ y = \sum_{i=1}^{9}\beta_ip_i + \omega_{11}\sum_{\substack{i,j = 1 \\ i<j}}^5p_ip_j + \omega_{22}p_6p_7 + \omega_{33}p_8p_9 \\ + \omega_{12}\sum_{\substack{i \in {1,2,3,4,5} \\ j \in {6,7}}}p_ip_j + \omega_{13}\sum_{\substack{i \in {1,2,3,4,5} \\ j \in {8,9}}}p_ip_j + \omega_{23}\sum_{\substack{i \in {6,7} \\ j \in {8,9}}}p_ip_j + \gamma_k + \epsilon$$
Where $\gamma_k$ is a treatment effect with two levels (*k = 1,2*) and $\epsilon$ was assumed IID N(0, $\sigma^2$). The parameter values are in the following table. 

| Parameter   | Value       | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Parameter   | Value |
| ----------- | ----------- | ----------- | ----------- | ----------- |
| $\beta_1$   | 10      |                 | $\omega_{11}$    | 2     |
| $\beta_2$   | 9       |                 | $\omega_{22}$    | 3     |
| $\beta_3$   | 8       |                 | $\omega_{33}$    | 1       |
| $\beta_4$   | 7       |                 | $\omega_{12}$   | 4     |
| $\beta_5$   | 11      |                 | $\omega_{13}$   | 9     |
| $\beta_6$   | 6       |                 | $\omega_{23}$   | 3     |
| $\beta_7$   | 5       |                 | $\gamma_1$  | 3     |
| $\beta_8$   | 8       |                 | $\gamma_2$   | 0     |
| $\beta_9$   | 9       |                 | $\sigma$     | 1.2      |


Here, the non-linear parameter $\theta$ that can be included as a power on each $p_ip_j$ component of each interaction variable (Connolly et al 2013) was set equal to one and thus does not appear in the equation above.

The 206 rows of proportions contained in the dataset `design_a` (supplied in the package) were used to simulate the `sim3` dataset. Here is the first few rows from `design_a`:

```{r, echo = FALSE, results='asis'}
library(DImodels)
data("design_a")
knitr::kable(head(design_a))
```

Where `community` is an identifier for unique sets of proportions and `richness` is the number of species in the community. 

The proportions in `design_a` were replicated over two treatment levels, giving a total of 412 rows in the simulated dataset. The `sim3` data can be loaded and viewed in the usual way. 

```{r, echo = TRUE, results='asis'}
data("sim3")
knitr::kable(head(sim3, 10))
```

## Exploring the data 

There are several graphical displays that will help to explore the data and it may also be useful to generate summary statistics.  

```{r, echo = TRUE}
hist(sim3$response, xlab = "Response", main = "")
# Similar graphs can also be generated for the other species proportions.
plot(sim3$p1, sim3$response, xlab = "Proportion of species 1", ylab = "Response")
summary(sim3$response)
```

## Implementing an automated DI model fitting process using `autoDI`

The function `autoDI` in `DImodels` provides a way to do an automated exploratory analysis to compare a range of DI models. It works through a set of automated steps (Steps 1 to 4) and will select the 'best' model from the range of models that have been explored and test for lack of fit in that model. The selection process is not exhaustive, but provides a useful starting point in analysis using DI models.

```{r, echo = TRUE}
auto1 <- autoDI(y = "response", prop = 4:12, treat = "treatment", 
                FG = c("FG1","FG1","FG1","FG1","FG1","FG2","FG2","FG3","FG3"), data = sim3, 
                selection = "Ftest")
```

The output of `autoDI`, works through the following process:

1. Step 1 fitted the average interactions (`AV`) model and uses profile likelihood to estimate the non-linear parameter $\theta$ and tests whether or not it differs from one. $\theta$ was estimated to be 0.96814 and was not significantly different from one ($p = 0.4572$). Therefore, subsequent steps assumed $\theta=1$ when fitting the DI models.
2. Step 2 fitted five different DI models, each with a different form of species interactions and treatment was always included. The functional group model (FG) was the selected model. This assumes that pairs of species interact according to functional group membership. 
3. Step 3 provided a test for the treatment and indicated that the treatment, included as an additive factor, was significant and needed in the model ($p < 0.0001$).
4. Step 4 provides a lack of fit test, here there was no indication of lack of fit in the model selected in Step 3 ($p = 0.6423$).

Further details on each of these steps are available in the `autoDI` help file. Run the following code to access the documentation. 

```{r, eval = FALSE}
?autoDI
```


All parameter estimates from the selected model can be viewed using `summary`.

```{r, echo = TRUE}
summary(auto1)
```

If the final model selected by autoDI includes a value of theta other than 1, then a 95% confidence interval for $\theta$ can be generated using the `theta_CI` function: 
```{r, eval = FALSE, echo = TRUE}
theta_CI(auto1, conf = .95)
```
Here, this code would not run, since the final model selected by `autoDI` does not include theta estimated. 

## Fitting individual models using the `DI` function

For some users, the selection process in `autoDI` will be sufficient, however, most users will fit additional models using `DI`. For example, while the treatment is included in `autoDI` as an additive factor, interactions between treatment and other model terms are not considered. Here, we will first fit the model selected by `autoDI` using `DI` and then illustrate the capabilities of `DI` to fit specialised models. 

### Fitting the final model selected by `autoDI` using `DI`

```{r, echo = TRUE}
m1 <- DI(y = "response", prop = 4:12, 
         FG = c("FG1","FG1","FG1","FG1","FG1","FG2","FG2","FG3","FG3"), treat = "treatment", 
         DImodel = "FG", data = sim3)
summary(m1)
```

### Re-fitting the final model selected by `autoDI` estimating theta using `update_DI`

```{r, echo = TRUE}
m1_theta <- update_DI(object = m1, estimate_theta = TRUE)
coef(m1_theta)
```

### Grouping the species identity effects in the model
The species identity effects in a DI model can be grouped by specifying groups for each species using the `ID` argument.
The `ID` argument functions similar to the `FG` argument and accepts a character list of same length as number of species in the model. The identity effects of species belonging in the same group will be grouped together.

Grouping all identity effects into a single term
```{r, echo = TRUE}
m1_group <- update_DI(object = m1_theta, 
                      ID = c("ID1", "ID1", "ID1", "ID1", "ID1",
                             "ID1", "ID1", "ID1", "ID1"))
coef(m1_group)
```

Grouping identity effects of specific species
```{r, echo = TRUE}
m1_group2 <- update_DI(object = m1_theta, 
                       ID = c("ID1", "ID1", "ID1", 
                              "ID2", "ID2", "ID2", 
                              "ID3", "ID3", "ID3"))
coef(m1_group2)
```

Note: Grouping ID effects will not have an effect on the calculation of the interaction effects, they would still be calculated by using all species.

Read the documentation of `DI` and `autoDI` for more information and examples using the `ID` parameter.
```{r, eval = FALSE}
?DI
?autoDI
```


## Fitting customised models using the `DI` function

There are two ways to fit customised models using `DI`; the first is by using the option `DImodel = ` in the `DI` function and adding the argument `extra_formula = ` to it, and the second is to use the `custom_formula` argument in the `DI` function. If species interaction variables (e.g., the FG interactions or the average pairwise interaction) are included in either `extra_formula` or `custom_formula`, they must first be created and included in the dataset. The function `DI_data` can be used to compute several types of species interaction variables. 

### Including treatment by species identity term interactions using `extra_formula`

```{r, echo = TRUE}
m2 <- DI(y = "response", prop = 4:12, 
         FG = c("FG1","FG1","FG1","FG1","FG1","FG2","FG2","FG3","FG3"), treat = "treatment", 
         DImodel = "FG", extra_formula = ~ (p1 + p2 + p3 + p4):treatment,
         data = sim3)
summary(m2)
```

### Including treatment by species interaction terms using `extra_formula`

First, we create the FG pairwise interactions, using the `DI_data` function with the `what` argument set to `"FG"`.

```{r, echo = TRUE}
FG_matrix <- DI_data(prop = 4:12, FG = c("FG1","FG1","FG1","FG1","FG1","FG2","FG2","FG3","FG3"), 
                     data = sim3, what = "FG")
sim3a <- data.frame(sim3, FG_matrix)
```
 
Then we fit the model using `extra_formula`.

```{r, echo = TRUE}
m3 <- DI(y = "response", prop = 4:12, 
         FG = c("FG1","FG1","FG1","FG1","FG1","FG2","FG2","FG3","FG3"),
         treat = "treatment", DImodel = "FG", 
         extra_formula = ~ (bfg_FG1_FG2 + bfg_FG1_FG3 + bfg_FG2_FG3 +
                              wfg_FG1 + wfg_FG2 + wfg_FG3) : treatment, data = sim3a)
summary(m3)
```
 
### Fitting only a subset of the FG interaction terms using `custom_formula`

First, we create a dummy variable for level A of the treatment (this is required for the `glm` engine that is used within `DI` and because there is no intercept in the model). 

```{r, echo = TRUE}
sim3a$treatmentA <- as.numeric(sim3a$treatment == "A")
```
  
Then we fit the model using `custom_formula`.

```{r, echo = TRUE}
m3 <- DI(y = "response",
         custom_formula = response ~ 0 + p1 + p2 + p3 + p4 + p5 + p6 + p7 + p8 + p9 +
           treatmentA + bfg_FG1_FG2 + bfg_FG1_FG3 + bfg_FG2_FG3, data = sim3a)
summary(m3)
```

## Making predictions and testing contrasts for DI models
### Predictions using a DI model

We can make predictions from a DI model just like any other regression model using the `predict` function. The user does not need to worry about adding any interaction terms or adjusting any columns if theta is not equal to 1. Only the species proportions along with any additional experimental structures is needed and all other terms in the model will be calculated for the user.  
```{r, echo = TRUE}
# Fit model
m3 <- DI(y = "response", prop = 4:12, 
         treat = "treatment", DImodel = "AV", 
         extra_formula = ~ (AV) : treatment, data = sim3a)

predict_data <- sim3[c(1, 79, 352), 3:12]
# Only species proportions and treatment is needed
print(predict_data)
# Make prediction
predict(m3, newdata = predict_data)
```

### Uncertainity around predictions
```{r}
# The interval and level parameters can be used to calculate the 
# uncertainty around the predictions

# Get confidence interval around prediction
predict(m3, newdata = predict_data, interval = "confidence")

# Get prediction interval around prediction
predict(m3, newdata = predict_data, interval = "prediction")

# The function returns a 95% interval by default, 
# this can be changed using the level argument
predict(m3, newdata = predict_data, 
        interval = "prediction", level = 0.9)
```

### Contrasts for DI models
The `contrasts_DI` function can be used to compare and formally test for a difference in performance of communities within the same as well as across different experimental structures

Comparing the performance of the monocultures of different species at treatment A
```{r, echo = TRUE}
contr <- list("p1vsp2" = c(1, -1, 0, 0,  0,  0, 0, 0,  0, 0, 0, 0),
              "p3vsp5" = c(0,  0, 1, 0, -1,  0, 0, 0,  0, 0, 0, 0),
              "p4vsp6" = c(0,  0, 0, 1,  0, -1, 0, 0,  0, 0, 0, 0),
              "p7vsp9" = c(0,  0, 0, 0,  0,  0, 1, 0, -1, 0, 0, 0))
the_C <- contrasts_DI(m3, contrast = contr)
summary(the_C)
```

Comparing across the two treatment levels for monoculture of species 1
```{r, echo = TRUE}
contr <- list("treatAvsB" = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0))
the_C <- contrasts_DI(m3, contrast = contr)
summary(the_C)
```

Comparing between two species mixtures
```{r, echo = TRUE}
mixA <- c(0.25, 0,      0.25, 0,      0.25, 0,      0.25, 0, 0, 0, 0, 0)
mixB <- c(0,    0.3333, 0,    0.3333, 0,    0.3333, 0,    0, 0, 0, 0, 0)

# We have the proportions of the individual species in the mixtures, however
# we still need to calculate the interaction effect for these communities
contr_data <- data.frame(rbind(mixA, mixB))
colnames(contr_data) <- names(coef(m3))

# Adding the interaction effect of the two mixtures
contr_data$AV <- DI_data_E_AV(prop = 1:9, data = contr_data)$AV
print(contr_data)

# We can now subtract the respective values in each column of the two 
# mixtures and get our contrast
my_contrast <- as.matrix(contr_data[1, ] - contr_data[2, ])
rownames(my_contrast) <- "mixAvsB"

the_C <- contrasts_DI(m3, contrast = my_contrast)
summary(the_C)
```


## References
Connolly J, T Bell, T Bolger, C Brophy, T Carnus, JA Finn, L Kirwan, F Isbell, J Levine, A LÃ¼scher, V Picasso, C Roscher, MT Sebastia, M Suter and A Weigelt (2013) An improved model to predict the effects of changing biodiversity levels on ecosystem function. Journal of Ecology, 101, 344-355. 

Kirwan L, J Connolly, JA Finn, C Brophy, A LÃ¼scher, D Nyfeler and MT Sebastia (2009) Diversity-interaction modelling - estimating contributions of species identities and interactions to ecosystem function. Ecology, 90, 2032-2038.