---
title: "On prediction from multivariate repeated measures DI models"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{On prediction from multivariate repeated measures DI models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{css styling, echo=FALSE}
span.R {
  font-family: Courier New;
}
```

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

options(crayon.enabled = TRUE)

ansi_aware_handler <- function(x, options)
{
  paste0(
    "<pre class=\"r-output\"><code>",
    fansi::sgr_to_html(x = x, warn = FALSE, term.cap = "256"),
    "</code></pre>"
  )
}

old_hooks <- fansi::set_knit_hooks(knitr::knit_hooks,
                                 which = c("output", "message", "error", "warning"))
knitr::knit_hooks$set(
  output = ansi_aware_handler,
  message = ansi_aware_handler,
  warning = ansi_aware_handler,
  error = ansi_aware_handler
)
```

```{r setup}
library(DImodelsMulti)
```

For this vignette, we will use the final model achieved in the vignette 
[workflow](DImulti_workflow.html) as an example.
```{r DImulti_modelEx}
modelFinal <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"),
                    unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "AV", method = "REML")
print(modelFinal)
```

<h2>Prediction how-to overview </h2>

To predict for any data from this model, which has custom class <span class="R">DImulti</span>, we 
use the <span class="R">predict()</span> function, which is formatted as below, where object is the
<span class="R">DImulti</span> model object, <span class="R">newdata</span> is a dataframe or tibble
containing the community designs that you wish to predict from, if left <span class="R">NULL</span>
then the data used to train the model will be predicted from instead, and 
<span class="R">stacked</span> is a boolean which determines whether the output from this function
will be given in a stacked/long format (<span class="R">TRUE</span>) or wide format 
(<span class="R">FALSE</span>).
```{r predict_layout, eval=FALSE}
predict.DImulti(object, newdata = NULL, stacked = TRUE, ...)
```


The first option for prediction is to simply provide the model object to the function to predict 
from the dataframe we used to train it (<span class="R">simMVRM</span>). By default, the prediction 
dataframe is output in a stacked format, as it is more commonly used for plotting than a wide 
output.
```{r predict_default}
head(predict(modelFinal))
```

If we would rather a wide output, which can be easier to infer from without plotting, we can set
<span class="R">stacked = FALSE</span>.
```{r predict_wide}
head(predict(modelFinal, stacked = FALSE))
```

We can also provide some subset of the original dataset rather than using it all.
```{r predict_subset}
predict(modelFinal, newdata = simMVRM[c(1, 4, 7, 10, 21), ])
```

Or we can use a dataset which follows the same format as <span class="R">simMVRM</span> but is 
entirely new data. If no information is supplied for which ecosystem functions or time points from 
which you wish to predict, then all will be included automatically.
```{r predict_newSim}
newSim <- data.frame(plot = c(1, 2),
                     p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1)) 

predict(modelFinal, newdata = newSim)
```

Otherwise, only the ecosystem functions/time points specified will be predicted from. As our dataset
is in a wide format, we will need to supply some arbitrary value to our desired ecosystem function 
column.
```{r predict_Y1}
newSim <- data.frame(plot = c(1, 2),
                     p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1),
                     Y1 = 0) 

predict(modelFinal, newdata = newSim)
```

In the case that some information is missing from this new data, the function will try to set a 
value for the column and will inform the user through a warning printed to the console.
```{r predict_newSim_missingID}
newSim <- data.frame(p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1)) 

predict(modelFinal, newdata = newSim)
```

<h2>Caution</h2>

<h3>Merging predictions</h3>

You may wish to merge your predictions to your <span class="R">newdata</span> dataframe for 
plotting, printing, or further analysis.
As the function <span class="R">DImulti()</span>, and as a consequence, the function 
<span class="R">predict.DImulti()</span>, sorts the data it is provided, to ensure proper labelling,
you may not be able to directly use <span class="R">cbind()</span> to append the predictions to
your dataset. 
In this case, ensure the unit_IDs column contains unique identifiers for your data rows and that 
you specify <span class="R">stacked</span> to correctly match your data layout.
Then use the function <span class="R">merge()</span>.
```{r predict_newSim_merge}
newSim <- data.frame(plot = c(1, 2),
                     p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1)) 

preds <- predict(modelFinal, newdata = newSim, stacked = FALSE)

merge(newSim, preds, by = "plot")
```


<h3>Non-unique unit_IDs</h3>

In the case that your <span class="R">newdata</span> contains non-unique unit_IDs values and 
<span class="R">stacked = FALSE</span>, any rows with common unit_IDs will be aggregated using the
<span class="R">mean()</span> function.
```{r predict_newSim_aggregate}
newSim <- data.frame(plot = c(1, 1),
                     p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1)) 

predict(modelFinal, newdata = newSim, stacked = FALSE)
```