--- title: "The recommended workflow of DImulti() analyses" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{The recommended workflow of DImulti() analyses} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{css styling, echo=FALSE} span.R { font-family: Courier New; } ``` ```{r, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(crayon.enabled = TRUE) ansi_aware_handler <- function(x, options) { paste0( "<pre class=\"r-output\"><code>", fansi::sgr_to_html(x = x, warn = FALSE, term.cap = "256"), "</code></pre>" ) } old_hooks <- fansi::set_knit_hooks(knitr::knit_hooks, which = c("output", "message", "error", "warning")) knitr::knit_hooks$set( output = ansi_aware_handler, message = ansi_aware_handler, warning = ansi_aware_handler, error = ansi_aware_handler ) ``` ```{r setup, message=FALSE, warning=FALSE} library(DImodelsMulti) library(dplyr) library(nlme) library(ggplot2) ``` This vignette aims to give a rough outline for the process to be followed when examining multivariate and/or repeated measures BEF relationship study data using the <span class="R">DImulti()</span> function from the <span class="R">DImodelsMulti</span> R package. We will use the dataset <span class="R">simMVRM</span>, which is included in the package, to illustrate the workflow of the package. <h2>Examining the data</h2> We always start by looking at our data using <span class="R">View()</span> or <span class="R">head()</span>, to make sure it is as expected and includes all required information. ```{r data_head} head(simMVRM) ``` We can then look at some summarising metrics, such as the mean of each ecosystem function, separated by time as below, which can help us know what to expect in the analysis, in this case we see that ecosystem functioning improves at time point 2 for the average multifunctionality index, <span class="R">MFindex</span>, but the function <span class"R">Y2</span> performs better at time 1. ```{r data_group} simMVRM_group <- dplyr::summarise(dplyr::group_by(simMVRM, time), Y1 = mean(Y1), Y2 = mean(Y2), Y3 = mean(Y3), MFindex = mean(Y1 + Y2 + Y3)) simMVRM_group ``` We can also produce plots to the same effect. ```{r data_hist, out.width="75%"} hist(simMVRM[which(simMVRM$time == 1), ]$Y1, main = "Y1 at time 1", xlab = "Y1") ``` <h2>Fitting the first model</h2> We use the main function of the <span class="R">DImodelsMulti</span> R package, <span class="R">DImulti()</span>, to fit a repeated measures Diversity-Interactions model to the data. It is recommended to begin with the simplest model options and increase complexity as model selection/comparisons permit. The simplest model available to be fit by the package is the structural model, <span class="R">"STR"</span>, which only includes an intercept for each time and ecosystem function. $$ y_{kmt} = \beta_{0kt} + \epsilon_{kmt} $$ ```{r DImulti_modelSTR} modelSTR <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "STR", method = "ML") ``` As our dataset is multivariate and in a wide format, we specify our response columns through the parameter <span class="R">y</span>, "NA" through the first index of the vector <span class="R">eco_func</span>, and select the unstructured (<span class="R">"UN"</span>) autocorrelation structure for our ecosystem functions through the second index of <span class="R">eco_func</span>. The data also contains multiple time points, so we pass the column holding the time point indicator through the parameter <span class="R">time</span>, along with the chosen autocorrelation structure, in this case we use compound symmetry, <span class="R">"CS"</span>. To facilitate grouping the recorded responses, a column index for the unique identifier of the experimental units is passed through <span class="R">unit_IDs</span>. We choose to use the maximum likelihood (<span class="R">"ML"</span>) estimation method as we intend to compare models with differing fixed effects. We can use <span class="R">print()</span> or <span class="R">summary()</span> to view information on the model in our console. ```{r DImulti_modelSTR_print} print(modelSTR) ``` From this evaluation, we can see that each of our coefficients are significant at an alpha significance level of 0.05 ($\alpha = 0.05$). We can also see the variance covariance matrices for our repeated measures and multiple ecosystem functions, along with the combined matrix, from which we can see the estimated strength and direction of the covarying relationships between response types. <h2>Fitting a Diversity-Interactions model</h2> The next model to be fit is the simplest Diversity-Interactions (DI) model available in the package, the identity (<span class="R">"ID"</span>) structure. $$ y_{kmt} = \sum^{S}_{i=1}{\beta_{ikt} p_{im}} + \epsilon_{kmt} $$ The DI modelling framework assumes that the main driver behind changes in ecosystem functioning is the initial relative abundance/proportion of the species.To reflect this, the intercept from the structural model we fit previously is replaced by the initial proportion of the species in the study, each with their own $\beta$ coefficient. These fixed effects form a simplex space. The only code that we need to change for this model is the <span class="R">DImodel</span> tag, from <span class="R">"STR"</span> to <span class="R">"ID"</span>. ```{r DImulti_modelID} modelID <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "ID", method = "ML") ``` We can now view this model, as before. In this case, we simply extract the t-table from the model summary. ```{r DImulti_modelID_tTable} summary(modelID)$tTable ``` The models within the DI modelling framework can be easily compared as they are hierarchical in nature, i.e. they are nested within one another. We can compare nested models using a likelihood ratio test, through <span class="R">anova()</span>, or information criteria, such as <span class="R">AIC()</span> or <span class="R">BIC()</span> (the correct versions of which, <span class="R">AICc()</span> or <span class="R">BICc()</span> should be used in the cased of models with a large number of terms). {width=100%} Image Credit: Kirwan *et al.* 2009 We choose to compare these models using a likelihood ratio test, where the null hypothesis states that the likelihoods of the two models do not significantly differ from one another, i.e. the extra terms in our larger model are not worth the added complexity. ```{r DImulti_STR_ID_LRT} anova(modelSTR, modelID) ``` As the p-value of the test is less than our chosen $\alpha$ value of 0.05, we reject the null hypothesis and proceed with the more complex model. We can confirm this selection by choosing the model with the lower AIC and BIC values, which are also printed by the <span class="R">anova()</span> call. <h2>Fitting and comparing interactions structures</h2> We continue to increase the complexity of the models by fitting the next interaction structure in the nesting series, the average interaction model, <span class="R">"AV"</span>, a reparameterisation of the evenness model, <span class="R">"E"</span>. Again, the only code change required is the change of the <span class="R">DImodel</span> tag. We then compare the two models. ```{r DImulti_modelAV} modelAV <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "AV", method = "ML") coef(modelAV) anova(modelID, modelAV) ``` Once again, as our p-value < 0.05, we select the more complex model and continue increasing complexity. <h3>Estimating theta</h3> Now that we are fitting interactions, we may elect to estimate the non-linear parameter $\theta$, see vignette [onTheta](DImulti_onTheta.html) for a more in depth look at this process. We include the parameter <span class="R">estimate_theta</span> with argument <span class="R">TRUE</span> to have <span class="R">DImulti()</span> automatically fit and compare different values of $\theta$, or if a user has *a priori* information on the nature of the term, they may pass values through the parameter <span class="R">theta</span>. We can then test the two models against one another using information criteria, to see if the theta values significantly differ from 1. ```{r DImulti_modelAV_theta} modelAV_theta <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "AV", method = "ML", estimate_theta = TRUE) thetaVals <- modelAV_theta$theta thetaVals AICc(modelAV) AICc(modelAV_theta) ``` In this case, the inclusion of theta does improve the model. The next models in the series are not nested within one another, so either can be used next but they cannot be compared to one another using a likelihood ratio test. As this dataset was simulated without any functional groupings, i.e. any groupings would be arbitrary, we choose to fit the additive interaction structure, <span class="R">"ADD"</span>. ```{r DImulti_modelADD} modelADD <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "ADD", method = "ML", theta = thetaVals) modelADD$coefficients anova(modelAV_theta, modelADD) ``` In this case, the p-value is greater than our chosen alpha level of 0.05, therefore we fail to reject the null hypothesis that the added complexity of the model does not significantly increase the model likelihood, and proceed in our analysis with the simpler model, <span class="R">modelAV</span>. As we do not need to increase the complexity of the model through the interaction structures any further, we turn to other options. It is at this stage that one could introduce treatment effects or environmental variables. As the <span class="R">simMVRM</span> dataset does not contain any such information, example code is included below, where model <span class="R">modelAV_treat1</span> includes an intercept for <span class="R">treat</span> for each level of ecosystem function and time point, while <span class="R">modelAV_treat2</span> crosses <span class="R">treat</span> with each other fixed effect term. ```{r DImulti_modelADD_treat, eval=FALSE} modelAV_treat1 <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "ADD", method = "ML", theta = thetaVals, extra_fixed = ~treat) modelAV_treat2 <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "ADD", method = "ML", theta = thetaVals, extra_fixed = ~1:treat) ``` The model could also alternatively be simplified here, by adding ID grouping, i.e. introducing functional redundancy within species ID effects. As <span class="R">simMVRM</span> is simulated with no ID grouping in mind, any groups selected would be arbitrary and thus not recommended, however, example code for doing so is included below. These groupings do not affect the interaction term(s). ```{r DImulti_modelAV_ID} modelAV_ID <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "AV", method = "ML", theta = thetaVals, ID = c("Group1", "Group1", "Group2", "Group2")) summary(modelAV_ID)$tTable anova(modelAV_ID, modelAV) ``` As the p-value is lower than 0.05, we reject the null hypothesis and continue with the more complex model, without any ID groupings. If any errors are encountered in the process of fitting these models, please see vignette [commonErrors](DImulti_commonErrors.html) for a guide. <h2>Final model</h2> As we have concluded our model selection process, the final step before interpretation is to refit the chosen model design using the method <span class="R">"REML"</span>, for unbiased estimates. ```{r DImulti_modelFinal} modelFinal <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "AV", method = "REML", theta = thetaVals) summary(modelFinal) ``` <h2>Interpretation</h2> The final and most important step in the workflow of any analysis is the interpretation of the final model. We first look at the fixed effect coefficients, their significance and direction. For example, we can see that the identity effect of species <span class="R">p1</span> has a significant, negative effect on ecosystem function <span class="R">Y1</span> at time point <span class="R">1</span>, but a significant positive effect on ecosystem function <span class="R">Y3</span> at the same time point. We can also see that the interaction effect <span class="R">AV</span> has a significant positive effect on all ecosystem functions at both time points. As this interaction term is maxmimised in an ecosystem design where all four species are represented evenly, it may be beneficial to include species <span class="R">p1</span> despite its negative effect on <span class="R">Y1</span>, the presence of other species, which have a positive effect on this ecosystem function could also 'make up' for this fault of species <span class="R">p1</span> (assuming that our goal is the maximisation of all ecosystem functions). ```{r DImulti_modelFinal_tTable} summary(modelFinal)$tTable ``` Next we could examine the variance covariance matrices of the model. These can be extracted from the model by either using the function <span class="R">getVarCov()</span> from the R package <span class="R">nlme</span> or by using the <span class="R">$</span> operator (the components of a model in R can be viewed using <span class="R">View(model)</span>). ```{r DImulti_modelFinal_varCovs, eval=FALSE} nlme::getVarCov(modelFinal) modelFinal$vcov ``` ```{r DImulti_modelFinal_getVarCov, echo=FALSE} nlme::getVarCov(modelFinal) ``` An example of an interpretation that we can make from these variance covariance matrices, is that the ecosystem functions <span class="R">Y1</span> and <span class="R">Y3</span> are negatively correlated, therefore would likely result in trade-offs when trying to estimated one over the other. As multivariate repeated measures DI models can contain a large number of fixed effects, they can be difficult to interpret from. Our suggestions to circumvent this issue is to first predict from the model for community designs of interest, using the predict() function, as seen below, see vignette [prediction](DImulti_prediction.html) for further details on this. ```{r DImulti_modelFinal_predict} comms <- simMVRM[c(1, 4, 7, 10, 21), ] print(comms) commPred <- predict(modelFinal, newdata = comms) commPred ``` Then, with these predictions, we can employ summarising metrics, such as a multifunctionality index, to summarise our findings while maintaining the benefits of having modelled each ecosystem function separately but simultaneously, alla Suter *et al* 2021. We could also use these predictions to generate a plot/graph to display our findings, such as a histogram, grouped bar chart, or ternary diagram. ```{r DImulti_modelFinal_grouped, out.width="75%"} ggplot2::ggplot(commPred, ggplot2::aes(fill=Ytype, y=Yvalue, x=plot)) + ggplot2::geom_bar(position="dodge", stat="identity") ``` A new R package, <span class="R">DImodelsVis</span>, is in development, which will make plotting DI models fit using the R packages <span class="R">DImodels</span> or <span class="R">DImodelsMulti</span> a simple process. <h2>References</h2> Suter, M., Huguenin-Elie, O. and Lüscher, A., 2021. Multispecies for multifunctions: combining four complementary species enhances multifunctionality of sown grassland. Scientific Reports, 11(1), p.3835. Kirwan, L., Connolly, J., Finn, J.A., Brophy, C., Lüscher, A., Nyfeler, D. and Sebastià , M.T., 2009. Diversity-interaction modeling: estimating contributions of species identities and interactions to ecosystem function. Ecology, 90(8), pp.2032-2038.