---
title: "C. ASCA"
output:
rmarkdown::html_vignette:
toc: true
vignette: >
%\VignetteIndexEntry{C. ASCA}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r, include=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width=6,
fig.height=4
)
# Legge denne i YAML på toppen for å skrive ut til tex
#output:
# pdf_document:
# keep_tex: true
# Original:
# rmarkdown::html_vignette:
# toc: true
```
```{r setup}
# Start the HDANOVA R package
library(HDANOVA)
```
# Analysis of Variance Simultaneous Component Analysis (ASCA)
The ANOVA part of ASCA includes all the possible variations of ANOVA
demonstrated in the ANOVA section and more. Also, generalized linear
models (GLM) can be used. The following theory will be exemplified:
* Basic ASCA
* Permutation testing
* Random effects
* Scores and loadings
* Data and confidence ellipsoids
* Combined effects
* Numeric effects
* ANOVA-PCA (APCA)
* PC-ANOVA
* MSCA
* LiMM-PCA
## Basic ASCA
The ANOVA part of ASCA includes all the possible variations of ANOVA demonstrated in the
ANOVA vignette and more. Also generalized linear models (GLM) can be used. We start
by demonstrating ASCA with a fixed effect model of two factors with interactions
and ordinary PCA on the effect matrices.
```{r}
# Load Candy data
data(candies)
# Fit ASCA model
mod <- asca(assessment ~ candy*assessor, data=candies)
summary(mod)
```
The summary shows that the candy effect is the largest by far.
### Permutation testing
To get more insight, we can perform permutation testing of the factors. Here we use approximate permutation.
```{r}
# Permutation testing (default = 1000 permutations, if not specified)
mod <- asca(assessment ~ candy*assessor, data=candies, permute=TRUE)
summary(mod)
```
Here we see that all effects are significant, where the Candy effect is the dominating one. (The P-values are rounded from 0.001 to 0 in the print). This can also be visualised by looking at the sums-of-squares values obtained under permutation compared to the original value.
```{r}
permutationplot(mod, factor = "assessor")
```
### Random effects
One can argue that the assessors are random effects, thus should be handled
as such in the model. We can do this by adding r() around the assessor term.
See also LiMM-PCA below for the REML estimation version.
```{r}
# Fit ASCA model with random assessor
mod.mixed <- asca(assessment ~ candy*r(assessor), data=candies, permute=TRUE)
summary(mod.mixed)
```
### Scores and loadings
The effects can be visualised through, e.g., loading and score plots to assess
the relations between variables, objects and factors. If a factor is not
specified, the first factor is plotted.
```{r, fig.width=4.5, fig.height=7}
par.old <- par(mfrow=c(2,1), mar=c(4,4,2,1))
loadingplot(mod, scatter=TRUE, labels="names", main="Candy loadings")
scoreplot(mod, main="Candy scores")
par(par.old)
```
A specific factor can be plotted by specifying the factor name or number.
```{r, fig.width=4.5, fig.height=7}
par.old <- par(mfrow=c(2,1), mar=c(4,4,2,1))
loadingplot(mod, factor="assessor", scatter=TRUE, labels="names", main="Assessor loadings")
scoreplot(mod, factor="assessor", main="Assessor scores")
par(par.old)
```
Score plots can be modified, e.g., omitting backprojections or adding spider plots.
```{r, fig.width=4.5, fig.height=7}
par.old <- par(mfrow=c(2,1), mar=c(4,4,2,1))
scoreplot(mod, factor="assessor", main="Assessor scores", projections=FALSE)
scoreplot(mod, factor="assessor", main="Assessor scores", spider=TRUE)
par(par.old)
```
And scores and loadings can be extracted for further analysis.
```{r}
L <- loadings(mod, factor="candy")
head(L)
S <- scores(mod, factor="candy")
head(S)
```
### Data ellipsoids and confidence ellipsoids
To emphasize factor levels or assess factor level differences, we can add
data ellipsoids or confidence ellipsoids to the score plot. The confidence
ellipsoids are built on the assumption of balanced data, and their theories are
built around crossed designs.
```{r, fig.width=4.5, fig.height=7}
par.old <- par(mfrow=c(2,1), mar=c(4,4,2,1))
scoreplot(mod, ellipsoids="data", factor="candy", main="Data ellipsoids")
scoreplot(mod, ellipsoids="confidence", factor="candy", main="Confidence ellipsoids")
par(par.old)
```
If we repeat this for the mixed model, we see that both types of ellipsoids
change together with the change in denominator term in the underlying ANOVA model.
It should be noted that the theory for confidence ellipsoids in mixed models
is not fully developed, so interpretation should be done with caution.
```{r, fig.width=4.5, fig.height=7}
par.old <- par(mfrow=c(2,1), mar=c(4,4,2,1))
scoreplot(mod.mixed, ellipsoids="data", factor="candy", main="Data ellipsoids")
scoreplot(mod.mixed, ellipsoids="confidence", factor="candy", main="Confidence ellipsoids")
par(par.old)
```
### Combined effects
In some cases, it can be of interest to combine effects in ASCA. Here, we use
an example with the Caldana data where we combine the _light_ effect with the
_time:light_ interaction using the _comb()_ function.
```{r}
# Load Caldana data
data(caldana)
# Combined effects
mod.comb <- asca(compounds ~ time + comb(light + light:time), data=caldana)
summary(mod.comb)
```
\noindent When combined effects have a time factor, the scores can be plotted against time.
```{r}
# Scores plotted as a function of time
par.old <- par(mfrow=c(2,1), mar = c(4,4,1,1))
timeplot(mod.comb, factor="light", time="time", comb=2, comp=1, x_time=TRUE)
timeplot(mod.comb, factor="light", time="time", comb=2, comp=2, x_time=TRUE)
par(par.old)
```
### Quantitative effects
Quantitative effects, so-called covariates, can also be included in a model, though their use
in ASCA are limited to ANOVA estimation and explained variance, not being used in
subsequent PCA or permutation testing. We demonstrate this using the Caldana data
again, but now recode the time effect to a quantitative effect, meaning it will be
handled as a linear continuous effect.
```{r}
caldanaNum <- caldana
caldanaNum$time <- as.numeric(as.character(caldanaNum$time))
mod.num <- asca(compounds ~ time*light, data = caldanaNum)
summary(mod.num)
```
## ANOVA-PCA (APCA)
APCA differs from ASCA by adding the error term to the model before
performing PCA instead of backprojecting errors afterwards.
```{r}
# Fit APCA model
modp <- apca(assessment ~ candy*assessor, data=candies)
summary(modp)
```
Plot scores and loadings.
```{r, fig.width=4.5, fig.height=7}
par.old <- par(mfrow=c(2,1), mar=c(4,4,2,1))
loadingplot(modp, scatter=TRUE, labels="names", main="Candy loadings")
scoreplot(modp, main="Candy scores")
par(par.old)
```
## PC-ANOVA
In PC-ANOVA, a PCA is first applied to the data before the scores are subjected
to ANOVA, effectively reversing the roles in ASCA. This means there will
be one or more ANOVA tables in the summary of the output. In this example,
we have chosen to use the number of components that explain at least 90% of the
variation of the data.
```{r}
mod.pc <- pcanova(assessment ~ candy * assessor, data = candies, ncomp = 0.9)
print(mod.pc)
summary(mod.pc)
```
When creating score and loading plots for PC-ANOVA, the 'global' scores and loadings
will be shown, but the factors can still be used for manipulating the symbols.
```{r, fig.width=4.5, fig.height=7}
par.old <- par(mfrow=c(2,1), mar=c(4,4,2,1))
loadingplot(mod.pc, scatter=TRUE, labels="names", main="Global loadings")
scoreplot(mod.pc, main="Global scores")
par(par.old)
```
## MSCA
Multilevel Simultaneous Component Analysis (MSCA) is a version of ASCA that
assumes a single factor to be used as a between-individuals factor, while the
the within-individuals is assumed implicitly.
```{r}
# Default MSCA model with a single factor
mod.msca <- msca(assessment ~ candy, data=candies)
summary(mod.msca)
```
Scoreplots can be created for the between-individuals factor and the within-individuals factor,
and for each level of the within-individuals factor.
```{r}
# Scoreplot for between-individuals factor
scoreplot(mod.msca)
# Scoreplot of within-individuals factor
scoreplot(mod.msca, factor="within")
# .. per factor level
par.old <- par(mfrow=c(3,2), mar=c(4,4,2,1), mgp=c(2,0.7,0))
for(i in 1:length(mod.msca$scores.within))
scoreplot(mod.msca, factor="within", within_level=i,
main=paste0("Level: ", names(mod.msca$scores.within)[i]),
panel.first=abline(v=0,h=0,col="gray",lty=2))
par(par.old)
```
## LiMM-PCA
A version of LiMM-PCA is also implemented in HDANOVA. It combines REML-estimated
mixed models with PCA and scales the back-projected errors according to degrees of freedom
or effective dimensions (user choice).
```{r}
# Default LiMM-PCA model with two factors and interaction, 8 PCA components
mod.reml <- limmpca(assessment ~ candy*r(assessor), data=candies, pca.in=8)
summary(mod.reml)
scoreplot(mod.reml, factor="candy")
```
One can also use least squares estimation without REML. This affects the random
effects and scaling of backprojections.
```{r}
# LiMM-PCA with least squares estimation and 8 PCA components
mod.ls <- limmpca(assessment ~ candy*r(assessor), data=candies, REML=NULL, pca.in=8)
summary(mod.ls)
scoreplot(mod.ls)
```
## Repeated measures
We revisit the simulated data from the ANOVA vignette to demonstrate ASCA with
repeated measures. The data are subset, a time effect is added, and the response
is extended to the multivariate case.
```{r}
set.seed(123)
# Original simulation
dat <- data.frame(
feed = factor(rep(rep(c("low","high"), each=6), 4)),
breed = factor(rep(c("NRF","Hereford","Angus"), 16)),
bull = factor(rep(LETTERS[1:4], each = 12)),
daughter = factor(rep(letters[1:4], 12)),
age = round(rnorm(48, mean = 36, sd = 5))
)
dat$yield <- 150*with(dat, 10 + 3 * as.numeric(feed) + as.numeric(breed) +
2 * as.numeric(bull) + 1 * as.numeric(sample(dat$daughter, 48)) +
0.5 * age + rnorm(48, sd = 2))
# Extended to repeated measures
long <- dat[c(1:4,9:12), c("feed", "daughter", "yield")]
long <- rbind(long, long, long)
long$time <- factor(rep(1:3, each=8))
long$yield <- long$yield + rnorm(24, sd = 100) + rep(c(-200,0,200), each=8)
# Made multiresponse (no added structure, only noise)
long$yield <- I(matrix(rep(long$yield,10),nrow=length(long$yield),ncol=10)) + rnorm(length(long$yield)*10)
```
Analysing the data with ASCA using the least squares approach.
```{r}
# Least squares mixed model ASCA
mod.rm.asca <- asca(yield ~ r(daughter) + feed*r(time), data = long)
summary(mod.rm.asca)
```
Corresponding analysis using the LiMM-PCA approach with REML estimation.
```{r}
# REML mixed model LiMM-PCA
mod.rm.limmpca <- limmpca(yield ~ r(daughter) + feed*r(time), data = long, pca.in=2)
summary(mod.rm.limmpca)
```