---
title: "Meta analysis with special GMCMs"
author: "Anders Ellern Bilgrau"
date: "2019-08-31"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Meta analysis with special GMCMs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r knit-int, echo=FALSE, include=FALSE}
library("knitr")
set.seed(5828993)
opts_knit$set(self.contained = TRUE)
```

This is a quick tutorial for using the special GMCM for meta analysis. 

## Initialization
The **GMCM**^[1][1]^ package is loaded.

```{r load-packages, include=TRUE}
#install.packages("GMCM")  # Uncomment to install the GMCM package
library("GMCM")
```


If **GMCM** is *not* installed, please uncomment the above line and rerun the script.

## Loading data
The data is loaded and the first rows are printed. To illustrate we load the
`u133VsExon` dataset within the package. The dataset contains 19,577 *p*-values
for the null hypothesis of no differential gene expression between two cell
types for each of two different experiments called `u133` and `exon`.

```{r load-data, include=TRUE, echo=TRUE}
x <- get(data("u133VsExon"))
head(x, n = 4)
```

See `help("u133VsExon")` for more information.

Next, we subset the data to simply reduce computation time.
After that, we will rank it, and visualize it.

```{r}
x <- x[1:5000, ]
```

The values above are p-values where *small* values indicate strong evidence---contrary to what is expected by the special model. In the special model, *large* values should be critical to the null hypothesis.

Therefore we ranks and scale 1 - p:

```{r preprocess-data}
u <- Uhat(1 - x)
```

The original and ranked data is plotted:

```{r}
par(mfrow = c(1,2))  # Visualizing P-values and the ranked P-values
plot(x, cex = 0.5, pch = 4, col = "tomato", main = "P-values",
     xlab = "P-value (U133)", ylab = "P-value (Exon)")
plot(u, cex = 0.5, pch = 4, col = "tomato", main = "Ranked P-values",
     xlab = "rank(1-P) (U133)", ylab = "rank(1-P) (Exon)")
```

Here each point represent a gene. The genes in the lower left of the first panel
and correspondingly in the upper right of the second panel are the seemingly
*reproducible* genes. They have a low *p*-value and thus a high rank in *both*
experiments. The genes in the upper left and lower right are the ones that are
apparently spurious results; they have a low *p*-value in only one experiment.

## Initial parameters
The initial parameters are set to

```{r show-initial-params, include=TRUE, echo=TRUE}
init_par <- c(pie1 = 0.6, mu = 1, sigma = 1, rho = 0.2)
```


## Model fitting
With the data loaded and prepared, the model is can be fitted with the defined initial parameters.

```{r fit_model, error=TRUE}
par <- fit.meta.GMCM(u = u,
                     init.par = init_par,
                     method = "NM",
                     max.ite = 1000,
                     verbose = FALSE)
print(par)
```

We here use the Nelder-Mead fitting method with with a maximum number of iterations of `1000`.

## Meta analysis with unsupervised clustering
The estimated parameters are used to calculate the local and adjusted irreproducibility discovery rates:

```{r compute_probs}
meta_IDR_thres <- 0.05
out <- get.IDR(x, par = par,
               threshold = meta_IDR_thres) # Compute IDR
str(out)

out <- get.IDR(u, par = par, threshold = meta_IDR_thres)
below <- out[["IDR"]] < meta_IDR_thres
out$l <- sum(below)
out$Khat <- ifelse(below, 2, 1)
```

By default, `get.IDR` computes thresholds on the adjusted IDR.
The local irreproducibility discovery rate correspond to the posterior probability of the point originating from the irreproducible component.

## Results
The classes are counted by

```{r classes_table}
table(out$Khat)
```

Where we see how many of the 5000 genes are deemed reproducible and irreproducible.

The results are also displayed by plotting

```{r plot_results}
plot(x, col = out$Khat, asp = 1) # Plot of raw values
plot(u, col = out$Khat, asp = 1) # Plot of copula values
z <- GMCM:::qgmm.marginal(u, theta = meta2full(par, d = ncol(u))) # Estimate latent process
plot(z, col = out$Khat, asp = 1) # Plot of estimated latent process
```

The model indeed capture genes in the upper right and classify them as reproducible.

The fitted `par` object can be converted to a `theta` object which can be plotted directly:

```{r plot_theta}
plot(meta2full(par, d = ncol(u)))
```


### Session information
This report was generated using **rmarkdown**^[2][2]^ and **knitr**^[3][3]^ under the session
given below. The report utilizes [parameterized reports][2] and [`knitr::spin`][3].


```{r session-info}
sessionInfo()
```


### References
Please cite the **GMCM** paper^[1][1]^ if you use the package or shiny app.

```{r citation, echo=FALSE, results='asis'}
cites <- lapply(c("GMCM", "knitr", "rmarkdown"), citation)
fmt_cites <- unlist(lapply(cites, format, style = "text"))
cat(paste0("  ", seq_along(fmt_cites), ". ", fmt_cites, "\n"))
```

[1]: http://doi.org/10.18637/jss.v070.i02
[2]: https://bookdown.org/yihui/rmarkdown/parameterized-reports.html
[3]: https://yihui.name/knitr/demo/stitch/