---
title: "Ellipse Overlap"
author: "Andrew L Jackson"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Ellipse Overlap}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteDepends{viridis}
  \usepackage[utf8]{inputenc}
---

```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", 
                      fig.width = 6, fig.height = 5)


```

## Calculating the Area of Overlap Between Two Ellipses

```{r}
# remove previously loaded items from the current environment and remove previous graphics.
rm(list=ls())
graphics.off()

# Here, I set the seed each time so that the results are comparable. 
# This is useful as it means that anyone that runs your code, *should*
# get the same results as you, although random number generators change 
# from time to time.
set.seed(1)

# load SIBER
library(SIBER)
library(viridis)

# set a new three-colour palette from the viridis package
palette(viridis::viridis(3))

# load in the included demonstration dataset
data("demo.siber.data")
#
# create the siber object
siber.example <- createSiberObject(demo.siber.data)


# Or, if working with your own data read in from a *.csv file, you would use
# the commented-out code below. The demonstration *.csv file is included with
# this package. To find its location, type
# fname <- system.file("extdata", "demo.siber.data.csv", package = "SIBER")
# in your command window. You could load it directly by using the
# returned path or, perhaps better, you could navigate to this folder,
# copy the file to a folder of your own choice, and create a
# script from this vignette to analyse it. This *.csv file provides
# a template for how your own files should be formatted.

# mydata <- read.csv(fname, header = TRUE)
# siber.example <- createSiberObject(mydata)


# Create a list of plotting arguments to be passed onwards to the
# plotting function. With p.interval = NULL, these are standard ellipses (SEA).
# Note that these are not SEAc, which is what we will base our overlap
# calculations on. Plotting SEAc this way needs to be added in a future update.
# For now, the best way to plot SEAc is to add the ellipses manually,
# following the vignette on this topic.
group.ellipses.args  <- list(n = 100, p.interval = NULL, lty = 1, lwd = 2)



par(mfrow = c(1, 1))
# No hulls are drawn, so only the ellipse plotting arguments are supplied.
plotSiberObject(siber.example,
                  ax.pad = 2, 
                  hulls = FALSE,
                  ellipses = TRUE, group.ellipses.args = group.ellipses.args,
                  group.hulls = FALSE,
                  bty = "L",
                  iso.order = c(1,2),
                  xlab = expression({delta}^13*C~'permille'),
                  ylab = expression({delta}^15*N~'permille')
                  )

```

To calculate the overlap between two (or more) ellipses, we need to know the coordinates of each ellipse. This is done by calling `addEllipse(..., do.plot = FALSE)`. See the associated help file and the vignette [Customising-Plots-Manually](Customising-Plots-Manually.html) for more information on the optional inputs to `addEllipse` for different types of ellipse. Bear in mind that the option `n` controls how many points are used to draw the ellipse: a low `n` gives clunky, angular ellipses, whereas a higher `n` gives rounder, smoother ones. A higher `n` is more suitable when ellipses are more eccentric, as their curvature is greater at the tips.

The functions `maxLikOverlap` and `bayesianOverlap` are wrappers that take care of the calls to `addEllipse` and to the polygon overlap function in the package `spatstat.utils`. Each returns three values: the numerically calculated area of the first ellipse, the area of the second ellipse, and the area of the overlap between them.

It is not obvious to me that there is a single best choice if you wish to express your overlap as a proportion, since there are several options for the denominator. Expressing the overlap as a proportion of the total area covered by the two ellipses (the sum of their areas minus the overlap, so that the shared region is counted only once) seems suitable in a general sense, since it ranges from 0 when the ellipses are completely distinct to 1 when they coincide exactly.
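The effect of `n` on the calculated area can be illustrated with a small base-R sketch that does not use SIBER at all. It assumes the ellipse outline is built from `n` points at equally spaced parameter angles, which may not match exactly how `addEllipse` places its points, and the helper names `polygon_area` and `ellipse_polygon_area` are hypothetical.

```r
# Shoelace formula for the area of a closed polygon given its vertices.
polygon_area <- function(x, y) {
  j <- c(2:length(x), 1)  # index of the next vertex, wrapping round to the first
  abs(sum(x * y[j] - x[j] * y)) / 2
}

# Area of an n-sided polygon inscribed in an axis-aligned ellipse with
# semi-axes a and b (illustrative values, not taken from any dataset).
ellipse_polygon_area <- function(n, a = 3, b = 1) {
  theta <- seq(0, 2 * pi, length.out = n + 1)[-(n + 1)]  # n distinct angles
  polygon_area(a * cos(theta), b * sin(theta))
}

true_area <- pi * 3 * 1
n_values  <- c(10, 25, 100, 1000)
ratio     <- sapply(n_values, ellipse_polygon_area) / true_area

# Polygon area as a fraction of the exact ellipse area, for each n
data.frame(n = n_values, polygon_over_true = round(ratio, 4))
```

Under this assumption the polygon always slightly underestimates the true area: by roughly 6.5% at `n = 10`, and by less than 0.1% at `n = 100`.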


```{r, MLoverlap}
# In this example, I will calculate the overlap between ellipses for groups 2
# and 3 in community 1 (i.e. the green and yellow open circles of data).

# The first ellipse is referenced using a character string representation where 
# in "x.y", "x" is the community, and "y" is the group within that community.
# So in this example: community 1, group 2
ellipse1 <- "1.2" 

# Ellipse two is similarly defined: community 1, group 3
ellipse2 <- "1.3"

# The overlap of the maximum likelihood fitted standard ellipses is 
# estimated using
sea.overlap <- maxLikOverlap(ellipse1, ellipse2, siber.example, 
                             p.interval = NULL, n = 100)

# the overlap between the corresponding 95% prediction ellipses is given by:
ellipse95.overlap <- maxLikOverlap(ellipse1, ellipse2, siber.example, 
                             p.interval = 0.95, n = 100)

# so in this case, the overlap as a proportion of the total area covered by 
# the two ellipses (area 1 + area 2 - overlap) would be
prop.95.over <- ellipse95.overlap[3] / (ellipse95.overlap[2] + 
                                          ellipse95.overlap[1] -
                                          ellipse95.overlap[3])

```
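Since the choice of denominator is a judgement call, it can help to see the alternatives side by side. The sketch below uses a hypothetical helper, `overlap_proportions`, and made-up areas; it assumes only that the input holds the area of ellipse 1, the area of ellipse 2, and the overlap, in that order.

```r
# Express an overlap as a proportion using three possible denominators.
# `areas` is assumed to hold, in order: area of ellipse 1, area of ellipse 2,
# and the area of their overlap.
overlap_proportions <- function(areas) {
  a1 <- areas[[1]]
  a2 <- areas[[2]]
  ov <- areas[[3]]
  c(of_ellipse1 = ov / a1,              # share of ellipse 1 that is overlapped
    of_ellipse2 = ov / a2,              # share of ellipse 2 that is overlapped
    of_union    = ov / (a1 + a2 - ov))  # share of the total area covered
}

# Made-up areas purely for illustration; these are not output from SIBER.
example_areas <- c(10, 20, 5)
overlap_proportions(example_areas)
```

With real results, the same helper could be applied as `overlap_proportions(ellipse95.overlap)`. The first two proportions are asymmetric and answer "how much of this group's ellipse is shared?", whereas the third treats both ellipses equally.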

The function `bayesianOverlap` returns multiple rows of these three numbers, each row holding the values for one draw from the posterior, so that you can build up a picture of the distribution of the estimated overlap. Calculating this overlap is computationally time consuming, and a typical analysis will collect thousands of posterior samples. For this example, I will calculate the posterior overlap on the first 100 samples, but in reality you would probably want to do this on at least a few hundred, if not all, of your posterior samples in a longer (perhaps over-lunch or overnight) run.

```{r, bayesOverlap}
# options for running jags
parms <- list()
parms$n.iter <- 2 * 10^4   # number of iterations to run the model for
parms$n.burnin <- 1 * 10^3 # discard the first set of values
parms$n.thin <- 10     # thin the posterior by this many
parms$n.chains <- 2        # run this many chains

# define the priors
priors <- list()
priors$R <- 1 * diag(2)
priors$k <- 2
priors$tau.mu <- 1.0E-3

# fit the ellipses using an Inverse Wishart prior
# on the covariance matrix Sigma, and a vague normal prior on the 
# means. Fitting is via JAGS.
ellipses.posterior <- siberMVN(siber.example, parms, priors)

# and the corresponding Bayesian estimates of the overlap between the 
# 95% ellipses are given by:
bayes95.overlap <- bayesianOverlap(ellipse1, ellipse2, ellipses.posterior,
                                   draws = 100, p.interval = 0.95, n = 100)

# a histogram of the overlap
hist(bayes95.overlap[,3], 10)

# and, as above, you can express this as a proportion of the total area 
# covered by the two ellipses (area 1 + area 2 - overlap)
bayes.prop.95.over <- (bayes95.overlap[,3] / (bayes95.overlap[,2] + 
                                                bayes95.overlap[,1] -
                                                bayes95.overlap[,3])
                       )

hist(bayes.prop.95.over, 10)

```
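Beyond a histogram, the posterior draws can be condensed into a point estimate and an interval. The following is a minimal sketch, assuming the draws are held in a three-column object ordered as area 1, area 2, overlap. The helper `summarise_overlap` is hypothetical, and the small matrix of draws is made up purely so that the example runs on its own.

```r
# Summarise the posterior distribution of the proportional overlap.
# `overlap_draws` is assumed to have one row per posterior draw and three
# columns in the order: area of ellipse 1, area of ellipse 2, overlap.
summarise_overlap <- function(overlap_draws, probs = c(0.025, 0.5, 0.975)) {
  overlap_draws <- as.matrix(overlap_draws)
  # overlap as a proportion of the total area covered by both ellipses
  prop <- overlap_draws[, 3] /
    (overlap_draws[, 1] + overlap_draws[, 2] - overlap_draws[, 3])
  c(mean = mean(prop), quantile(prop, probs = probs))
}

# Made-up draws for illustration only; these are not SIBER output.
fake_draws <- cbind(area1   = c(10, 12, 11, 9),
                    area2   = c(20, 18, 22, 19),
                    overlap = c(5, 4, 6, 3))
summarise_overlap(fake_draws)
```

With real output this might be called as `summarise_overlap(bayes95.overlap)`. The 2.5% and 97.5% quantiles then give an equal-tailed 95% credible interval for the proportional overlap.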