---
title: "divent"
subtitle: "Measures of Diversity based on Entropy"
bibliography: ../inst/REFERENCES.bib
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to divent}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r global_options, include=FALSE}
set.seed(97310)
```

_divent_ is a package for _R_ designed to estimate diversity based on HCDT entropy or similarity-based entropy. 
It is a reboot of the _entropart_ package, following the tidyverse manifest and easier to use.
This is a short introduction to its use.

The package allows estimating biodiversity according to the framework based on HCDT entropy, the correction of its estimation-bias [@Grassberger1988; @Chao2003; @Chao2015] and its transformation into equivalent numbers of species [@Hill1973; @Jost2006; @Marcon2014a].
Estimation of diversity at arbitrary levels of sampling, requiring interpolation or extrapolation [@Chao2014] is available

Phylogenetic or functional diversity [@Marcon2014b] can be estimated, considering phyloentropy as the average species-neutral diversity over slices of a phylogenetic or functional tree [@Pavoine2009]. 

Similarity-based diversity [@Leinster2012] can be used to estimate [@Marcon2014e] functional diversity from a similarity or dissimilarity matrix between species without requiring building a dendrogram and thus preserving the topology of species  [@Pavoine2005a; @Podani2007].

The classical diversity estimators (Shannon and Simpson entropy) can be found in many R packages. 
Bias correction is rarely available except in the _EntropyEstimation_ [@Cao2014] package which provides the Zhang and Grabchak's estimators of entropy and diversity and their asymptotic variance (not included in _divent_).


# Estimating the diversity of a community

## Community data

Community data is:

- either a numeric vector containing abundances of species (the number of individual of each species) or their probabilities (the proportion of individuals of each species, summing to 1).
This format is convenient when a single community is considered.
- or a dataframe whose lines are communities and column are species. 
Its values are either abundances or probabilities. 
Special columns contain the site names, and their weights (e.g. their area or number of individuals).

Example data is provided in the dataset `paracou_6_abd`.
Let's get the abundances of tree species in the 6.25-ha tropical forest plot #6 from Paracou forest station in French Guiana.
It is divided into 4 equally-sized subplots:

```{r load_paracou6}
library("divent")
paracou_6_abd
# Number of individuals in each community
abd_sum(paracou_6_abd)
```


The data in `paracou_6_abd` is an object of class `abundances`, i.e. a tibble with species as columns and sites as rows.
It can be manipulated as any dataframe and plotted as a rank-abundance curve:

```{r plot_paracou6}
autoplot(paracou_6_abd[1, ])
```

The `rcommunity` function allows drawing random communities, e.g. a log-normal one [@Preston1948]:

```{r rcommunity}
rc <- rcommunity(1, size = 10000, distribution = "lnorm")
autoplot(rc, fit_rac = TRUE, distribution = "lnorm")
```

The Whittaker plot (rank-abundance curve) of a random log-normal distribution of 10000 individuals simulated with default parameter ($\sigma = 1$) is produced.


## Diversity estimation

The classical indices of diversity are richness (the number of species), Shannon's and Simpson's entropies:

```{r estimation}
div_richness(paracou_6_abd)
ent_shannon(paracou_6_abd)
ent_simpson(paracou_6_abd)
```

When applied to probabilities (created with `as_probaVector` in the following example), no estimation-bias correction is applied: this means that indices are just calculated by applying their definition function to the probabilities (that is the naive, or plugin estimator).

```{r naive_shannon}
library("dplyr")
paracou_6_abd %>% 
  as_probabilities() %>% 
  ent_shannon()
```

When abundances are available, many estimators can be used [@Marcon2015a] to address unobserved species and the non-linearity of the indices:

```{r shannon_estimators}
ent_shannon(paracou_6_abd)
ent_shannon(paracou_6_abd, estimator = "ChaoJost")
```

The best available estimator is chosen by default: its name is returned.

Those indices are special cases of the Tsallis entropy [-@Tsallis1988] or order $q$ (respectively $q=0,1,2$ for richness, Shannon, Simpson):

```{r ent_tsallis}
ent_tsallis(paracou_6_abd, q = 1)
```

Entropy should be converted to its effective number of species, i.e. the number of species with equal probabilities that would yield the observed entropy, called @Hill1973 numbers or simply diversity [@Jost2006].

```{r div_hill}
div_hill(paracou_6_abd, q = 1)
```

Diversity is the deformed exponential of order $q$ of entropy, and entropy is the deformed logarithm of of order $q$ of diversity:

```{r lnq}
(d2 <- div_hill(paracou_6_abd, q = 2)$diversity)
ln_q(d2, q = 2)
(e2 <-ent_tsallis(paracou_6_abd, q = 2)$entropy)
exp_q(e2, q = 2)
```

If an ultrametric dendrogram describing species phylogeny (here, a mere taxonomy with family, genus and species) is available, phylogenetic entropy and diversity [@Marcon2014b] can be calculated:

```{r PhyloDiversity}
div_phylo(paracou_6_abd, tree = paracou_6_taxo, q = 1)
```

Recall that all those functions can be applied to a numeric vector containing abundances, without having to build an object of class `abundances`.

```{r DivVector}
# Richness of a community of 100 species, each of them with 10 individuals
div_richness(rep(10, 100))
```

With a Euclidian distance matrix between species, similarity-based diversity [@Leinster2012; @Marcon2014e] is available:

```{r SBDiversity}
# Similarity is computed from the functional distance matrix of Paracou species
Z <- fun_similarity(paracou_6_fundist)
# Calculate diversity of order 2
div_similarity(paracou_6_abd, similarities = Z, q = 2)
```

## Diversity profiles

Diversity can be plotted against its order to provide a diversity profile. 
Order 0 corresponds to richness, 1 to Shannon's and 2 to Simpson's diversities:

```{r div_profile}
profile_hill(paracou_6_abd) %>% autoplot
```

Profiles of phylogenetic diversity and similarity-based diversity are obtained the same way.

```{r PDiversityProfile}
profile_phylo(paracou_6_abd, tree = paracou_6_taxo) %>% autoplot
# Similarity matrix
Z <- fun_similarity(paracou_6_fundist)
profile_similarity(paracou_6_abd, similarities = Z) %>% autoplot
```

## Diversity accumulation

Diversity can be interpolated or extrapolated to arbitrary sampling levels.

```{r div_level}
# Estimate the diversity of 1000 individuals
div_hill(paracou_6_abd, q = 1, level = 1000)
```

The sampling level can be a sample coverage, that is converted to the equivalent number of individuals.

```{r div_coverage}
# Estimate the diversity at 80% coverage
div_hill(paracou_6_abd, q = 1, level = 0.8)
```

Diversity accumulation curves are available.

```{r accum_hill}
accum_hill(
  paracou_6_abd[1, ], 
  q = 1, 
  levels = 1:500,
  n_simulations = 100
) %>% 
  autoplot()
```

Phylogenetic diversity can be addressed the same way.
Confidence intervals of the estimation can be computed, taking into account sampling variability.

```{r accum_div_phylo}
accum_div_phylo(
  paracou_6_abd[1, ],
  tree = paracou_6_taxo,
  q = 1, 
  levels = 1:2000
) %>% 
  autoplot()
```


# Estimating the diversity of a meta-community

## Meta-community data

A metacommunity is the assemblage several communities.

The set of communities is described by the abundances of their species and their weight.

Species probabilities in the meta-community are by definition the weighted average of their probabilities in the communities.
Abundances are calculated so that the total abundance of the metacommunity is the sum of all abundances of communities.
If weights are equal, then the abundances of the metacommunity are simply the sum of those of the communities.
If they are not, the abundances of the metacommunity are generally not integer values, which complicates the estimation of diversity.

Example:

```{r MetaCommunitydf}
# Abundances of three communities with four species
(abd <- matrix(
  c(
    10,  0, 25, 10, 
    20, 15, 10, 35, 
     0, 10,  5,  2
  ),
  ncol = 4
))
# Community weights
w <- c(1, 2, 1)
```

A set of communities is built.

```{r}
(communities <- as_abundances(abd, weights = w))
```


The function `metacommunity()` creates a metacommunity.
To plot it, use argument `type = "Metacommunity` when plotting the `species_distribution`.

```{r MetaCommunityMC}
(mc <- metacommunity(communities))
plot(communities, type = "Metacommunity")
```

Each shade of grey represents a species.
Heights correspond to the probability of species and the width of each community is its weight.


## Diversity estimation

High level functions allow computing diversity of all communities ($\alpha$ diversity), of the meta-community ($\gamma$ diversity), and $\beta$ diversity, i.e.\ the number of effective communities (the number of communities with equal weights and no common species that would yield the observed $\beta$ diversity).

The `div_part` function calculates everything at once, for a given order of diversity $q$:

```{r DivPart}
div_part(paracou_6_abd, q = 1)
```

An alternative is the `gamma` argument of all diversity estimation function to obtain $\gamma$ diversity instead of the diversity of each community.

```{r gamma}
div_hill(paracou_6_abd, q = 1, gamma = TRUE)
```

# Full documentation

https://ericmarcon.github.io/divent/ 

# References