---
title: "Getting started with {superspreading}"
output: rmarkdown::html_vignette
bibliography: references.json
link-citations: true
vignette: >
  %\VignetteIndexEntry{Getting started with {superspreading}}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The {superspreading} R package provides a set of functions for understanding
individual-level transmission dynamics, and thus whether there is evidence of
superspreading or superspreading events (SSE). Individual-level transmission is
important for understanding the growth or decline in cases of an infectious 
disease accounting for transmission heterogeneity, this heterogeneity is not 
accounted for by the population-level reproduction number ($R$).

::: {.alert .alert-info}
## Definition

Superspreading describes individual heterogeneity in disease transmission, such that some individuals transmit to many infectees while other
infectors infect few or zero individuals [@lloyd-smithSuperspreadingEffectIndividual2005].
:::

```{r setup}
library(superspreading)
library(epiparameter)
```

::: {.alert .alert-secondary}
The [{epiparameter}](https://github.com/epiverse-trace/epiparameter) R package is loaded to provide a library of epidemiological parameters. These
parameters can be used by the {superspreading} functions. See the [Empirical superspreading] section of this
vignette for its usage.

As an example, offspring distributions are stored in the {epiparameter} library which contain estimated parameters,
such as the reproduction number ($R$), and in the case of a negative binomial model, the dispersion parameter ($k$).

The offspring distribution is the distribution of the number of infectees (secondary case or offspring) that each infector (primary case) produces. 
:::

## Probability of epidemic

The probability that a novel disease will cause a epidemic (i.e. sustained 
transmission in the population) is determined by the nature of that diseases'
transmission heterogeneity. This variability may be an intrinsic property of 
the disease, or a product of human behaviour and social mixing patterns.

For a given value of $R$, if the variability is high, the probability that the outbreak 
will cause epidemic is lower as the superspreading events are rare. Whereas for
lower variability the probability is higher as more individuals are closer to the 
mean ($R$).

::: {.alert .alert-secondary}
Here we use $R$ to represent the reproduction number (number of secondary cases caused by a 
typical case). Depending on the situation, this may be equivalent to the basic reproduction
number ($R_0$, representing transmission in a fully susceptible population) or the effective 
reproduction number at a given point in time ($R_t$, representing the extent of transmission 
at time $t$). Either can be input into the functions provided by {superspreading}
:::

The `probability_epidemic()` function in {superspreading} can calculate
this probability. $k$ is the dispersion parameter of a negative binomial distribution
and controls the variability of individual-level transmission.

```{r, prob-epidemic-k}
probability_epidemic(R = 1.5, k = 1, num_init_infect = 1)
probability_epidemic(R = 1.5, k = 0.5, num_init_infect = 1)
probability_epidemic(R = 1.5, k = 0.1, num_init_infect = 1)
```

In the above code, $k$ values above one represent low heterogeneity (in the case 
$k \rightarrow \infty$ it is a poisson distribution), and as $k$ decreases,
heterogeneity increases. When $k$ equals 1, the distribution is geometric. Values
of $k$ less than one indicate overdispersion of disease transmission, a signature
of superspreading.

When the value of $R$ increases, this causes the probability of an 
epidemic to increase, if $k$ remains the same.

```{r, prob-epidemic-R}
probability_epidemic(R = 0.5, k = 1, num_init_infect = 1)
probability_epidemic(R = 1.0, k = 1, num_init_infect = 1)
probability_epidemic(R = 1.5, k = 1, num_init_infect = 1)
probability_epidemic(R = 5, k = 1, num_init_infect = 1)
```

Any value of $R$ less than or equal to one will have zero probability of causing
a sustained epidemic.

Finally, the probability that a new infection will cause a large epidemic is influenced by the
number of initial infections seeding the outbreak. We define
`a` to be this number of initial infections.

```{r, prob-epidemic-a}
probability_epidemic(R = 1.5, k = 1, num_init_infect = 1)
probability_epidemic(R = 1.5, k = 1, num_init_infect = 10)
probability_epidemic(R = 1.5, k = 1, num_init_infect = 100)
```

## Empirical superspreading

Given `probability_epidemic()` it is possible to determine the probability of 
an epidemic for diseases for which parameters of an offspring distribution have 
been estimated. An offspring distribution is simply the distribution of the 
number of secondary infections caused by a primary infection. It is the distribution
of $R$, with the mean of the distribution given as $R$.

Here we can use [{epiparameter}](https://github.com/epiverse-trace/epiparameter) to load in offspring distributions for multiple 
diseases and evaluate how likely they are to cause epidemics.

```{r, epiparam}
sars <- epiparameter_db(
  disease = "SARS",
  epi_name = "offspring distribution",
  single_epiparameter = TRUE
)
evd <- epiparameter_db(
  disease = "Ebola Virus Disease",
  epi_name = "offspring distribution",
  single_epiparameter = TRUE
)
```

The parameters of each distribution can be extracted:

```{r, params}
sars_params <- get_parameters(sars)
sars_params
evd_params <- get_parameters(evd)
evd_params
```

```{r, prob-epidemic-empiric}
family(sars)
probability_epidemic(
  R = sars_params[["mean"]],
  k = sars_params[["dispersion"]],
  num_init_infect = 1
)
family(evd)
probability_epidemic(
  R = evd_params[["mean"]],
  k = evd_params[["dispersion"]],
  num_init_infect = 1
)
```

In the above example we assume the initial pool of infectors is one (`num_init_infect = 1`) 
but this can easily be adjusted in the case there is evidence for a larger initial
seeding of infections, whether from animal-to-human spillover or imported cases from outside the area of interest.

We can see that the probability of an epidemic given the estimates of 
@lloyd-smithSuperspreadingEffectIndividual2005 is greater for Ebola than SARS.
This is due to the offspring distribution of Ebola having a larger dispersion 
(dispersion $k$ = `r evd_params[["dispersion"]]`), compared to SARS, which has a relatively
small dispersion ($k$ = `r sars_params[["dispersion"]]`).

## References