---
title: "Backcalculating Reporting Delays Distributions from Linelist Data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Backcalculating Reporting Delays Distributions from Linelist Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This R Markdown document walks through the steps for imputing a reporting delay distribution from linelist data with missing disease onset data, adapted in STAN from [Li and White](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009210)

There are two starting points for this vignette: either you have **caseCount** data, which are aggregated case counts by day, or you have **Line-list data** means you have a single row for each case, that has dates for: infection, symptom onset, positive test, and when this was reported to public health agencies.

### Example: lineList data
```{r setup}
library(WhiteLabRt)
```

**Step 1.** Load data
```{r}
data("sample_report_dates")
data("sample_onset_dates")
```

**Step 2.** Creating `linelist` object
```{r casecounts2}
my_linelist <- create_linelist(report_dates = sample_report_dates,
                                onset_dates = sample_onset_dates)
head(my_linelist)
```

**Step 3.** Define the serial interval.
The `si()` function creates a vector of length 14 with shape and rate for a gamma distribution. Note, this **has** a leading 0 to indicate no infections on the day of disease onset.
```{r serial, fig.width=6.75, dev='png'}
sip <- si(14, 4.29, 1.18)
plot(sip, type = 'l')
```

**Step 5.** Run the back-calculation algorithm.
The default is an R(t) sliding window of 7 days. Additional options to STAN can be specified in the last argument (e.g., chains, cores, control).
```{r backcalc1,eval=F}
out_list_demo <- run_backnow(my_linelist, sip = sip, chains = 1)
```

**Plot outputs**.
The points are aggregated reported cases, and the red line (and shaded confidence interval) represent the back-calculated case onsets that lead to the reported data.
```{r plot1,fig.width=6.75,dev='png'}
data("out_list_demo")
plot(out_list_demo, "est")
```

You can also plot the `R(t)` curve over time. In this case, the red line (and shaded confidence interval) represent the time-varying r(t). See Li and White for description.
```{r plot2, fig.width=6.75,dev='png'}
data("out_list_demo")
plot(out_list_demo, "rt")
```

### Example: Case Count data
You can also do the same from case count data, although at some point you will have to assume a reporting delay distribution, so this would be a little circular.

**Step 1.** Load data
```{r example1}
data("sample_dates")
data("sample_location")
data("sample_cases")

head(sample_dates)
head(sample_cases)
head(sample_location)
```

**Step 2.** Creating case-counts
```{r casecounts}
caseCounts <- create_caseCounts(date_vec = sample_dates,
                                location_vec = sample_location,
                                cases_vec = sample_cases)
head(caseCounts)
```

**Step 3.** Convert to linelist data.
You can specify the distribution for `my_linelist` in  `convert_to_linelist`. `reportF` is the distribution function, `_args` lists the distribution params that are not `x`, and `_missP` is the percent missing. This must be between ${0 < x < 1}$. Both 'caseCounts' and 'caseCounts_line' objects can be fed into `run_backnow`. The implied onset distribution is `rnbinom()` with `size = 3` and `mu = 9`, with `reportF_missP = 0.6` .
```{r}
my_linelist <- convert_to_linelist(caseCounts, 
                                   reportF = rnbinom, 
                                   reportF_args = list(size = 3, mu = 9),
                                   reportF_missP = 0.6)
head(my_linelist)
```

**Step 4.** Define the serial interval.
The `si()` function creates a vector of length 14 with alpha and beta as defined in Li and White, for COVID-19.
```{r serial2,fig.width=6.75, dev='png'}
sip <- si(14, 4.29, 1.18)
```

**Step 5.** Run the back-calculation algorithm.
The defaults are 2000 iterations and an R(t) sliding window of 7 days.
```{r backcalc2, eval=FALSE}
options(mc.cores = 4)

out_list_demo <- run_backnow(my_linelist, sip = sip)
```