---
title: "dlmwwbe: Dynamic Linear Model for Wastewater-based epidemiology with missing values"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{dlmwwbe: Dynamic Linear Model for Wastewater-based epidemiology with missing values}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  comment = "#>",
  fig.width = 7.2,
  fig.height = 4.8,
  fig.align = "center"
)
```

The **dlmwwbe** package (Dynamic Linear Model for Wastewater-based Epidemiology with Missing Data) contains two main functions: **pdlm()** (Predictive Dynamic Linear Model) and **dllm()** (Dynamic Local Level Model). The first fits a dynamic linear model for forecasting clinical positive cases (or other similar data) using lagged clinical and wastewater data. The second fits a local level model for smoothing noisy wastewater data. For more details, see **papers** here.

```{r setup}
knitr::opts_chunk$set(echo = TRUE)
library(dlmwwbe)
data(wastewater)
data(wastewaterhealthworker)
```

## Dynamic Local Level Model

First, we apply **dllm()** to the wastewater data collected between 2022 and 2024 in the Twin Cities metro area in Minnesota, United States. For details of the data, see **papers**. Two structures are possible: (1) all wastewater series share a single latent state (*S = 'univariate'*); (2) each wastewater series has its own latent state (*S = 'kvariate'*). For a better model fit, we encourage log-transforming the original wastewater data by setting the argument *log10 = TRUE*, because in practice the transformed data better approximate the normality assumption. Other transformations might be necessary depending on the nature of the data. **summary()** provides information about the fitted model.

Here we let each of the two wastewater series have its own latent state. The average of the smoothers is also provided.

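```{r}
# Keep the Twin Cities ("TC") records and parse the sampling dates
data_TC <- wastewater[wastewater$Code == "TC", ]
data_TC$SampleDate <- as.Date(data_TC$SampleDate)

# Fit the local level model with one latent state per wastewater series
fit <- dllm(
  equal.state.var = FALSE,
  equal.obs.var = FALSE,
  log10 = TRUE,                     # model the log10-transformed data
  data = data_TC,
  date = "SampleDate",
  obs_cols = c("ORFlab", "Nlab"),   # the two wastewater series
  S = "kvariate"
)

summary(fit)
plot(fit, type = "smoother", conf.int = TRUE)
```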
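
For intuition about what a local level model does with missing values, the following base R sketch filters a simulated series in which the observation is $y_t = \theta_t + v_t$ and the level follows a random walk, $\theta_t = \theta_{t-1} + w_t$. It is only an illustration under stated assumptions: the helper `local_level_filter()`, the simulated data, and the fixed variances are all hypothetical and are not part of **dlmwwbe**, and the sketch shows the filtering step only, not the smoother or the parameter estimation performed by **dllm()**.

```{r}
# Hypothetical helper (not part of dlmwwbe): Kalman filter for a
# univariate local level model with known variances.
local_level_filter <- function(y, obs_var, state_var, m0 = 0, C0 = 1e7) {
  n <- length(y)
  m <- numeric(n)  # filtered means of the level
  C <- numeric(n)  # filtered variances of the level
  m_prev <- m0
  C_prev <- C0
  for (t in seq_len(n)) {
    # Prediction step: the level follows a random walk
    a <- m_prev
    R <- C_prev + state_var
    if (is.na(y[t])) {
      # Missing observation: skip the update and carry the prediction forward
      m[t] <- a
      C[t] <- R
    } else {
      # Update step: weight the new observation by the Kalman gain
      K <- R / (R + obs_var)
      m[t] <- a + K * (y[t] - a)
      C[t] <- (1 - K) * R
    }
    m_prev <- m[t]
    C_prev <- C[t]
  }
  list(mean = m, var = C)
}

# Simulated data (made up for illustration), with a few missing values
set.seed(1)
level_sim <- cumsum(rnorm(50, sd = 0.1))
y_sim <- level_sim + rnorm(50, sd = 0.3)
y_sim[c(10, 11, 25)] <- NA

filt <- local_level_filter(y_sim, obs_var = 0.3^2, state_var = 0.1^2)
```

At a missing time point the filtered mean is unchanged from the previous step and the filtered variance grows by the state variance, so uncertainty widens across gaps instead of the series being dropped or imputed beforehand.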

## Predictive Dynamic Linear Model

Next, we apply **pdlm()** to the clinical and wastewater data, demonstrating different numbers of lags. For a better model fit, we encourage log-transforming the original wastewater data by setting the argument *log10 = TRUE* (with $1$ added to the positive case counts so that the transformation is valid). **summary()** provides information about the fitted model.

Here, we consider $0$ and $2$ lags and plot the fits along with the observed data on the original scale.

```{r}
# Keep the Twin Cities ("TC") records and parse the sampling dates
data_TC <- wastewaterhealthworker[wastewaterhealthworker$Code == "TC", ]
data_TC$SampleDate <- as.Date(data_TC$SampleDate)

# Fit the predictive model with 0 lags
fit <- pdlm(
  data = data_TC,
  formula = HealthWorkerCaseCount ~ WW.tuesday + WW.thursday,
  lags = 0,
  log10 = TRUE,
  date = NULL,
  equal.state.var = TRUE,
  equal.obs.var = FALSE,
  auto_init = TRUE,
  control = list(maxit = 100)
)

summary(fit)
plot(fit, conf.int = TRUE)
```
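The same formula with $2$ lags, this time with *equal.state.var = FALSE* and *equal.obs.var = TRUE*:

```{r}
# Keep the Twin Cities ("TC") records and parse the sampling dates
data_TC <- wastewaterhealthworker[wastewaterhealthworker$Code == "TC", ]
data_TC$SampleDate <- as.Date(data_TC$SampleDate)

# Fit the predictive model with 2 lags
fit <- pdlm(
  data = data_TC,
  formula = HealthWorkerCaseCount ~ WW.tuesday + WW.thursday,
  lags = 2,
  log10 = TRUE,
  date = NULL,
  equal.state.var = FALSE,
  equal.obs.var = TRUE,
  auto_init = TRUE,
  control = list(maxit = 100)
)

summary(fit)
plot(fit, conf.int = TRUE)
```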
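
The two ideas used in this section, a log transformation that tolerates zero counts and the use of lagged values as predictors, can be sketched in base R. This is a minimal illustration and not a description of how **pdlm()** prepares its inputs internally: the counts are made up, and `lag_vec()` is a hypothetical helper that is not part of **dlmwwbe**.

```{r}
# Made-up weekly case counts, including a zero and a missing value
counts <- c(0, 3, 12, 7, NA, 20)

# log10(x + 1) is defined for zero counts, whereas log10(0) is -Inf
log_counts <- log10(counts + 1)

# Back-transform to the original scale
back <- 10^log_counts - 1

# Hypothetical helper: shift a vector by k positions (k < length(x)),
# padding the start with NA
lag_vec <- function(x, k) {
  if (k == 0) return(x)
  c(rep(NA, k), head(x, -k))
}

# Response alongside its first and second lags
lagged <- data.frame(
  y      = log_counts,
  y_lag1 = lag_vec(log_counts, 1),
  y_lag2 = lag_vec(log_counts, 2)
)
```

Each additional lag leaves one more row at the start without a complete set of predictors, and a missing value in the original series reappears in every lagged column.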