---
title: "ATQ Guide"
output: rmarkdown::html_vignette
author: Vinay Joshy
vignette: >
%\VignetteIndexEntry{ATQ_Guide}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
markdown:
wrap: 90
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
warning = FALSE, message = FALSE
)
```
# ATQ: Using Absenteeism Data to Detect Onset of Epidemics
## Introduction
The `ATQ` package provides tools for public health institutions to detect epidemics using
school absenteeism data. It offers functions to simulate regional populations of
households, elementary schools, and epidemics, and to calculate alarm metrics from these
simulations.
This package builds on the work of Ward et al. and Vanderkruk et al. It introduces the
Alert Time Quality (ATQ) metrics such as the Average ATQ (AATQ) and First ATQ (FATQ), to
evaluate the timeliness and accuracy of epidemic alerts. This vignette demonstrates the
package's use through a simulation study based on Vanderkruk et al., modeling yearly
influenza epidemics and their alarm metrics in the Wellington-Dufferin-Guelph public
health unit, Canada.
To use the package, install and load it with:
```{r setup, message = FALSE, warning = FALSE}
tryCatch({
devtools::install_github("vjoshy/ATQ_Surveillance_Package")
}, error = function(e) {
message("Unable to install package from GitHub. Using local version if available.")
})
library(ATQ)
```
The following sections will guide you through population simulation, epidemic modeling,
and alarm metric calculation using the `ATQ` package.
## Methods
`ATQ` provides a simulation model that consists of three sequential parts: 1) a population
of individuals, 2) annual influenza epidemics, 3) school absenteeism and laboratory
confirmed influenza case data. The final part of this section will include alarm metrics
evaluation.
### Population simulation
To simulate the population of the Wellington-Dufferin-Guelph (WDG) region in Ontario,
Canada, the package offers the following functions:
- `catchment_sim`: Simulates catchment areas using a default gamma distribution for the
number of schools in each area. The `dist_func` argument allows for specifying other
distributions.
- `elementary_pop`: Simulates elementary school enrollment and assigns students to
catchments using a default gamma distribution. This function requires the output of
`catchment_sim.` The `dist_func` argument can be modified for other distributions.
- `subpop_children`: Simulates households with children using the output of
`elementary_pop.` It requires specifying population proportions such as coupled
parents, number of children per household type, and proportion of elementary
school-age children. Distributions for parent, child, and age simulations can be
specified.
- `subpop_noChildren`: Simulates households without children using the outputs of
`subpop_children` and `elementary_pop.` It requires specifying proportions of
household sizes and the overall proportion of households with children.
- `simulate_households`: Creates a list containing two simulated populations: households
and individuals.
If population proportions are not provided to `subpop_children` and `subpop_noChildren`,
the functions will prompt the user for input.
```{r populationSimulation}
set.seed(123)
# Simulate 16 catchments of 80x80 squares and the number of schools they contain
catchment <- catchment_sim(16, 80, dist_func = stats::rgamma, shape = 4.313, rate = 3.027)
# Simulate population size of elementary schools
elementary<- elementary_pop(catchment, dist_func = stats::rgamma, shape = 5.274, rate = 0.014)
# Simulate households with children
house_children <- subpop_children(elementary, n = 5,
prop_parent_couple = 0.7668901,
prop_children_couple = c(0.3634045, 0.4329440, 0.2036515),
prop_children_lone = c(0.5857832, 0.3071523, 0.1070645),
prop_elem_age = 0.4976825)
# Simulate households without children using pre-specified proportions
house_noChild <- subpop_noChildren(house_children, elementary,
prop_house_size = c(0.23246269, 0.34281716, 0.16091418, 0.16427239, 0.09953358),
prop_house_Children = 0.4277052)
# Combine household simulations and generate individual-level data
households <- simulate_households(house_children, house_noChild)
```
### Epidemic and Laboratory Confirmed Cases simulation
The package simulates epidemics using a stochastic Susceptible-Infected-Removed (SIR)
framework. This approach differs from Vanderkruk et al., who used a spatial and
network-based individual-level model.
#### Simulation Process
- Initialization: The population is divided into S (Susceptible), I (Infectious), and R
(Removed) compartments. Initially, most individuals are susceptible, a few are
infectious, and none are removed.
- Start Date: A random start date for the epidemic is chosen based on specified average
and minimum start dates. Time Steps: The simulation proceeds in discrete time steps.
For each step:
a. Transmission Probability (p_inf): Calculated as
$1 - e^{-\alpha {\frac{I[t-1]}{N}}}$, where $\alpha$ is the transmission rate,
$I[t-1]$ is the number of infectious individuals at the previous time step, and
$N$ is the total population.
b. New Infections (new_inf): Determined by drawing from a binomial distribution with
parameters n (number of susceptible individuals) and p (transmission probability).
c. Compartment Updates:
- Susceptible (S): Decreases by new infections.
- Infectious (I): Increases by new infections, decreases by recoveries/deaths.
- Removed (R): Increases by recoveries/deaths.
d. Reported Cases: A subset of new infections is reported based on the reporting
rate, with delays added using an exponential distribution to reflect reporting
lag.
The summary and plot methods can be used to visualize and summarize the simulated
epidemics:
```{r epidemicSimulation}
# isolate individuals data
individuals <- households$individual_sim
# simulate epidemics for 10 years, each with a period of 300 days and 32 individuals infected initially
# infection period of 4 days
epidemic <- ssir(nrow(individuals), T = 300, alpha = 0.298, avg_start = 45,
min_start = 20, inf_period = 4, inf_init = 32, report = 0.02, lag = 7, rep = 10)
# Summarize and plot the epidemic simulation results
summary(epidemic)
plot(epidemic)
```
### Absenteeism simulation
The `compile_epi` function in this code compiles and processes epidemic data, simulating
school absenteeism using epidemic and individual data. It creates a data set for actual
cases, absenteeism and laboratory confirmed cases, this data set will also include a "True
Alarm Window", reference dates for each epidemic year and seasonal lag values.
Absenteeism data is simulated as follows:
- For each day, the proportion of infected individuals based on new infection over the
past few days
- Whether each child is absent or not is determined using the logic:
- 95% of infected children stay home
- 5% of healthy children are absent for reasons other than sickness
The data is aggregated across all schools.
```{r absenteeism}
absent_data <- compile_epi(epidemic, individuals)
dplyr::glimpse(absent_data)
```
### Alarm Metrics Evaluation
The `eval_metrics` function assesses the performance of epidemic alarm systems across
various lags and thresholds using school absenteeism data. It evaluates the following key
metrics:
- False Alarm Rate (FAR): Proportion of alarms raised outside the true alarm window.
- Added Days Delayed (ADD): Measures how many days after the optimal alarm day the first
true alarm was raised.
- Average Alarm Time Quality (AATQ): Mean quality of all alarms raised, considering
their timing relative to the optimal alarm day.
- First Alarm Time Quality (FATQ): Quality of the first alarm raised, based on its
timing.
- Weighted versions (WAATQ, WFATQ): Apply year-specific weights to AATQ and FATQ.
A logistic regression model with lagged absenteeism and fixed seasonal terms given by:
\[
\text{logit}(\theta_{tj}) = \beta_0 + \sum_{k=0}^l \beta_{k+1}x_{(t-k)j} + \beta_{l+2}\sin\Bigg(\frac{2\pi t^*}{T^*}\Bigg) + \beta_{l+3}\cos\Bigg(\frac{2\pi t^*}{T^*}\Bigg) + \gamma_j
\]
where \(\theta_{tj}\) represents the probability of having at least one reported case on day \( t \) for school year \( j \). The predictor variables \( x_{(t-k)j} \) are the mean daily absenteeism percentages with lag times ranging from 0 to \( l \). To account for seasonal variations in influenza, trigonometric functions with sine and cosine terms are included, where \( t^* \) denotes the calendar day of the year when \( x_{tj} \) is observed and \( T^* = 365.25 \) days represents the length of a year. Additionally, \(\gamma_j\) captures random effects specific to each school year \( j \).
`eval_metrics` also identifies the best model parameters (lag & threshold) for each
metric. The output is a list with three main components:
- `metrics`: An object containing:
- matrices of each metric (FAR, ADD, AATQ, FATQ, WAATQ, WFATQ) for all lag and
threshold combinations.
- best models according to each metric, including lag and threshold values.
- `plot_data`: plot object to visualize epidemic data and the best model for each metric
- `results`: provides summary statistics
In the example provided, alarms are calculated for school years 2 to 10, considering lags
up to 15 days and threshold values ranging from 0.1 to 0.6 in 0.05 increments. Year
weights are assigned proportionally to the school year number.
```{r metrics, fig.height = 4, fig.width = 7}
# Evaluate alarm metrics for epidemic detection
# lag of 15
alarms <- eval_metrics(absent_data, maxlag = 15, thres = seq(0.1,0.6,by = 0.05))
summary(alarms$results)
# Plot various alarm metrics values
plot(alarms$metrics, "FAR") # False Alert Rate
plot(alarms$metrics, "ADD") # Accumulated Days Delayed
plot(alarms$metrics, "FATQ") # First Alert Time Quality
plot(alarms$metrics, "AATQ") # Average ATQ
plot(alarms$metrics, "WFATQ") # Weighted FATQ
plot(alarms$metrics, "WAATQ") # Weighted Average ATQ
# visualization of epidemics with alarms raised.
alarm_plots <- plot(alarms$plot_data)
for(i in seq_along(alarm_plots)) {
print(alarm_plots[[i]])
}
```
## References
Vanderkruk, K.R., Deeth, L.E., Feng, Z. et al. ATQ: alert time quality, an evaluation
metric for assessing timely epidemic detection models within a school absenteeism-based
surveillance system. BMC Public Health 23, 850 (2023).