---
title: "2. Mathematical Details"
output:
  bookdown::html_document2:
    base_format: rmarkdown::html_vignette
    fig_caption: yes
    toc: true
    toc_depth: 2
    number_sections: true
pkgdown:
  as_is: true
vignette: >
  %\VignetteIndexEntry{2. Mathematical Details}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


```{r setup}
library(bage)
```

# Introduction

Specification document - a mathematical description of models used by bage.

Note: some features described here have not been implemented yet.

# Likelihood {#sec:lik}

## Input data

- outcome variable: events, numbers of people, or some sort of measure on a continuous variable such as income or expenditure
- exposure/size/weights
- disagg by one or more variables. Almost always includes age, sex/gender, and time. May include other variables eg region, ethnicity, education.
- not all combinations of variables present; may be some missing values


## Models

### Poisson {#sec:pois}

Let $y_i$ be a count of events in cell $i = 1, \cdots, n$ and let $w_i$ be the corresponding exposure measure, with the possibility that $w_i \equiv 1$. The likelihood under the Poisson model is then
\begin{align}
  y_i & \sim \text{Poisson}(\gamma_i w_i) (\#eq:lik-pois-1) \\
  \gamma_i & \sim \text{Gamma}\left(\xi^{-1}, (\mu_i \xi)^{-1}\right),  (\#eq:lik-pois-2)
\end{align}
using the shape-rates parameterisation of the Gamma distribution. Parameter $\xi$ governs dispersion, with
\begin{equation}
  \text{var}(\gamma_i \mid \mu_i, \xi) = \xi \mu_i^2
\end{equation}
and
\begin{equation}
  \text{var}(y_i \mid \mu_i, \xi, w_i) = (1 + \xi \mu_i w_i ) \times \mu_i w_i.
\end{equation}
We allow $\xi$ to equal 0, in which case the model reduces to 
\begin{equation}
  y_i \sim \text{Poisson}(\mu_i w_i).
\end{equation}

For $\xi > 0$, Equations \@ref(eq:lik-pois-1) and \@ref(eq:lik-pois-2) are equivalent to
\begin{equation}
  y_i \sim \text{NegBinom}\left(\xi^{-1}, (1 + \mu_i w_i \xi)^{-1}\right)
\end{equation}
[@norton2018sampling; @simpson2022priors].
This is the format we use internally for estimation. When values for $\gamma_i$ are needed, we generate them on the fly, using the fact that
\begin{equation}
  \gamma_i \mid y_i, w_i, \mu_i, \xi \sim \text{Gamma}\left(y_i + \xi^{-1}, w_i + (\xi \mu_i)^{-1}\right).
\end{equation}


### Binomial {#sec:binom}

The likelihood under the binomial model is
\begin{align}
  y_i  & \sim \text{Binomial}(w_i, \gamma_i) (\#eq:lik-binom-1) \\
  \gamma_i & \sim \text{Beta}\left(\xi^{-1} \mu_i, \xi^{-1}(1 - \mu_i)\right). (\#eq:lik-binom-2)
\end{align}
Parameter $\xi$ again governs dispersion, with
\begin{equation}
  \text{var}(\gamma_i \mid \mu_i, \xi) =  \frac{\xi}{1 + \xi} \times \mu_i (1 -\mu_i)
\end{equation}
and
\begin{equation}
  \text{var}(y_i \mid w_i, \mu_i, \xi) =  \frac{\xi w_i + 1}{\xi + 1} \times w_i \mu_i (1 - \mu_i).
\end{equation}

We allow $\xi$ to equal 0, in which case the model reduces to 
\begin{equation}
  y_i  \text{Binom}(w_i, \mu_i).
\end{equation}
Equations \@ref(eq:lik-binom-1) and \@ref(eq:lik-binom-2) are equivalent to
\begin{equation}
  y_i  \sim \text{BetaBinom}\left(w_i, \xi^{-1} \mu_i, \xi^{-1} (1 - \mu_i) \right),
\end{equation}
which is what we use internally. Values for $\gamma_i$ can be generated using
\begin{equation}
  \gamma_i \mid y_i, w_i, \mu_i, \xi \sim \text{Beta}\left(y_i + \xi^{-1} \mu_i, w_i - y_i + \xi^{-1}(1-\mu_i) \right).
\end{equation}


### Normal {#sec:norm}

\begin{equation}
  y_i \sim \text{N}(\mu_i, w_i^{-1}\xi^2)
\end{equation}
where the $w_i$ are weights.

Response $y_i$ is standardized to have mean 0 and standard deviation 1. We set
\begin{equation}
  y_i = \frac{y_i^{*} - \bar{y}^*}{s^*}
\end{equation}
where the $y_i^*$ are the values originally supplied by the user, and $\bar{y}^*$ and $s^*$ are the mean and standard deviation of the $y_i^*$.


# Model for means

Let $\pmb{\mu} = (\mu_1, \cdots, \mu_n)^{\top}$. Our model for $\pmb{\mu}$ is
\begin{equation}
  \pmb{\mu} = \sum_{m=0}^{M} \pmb{X}^{(m)} \pmb{\beta}^{(m)} + \pmb{Z} \pmb{\zeta} (\#eq:means)
\end{equation}
where

- $\beta^{(0)}$ is an intercept;
- $\pmb{\beta}^{(m)}$, $m=1,\cdots,M$ is a vector with $J_m$ elements describing a main effect or interaction formed from the dimensions of data $\pmb{y}$;
- $\pmb{X}^{(m)}$ is an $n \times J_m$ matrix of 1s and 0s, the $i$th row of which picks out the element of $\pmb{\beta}^{(m)}$ that is used with cell $i$;
- $\pmb{Z}$ is a $n \times P$ matrix of covariates; and
- $\pmb{\zeta}$ is a coefficient vector with $P$ elements.


# Priors for Intercept, Main Effects, and Interactions {#sec:priors}


The algorithm for assigning default priors:

- If $\pmb{\beta}^{(m)}$ has one or two elements, assign $\pmb{\beta}^{(m)}$ a fixed-normal prior (Section \@ref(sec:pr-fnorm));
- otherwise, if $\pmb{\beta}^{(m)}$ is an age or time main effect, assign $\pmb{\beta}^{(m)}$ a random walk prior (Section \@ref(sec:pr-rw));
- otherwise, assign $\pmb{\beta}^{(m)}$ a normal prior (Section \@ref(sec:pr-norm))

The intercept term $\pmb{\beta}^{(0)}$ can only be given a fixed-normal prior (Section \@ref(sec:pr-fnorm)) or a Known prior (Section \@ref(sec:pr-known)).


## Normal {#sec:pr-norm}

### Model

\begin{align}
  \beta_j^{(m)} & \sim \text{N}\left(0, \tau_m^2 \right) \\
  \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right)
\end{align}

### Contribution to posterior density

\begin{equation}
  \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \prod_{j=1}^{J_m} \text{N}(\beta_j^{(m)} \mid 0, \tau_m^2) 
\end{equation}


### Simulation

\begin{align}
  \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right) \\
  \beta_j^{(m)} & \sim \text{N}\left(0, \tau_m^2 \right)
\end{align}


### Forecasting

\begin{equation}
    \beta_{J_m+h+1}^{(m)} \sim \text{N}(0, \tau_m^2)
\end{equation}


### Code

```
N(s = 1)
```

- `s` is $A_{\tau}^{(m)}$. Defaults to 1.

## Fixed normal {#sec:pr-fnorm}

### Model

\begin{equation}
  \beta_j^{(m)} \sim \text{N}\left(0, A_{\beta}^{(m)2}\right)
\end{equation}

### Contribution to posterior density

\begin{equation}
  \prod_{j=1}^{J_m} \text{N}(\beta_j^{(m)} \mid 0, A_{\beta}^{(m)2}) 
\end{equation}


### Simulation

\begin{equation}
  \beta_j^{(m)} \sim \text{N}\left(0, A_{\beta}^{(m)2}\right)
\end{equation}


### Forecasting

\begin{equation}
    \beta_{J_m+h+1}^{(m)} \sim \text{N}(0, A_{\beta}^{(m)2})
\end{equation}


### Code

```
NFix(sd = 1)
```

- `sd` is $A_{\tau}^{(m)}$. Defaults to 1.


## First-order random walk {#sec:pr-rw}

### Model

\begin{align}
  \beta_{u1}^{(m)} & \sim \text{N}(0, 1), \quad u = 1, \cdots, U_m \\
  \beta_{uv}^{(m)} & \sim \text{N}(\beta_{u,v-1}^{(m)}, \tau_m^2), \quad u = 1, \cdots, U_m, v = 2, \cdots, V_m \\
  \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right)
\end{align}

### Contribution to posterior density

\begin{equation}
  \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \prod_{u=1}^{U_m} \text{N}(\beta_{u1}^{(m)} \mid 0, 1)  \prod_{u=1}^{U_m} \prod_{v=2}^{V_m} \text{N}\left(\beta_{uv}^{(m)} \mid \beta_{u,v-1}^{(m)}, \tau_m^2 \right) 
\end{equation}


### Simulation

\begin{align}
  \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right) \\
  \beta_{u,1}^{(m)} & \sim \text{N}(0, 1), \quad u = 1, \cdots, U_m, \\
  \beta_{u,v}^{(m)} & \sim \text{N}(\beta_{u,v-1}^{(m)}, \tau_m^2), \quad u = 1, \cdots, U_m, v = 2, \cdots, V_m
\end{align}


### Forecasting

\begin{equation}
  \beta_{u,V_m+h}^{(m)} \sim \text{N}(\beta_{u,V_m+h-1}^{(m)}, \tau_m^2), \quad u = 1, \cdots, U_m
\end{equation}


### Code

```
RW(s = 1, along = NULL)
```

- `s` is $A_{\tau}^{(m)}$. Defaults to 1.
- `along` used to identify "along" and "by" dimensions


## Second-order random walk {#sec:pr-rw2}

### Model

\begin{align}
  \beta_{u,v}^{(m)} & \sim \text{N}(0, 1), \quad u = 1,\cdots, U_m, \quad v = 1,2 \\
  \beta_{u,v}^{(m)} & \sim \text{N}(2 \beta_{u,v-1}^{(m)} - \beta_{u,v-2}^{(m)}, \tau_m^2), \quad u = 1, \cdots, U_m, \quad v = 3, \cdots, V_m \\
  \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right)
\end{align}


### Contribution to posterior density

\begin{equation}
  \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \prod_{u=1}^{U_m} \prod_{v=1}^2 \text{N}(\beta_{u,v}^{(m)} \mid 0, 1) \prod_{u=1}^{U_m}\prod_{v=3}^{V_m} \text{N}\left(\beta_{u,v}^{(m)} - 2 \beta_{u,v-1}^{(m)} + \beta_{u,v-2}^{(m)} \mid 0, \tau_m^2 \right) 
\end{equation}


### Simulation

\begin{align}
  \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right) \\
  \beta_{u,v}^{(m)} & \sim \text{N}(0, 1), \quad u = 1, \cdots, U_m, \quad v = 1,2 \\
  \beta_{u,v}^{(m)} & \sim \text{N}(2 \beta_{u,v-1}^{(m)} - \beta_{u,v-2}^{(m)}, \tau_m^2), \quad u = 1, \cdots, U_m, \quad v = 3,\cdots,V_m
\end{align}

### Forecasting

\begin{equation}
  \beta_{u,V_m+h}^{(m)} \sim \text{N}(2 \beta_{u,V_m+h-1}^{(m)} - \beta_{u,V_m+h-2}^{(m)}, \tau_m^2)
\end{equation}


### Code

```
RW2(s = 1, sd = 1, along = NULL)
```

- `s` is $A_{\tau}^{(m)}$
- `sd` is $A_{\eta}^{(m)}$
- `along` used to identify "along" and "by" dimensions


## Random walk with seasonal effect {#sec:rwseas}

### Model

\begin{align}
  \beta_{u,v}^{(m)} & = \alpha_{u,v} + \lambda_{u,s_v}, \quad u = 1, \cdots, U_m, \quad v = 1, \cdots, V_m \\
  \alpha_{u,1}^{(m)} & \sim \text{N}(0, 1), \quad u = 1, \cdots, U_m \\
  \alpha_{u,v}^{(m)} & \sim \text{N}(\alpha_{u,v-1}^{(m)}, \tau_m^2), \quad u = 1, \cdots, U_m, \quad v = 2, \cdots, V_m \\
  \lambda_{u,v}^{(m)} & \sim \text{N}(0, 1), \quad u = 1, \cdots, U_m, \quad v = 1, \cdots, S_m \\
  \lambda_{u,v}^{(m)} & \sim \text{N}(\lambda_{u,v-S_m}^{(m)}, \omega_m^2), \quad u = 1, \cdots, U_m, \quad v = S_m + 1, \cdots, V_m (\#eq:lambda-rwseas) \\ \\
  \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right) \\
  \omega_m & \sim \text{N}^+\left(0, A_{\omega}^{(m)2}\right)
\end{align}

We allow $A_{\omega}^{(m)2}$ to be set to zero, in which case \@ref(eq:lambda-rwseas) reduces to
\begin{equation}
  \lambda_{u,v}^{(m)} = \lambda_{u,v-S_m}^{(m)}, \quad u = 1, \cdots, U_m, \quad v = S_m + 1, \cdots, V_m,
\end{equation}
implying that seasonal effects are constant across years.

### Contribution to posterior density

\begin{equation}
\begin{split}
  & \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \text{N}(\omega_m \mid 0, A_{\omega}^{(m)2}) \\
  \quad \times & \prod_{u=1}^{U_m} \left( \text{N}(\alpha_{u,1}^{(m)} \mid 0, 1 ) \prod_{v=2}^{V_m} \text{N}(\alpha_{u,v}^{(m)} \mid \alpha_{u,v-1}^{(m)}, \tau_m^2 ) \prod_{v=1}^{S_m} \text{N}(\lambda_{u,v}^{(m)} \mid 0, 1) \prod_{v=S_m+1}^{V_m} \text{N}(\lambda_{u,v}^{(m)} \mid \lambda_{u,v-S_m}^{(m)}, \omega_m^2) \right)
\end{split}
\end{equation}

### Forecasting

\begin{align}
  \alpha_{J_m+h}^{(m)} & \sim \text{N}(\alpha_{J_m+h-1}^{(m)}, \tau_m^2) \\
  \lambda_{J_m+h}^{(m)} & \sim \text{N}(\lambda_{J_m+h-S_m}^{(m)}, \omega_m^2) \\
  \beta_{J_m+h}^{(m)} & = \alpha_{J_m+h}^{(m)} + \lambda_{J_m+h}^{(m)}
\end{align}

### Code

```
RWSeas(n, s = 1, s_seas = 1, along = NULL)
```

- `n` is $S_m$
- `s` is $A_{\tau}^{(m)}$
- `s_seas` is $A_{\omega}^{(m)}$
- `along` used to identify "along" and "by" dimensions


## Second-order randomw walk, with seasonal effect {#sec:rw2seas}

### Model

\begin{align}
  \beta_{u,v}^{(m)} & = \alpha_{u,v} + \lambda_{u,s_v}, \quad u = 1, \cdots, U_m, \quad v = 1, \cdots, V_m \\
  \alpha_{u,v}^{(m)} & \sim \text{N}(0, 1), \quad u = 1, \cdots, U_m, \quad \quad v = 1,2,  \\
  \alpha_{u,v}^{(m)} & \sim \text{N}(2 \alpha_{u,v-1}^{(m)} - \alpha_{u,v-2}^{(m)}, \tau_m^2), \quad u = 1, \cdots, U_m, \quad v = 3, \cdots, V_m \\
  \lambda_{u,v}^{(m)} & \sim \text{N}(0, 1), \quad u = 1, \cdots, U_m, \quad v = 1, \cdots, S_m \\
  \lambda_{u,v}^{(m)} & \sim \text{N}(\lambda_{u,v-S_m}^{(m)}, \omega_m^2), \quad u = 1, \cdots, U_m, \quad v = S_m + 1, \cdots, V_m (\#eq:lambda-rwseas) \\ \\
  \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right) \\
  \omega_m & \sim \text{N}^+\left(0, A_{\omega}^{(m)2}\right)
\end{align}

We allow $A_{\omega}^{(m)2}$ to be set to zero, in which case \@ref(eq:lambda-rwseas) reduces to
\begin{equation}
  \lambda_{u,v}^{(m)} = \lambda_{u,v-S_m}^{(m)}, \quad u = 1, \cdots, U_m, \quad v = S_m + 1, \cdots, V_m,
\end{equation}
implying that seasonal effects are constant across years.

### Contribution to posterior density

\begin{equation}
\begin{split}
  & \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \text{N}(\omega_m \mid 0, A_{\omega}^{(m)2}) \\
  \quad \times & \prod_{u=1}^{U_m} \left( \text{N}(\alpha_{u,1}^{(m)} \mid 0, 1 ) \prod_{v=2}^{V_m} \text{N}(\alpha_{u,v}^{(m)} \mid 2 \alpha_{u,v-1}^{(m)} - \alpha_{u,v-2}^{(m)}, \tau_m^2 ) \prod_{v=1}^{S_m} \text{N}(\lambda_{u,v}^{(m)} \mid 0, 1) \prod_{v=S_m+1}^{V_m} \text{N}(\lambda_{u,v}^{(m)} \mid \lambda_{u,v-S_m}^{(m)}, \omega_m^2) \right)
\end{split}
\end{equation}

### Forecasting

\begin{align}
  \alpha_{J_m+h}^{(m)} & \sim \text{N}(2 \alpha_{J_m+h-1}^{(m)} - \alpha_{J_m+h-2}^{(m)}, \tau_m^2) \\
  \lambda_{J_m+h}^{(m)} & \sim \text{N}(\lambda_{J_m+h-S_m}^{(m)}, \omega_m^2) \\
  \beta_{J_m+h}^{(m)} & = \alpha_{J_m+h}^{(m)} + \lambda_{J_m+h}^{(m)}
\end{align}

### Code

```
RW2Seas(n, s = 1, s_seas = 1, along = NULL)
```

- `n` is $S_m$
- `s` is $A_{\tau}^{(m)}$
- `s_seas` is $A_{\omega}^{(m)}$
- `along` used to identify "along" and "by" dimensions


## Autoregressive {#sec:pr-ar}

### Model

\begin{equation}
  \beta_{u,v}^{(m)} \sim \text{N}\left(\phi_1^{(m)} \beta_{u,v-1}^{(m)} + \cdots + \phi_{K_m}^{(m)} \beta_{u,v-{K_m}}^{(m)},  \omega_m^2\right), \quad u = 1, \cdots, U_m, \quad v = K_m + 1, \cdots, V_m.
\end{equation}
Internally, TMB derives values for $\beta_{u,v}^{(m)}, v = 1, \cdots, K_m$, and for $\omega_m$, that imply a stationary distribution, and that give every term $\beta_{u,v}^{(m)}$ the same marginal variance. We denote this marginal variance $\tau_m^2$, and assign it a prior
\begin{equation}
  \tau_m  \sim \text{N}^+(0, A_{\tau}^{(m)2}).
\end{equation}
Each of the individual $\phi_k^{(m)}$ is restricted to the interval $(-1, 1)$, and the $\phi_k^{(m)}$ are jointly restricted to values that yield stationary models. Let
\begin{equation}
  r^{(m)} = \sqrt{\phi_1^{(m)2} + \cdots + \phi_{K_m}^{(m)2}}.
\end{equation}
We assign $r^{(m)}$ the prior
\begin{equation}
  r^{(m)} \sim \text{Beta}(2, 2).
\end{equation}


### Contribution to posterior density

\begin{equation}
  \text{N}^+\left(\tau_m \mid 0, A_{\tau}^{(m)2} \right) \text{Beta}\left( r^{(m)} \mid 2, 2 \right) \prod_{u=1}^{U_m} p\left( \beta_{u,1}^{(m)}, \cdots, \beta_{u,V_m}^{(m)} \mid \phi_1^{(m)}, \cdots, \phi_{K_m}^{(m)}, \tau_m \right) 
\end{equation}
where $p\left( \beta_{u,1}^{(m)}, \cdots, \beta_{u,V_m}^{(m)} \mid \phi_1^{(m)}, \cdots, \phi_{K_m}^{(m)}, \tau_m \right)$ is calculated internally by TMB.

### Forecasting

\begin{equation}
  \beta_{u,V_m + h}^{(m)} \sim \text{N}\left(\phi_1^{(m)} \beta_{u,V_m + h - 1}^{(m)} + \cdots + \phi_{K_m}^{(m)} \beta_{u,V_m+h-K_m}^{(m)}, \tau_m^2\right)
\end{equation}


### Code

```
AR(n = 2, s = 1, along = NULL)
```

- `n` is $K_m$
- `s` is $A_{\tau}^{(m)}$
- `along` is used to indentify the "along" and "by" dimensions


## AR1 {#sec:pr-ar1}

Special case or AR, but with extra options for autocorrelation coefficient.

### Model

\begin{align}
  \beta_{u,1}^{(m)} & \sim \text{N}(0, \tau_m^2), \quad u = 1, \cdots, U_m \\
  \beta_{u,v}^{(m)} & \sim \text{N}(\phi_m \beta_{u,v-1}^{(m)}, (1 - \phi_m^2) \tau_m^2), \quad u = 1, \cdots, U_m, \quad v = 2, \cdots, V_m \\
  \phi_m & = a_{0,m} + (a_{1,m} - a_{0,m}) \phi_m^{\prime} \\
  \phi_m^{\prime} & \sim \text{Beta}(2, 2) \\
  \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right).
p\end{align}
This is adapted from the specification used for AR1 densities in [TMB](http://kaskr.github.io/adcomp/classdensity_1_1AR1__t.html). It implies that the marginal variance of all $\beta_{u,v}^{(m)}$ is $\tau_m^2$. We require that $-1 < a_{0m} < a_{1m} < 1$.


### Contribution to posterior density

\begin{equation}
  \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \text{Beta}( \phi_m^{\prime} \mid 2, 2) \prod_{u=1}^{U_m} \text{N}\left(\beta_{u,1}^{(m)} \mid 0, \tau_m^2 \right)  \prod_{u=1}^{U_m} \prod_{j=2}^{V_m} \text{N}\left(\beta_{u,v}^{(m)} \mid \phi_m \beta_{u,v-1}^{(m)}, (1 - \phi_m^2) \tau_m^2 \right) 
\end{equation}


### Forecasting

\begin{equation}
  \beta_{J_m + h}^{(m)} \sim \text{N}\left(\phi_m \beta_{J_m + h - 1}^{(m)}, (1 - \phi_m^2) \tau_m^2\right)
\end{equation}


### Code

```
AR1(min = 0.8, max = 0.98, s = 1, along = NULL)
```

- `min` is $a_{0m}$
- `max` is $a_{1m}$
- `s` is $A_{\tau}^{(m)}$. Defaults to 1.
- `along` is used to identify "along" and "by" dimensions

The defaults for `min` and `max` are based on the defaults for function `ets()` in R package **forecast** [@hyndman2008automatic].


## Linear {#sec:pr-lin}

### Model

\begin{align}
  \beta_{u,v}^{(m)} & \sim \text{N}(\alpha_u^{(m)} + v \eta_u^{(m)}, \tau_m^2), \quad u = 1,\cdots,U_m, \quad v = 1, \cdots, V_m  \\
  \alpha_u^{(m)} & \sim \text{N}(0, 1), \quad u = 1,\cdots,U_m  \\
  \eta_u^{(m)} & \sim \text{N}\left(0, A_{\eta}^{(m)2}\right), \quad u = 1, \cdots, U_m \\
  \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right)
\end{align}


### Contribution to posterior density

\begin{equation}
  \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \prod_{u=1}^{U_m} \text{N}(\alpha_u^{(m)} \mid 0, 1) \text{N}(\eta_u^{(m)} \mid 0, A_{\eta}^{(m)2}) \prod_{u=1}^{U_m} \prod_{v=1}^{V_m} \text{N}\left(\beta_{u,v}^{(m)} | \alpha_u^{(m)} + v \eta_u^{(m)}, \tau_m^2 \right) 
\end{equation}


### Forecasting

\begin{equation}
  \beta_{u,V_m + h}^{(m)} \sim \text{N}(\alpha_u^{(m)} + (V_m + h) \eta_u^{(m)}, \tau_m^2)
\end{equation}


### Code

```
Lin(s = 1, sd = 1, along = NULL)
```

- `s` is $A_{\tau}^{(m)}$
- `sd` is $A_{\eta}^{(m)}$
- `along` is used to indentify "along" and "by" dimensions


## Linear-AR {#sec:pr-lin-ar}

### Model

\begin{align}
  \beta_{u,v}^{(m)} & = \alpha_u^{(m)} + \eta_u^{(m)} v + \epsilon_{u,v}^{(m)}, \quad u = 1, \cdots, U_m, \quad v = 1, \cdots, V_m \\
  \alpha_u^{(m)} & \sim \text{N}\left(0, 1\right), \quad u = 1, \cdots, U_m \\
  \eta_u^{(m)} & \sim \text{N}\left(0, A_{\eta}^{(m)2}\right), \quad u = 1, \cdots, U_m \\
  \epsilon_{u,v}^{(m)} & \sim \text{N}\left(\phi_1^{(m)} \epsilon_{u,v-1}^{(m)} + \cdots + \phi_{K_m}^{(m)} \epsilon_{u,v-{K_m}}^{(m)},  \omega_m^2\right), \quad u = 1, \cdots, U_m, \quad v = K_m + 1, \cdots, V_m.
\end{align}


Internally, TMB derives values for $\epsilon_{u,v}^{(m)}, v = 1, \cdots, K_m$, and for $\omega_m$, that provide the $\epsilon_{u,v}^{(m)}$ with a stationary distribution in which each term has the same marginal variance. We denote this marginal variance $\tau_m^2$, and assign it a prior
\begin{equation}
  \tau_m  \sim \text{N}^+(0, A_{\tau}^{(m)2}).
\end{equation}
Each of the individual $\phi_k^{(m)}$ is restricted to the interval $(-1, 1)$, and the $\phi_k^{(m)}$ are jointly restricted to values that yield stationary models. Let
\begin{equation}
  r^{(m)} = \sqrt{\phi_1^{(m)2} + \cdots + \phi_{K_m}^{(m)2}}.
\end{equation}
We assign $r^{(m)}$ the prior
\begin{equation}
  r^{(m)} \sim \text{Beta}(2, 2).
\end{equation}


### Contribution to posterior density

\begin{equation}
  \text{N}^+\left(\tau_m \mid 0, A_{\tau}^{(m)2} \right) \text{Beta}\left( r_k^{(m)} \mid 2, 2 \right) \prod_{u=1}^{U_m} \text{N}(\alpha_u^{(m)} \mid 0, 1) \text{N}(\eta_u^{(m)} \mid 0, A_{\eta}^{(m)2}) p\left( \epsilon_{u,1}^{(m)}, \cdots, \epsilon_{u,V_m}^{(m)} \mid \phi_1^{(m)}, \cdots, \phi_{K_m}^{(m)}, \tau_m \right) 
\end{equation}
where $p\left( \epsilon_{u,1}^{(m)}, \cdots, \epsilon_{u,V_m}^{(m)} \mid \phi_1^{(m)}, \cdots, \phi_{K_m}^{(m)}, \tau_m \right)$ is calculated internally by TMB.

### Forecasting

\begin{align}
  \beta_{u, V_m + h}^{(m)} & = \alpha_u^{(m)} + \eta_u^{(m)} (V_m + h) + \epsilon_{u,V_m+h}^{(m)} \\
  \epsilon_{u,V_m+h}^{(m)} & \sim \text{N}\left(\phi_1^{(m)} \epsilon_{u,V_m + h - 1}^{(m)} + \cdots + \phi_{K_m}^{(m)} \epsilon_{u,V_m+h-K_m}^{(m)}, \omega_m^2\right)
\end{align}


### Code

```
Lin_AR(s = 1, sd = 1, along = NULL)
```

- `s` is $A_{\tau}^{(m)}$
- `sd` is $A_{\eta}^{(m)}$
- `along` is used to indentify "along" and "by" variables


## P-Spline {#sec:pr-spline}

### Model

\begin{equation}
  \pmb{\beta}_u^{(m)} = \pmb{B}^{(m)} \pmb{\alpha}_u^{(m)}, \quad u = 1, \cdots, U_m
\end{equation}
where $\pmb{\beta}_u^{(m)}$ is the subvector of $\pmb{\beta}^{(m)}$ composed of elements from the $u$th combination of the "by" variables, $\pmb{B}^{(m)}$ is a $V_m \times K_m$ matrix of B-splines, and $\pmb{\alpha}_u^{(m)}$ has a second-order random walk prior (Section \@ref(sec:pr-rw2)).


$\pmb{B}^{(m)} = (\pmb{b}_1^{(m)}(\pmb{v}), \cdots, \pmb{b}_{K_m}^{(m)}(\pmb{v}))$, with $\pmb{v} = (1, \cdots, V_m)^{\top}$. The B-splines are centered, so that $\pmb{1}^{\top} \pmb{b}_k^{(m)}(\pmb{v}) = 0$, $k = 1, \cdots, K_m$.


### Contribution to posterior density

\begin{equation}
  \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \prod_{u=1}^{U_m} \prod_{k=1}^2 \text{N}(\alpha_{u,k}^{(m)} \mid 0, 1) \prod_{u=1}^{U_m}\prod_{k=3}^{K_m} \text{N}\left(\alpha_{u,k}^{(m)} - 2 \alpha_{u,k-1}^{(m)} + \alpha_{u,k-2}^{(m)} \mid 0, \tau_m^2 \right) 
\end{equation}


### Forecasting

Terms with a P-Spline prior cannot be forecasted.


### Code

```
Sp(n = NULL, s = 1)
```

- `n` is $K_m$. Defaults to $\max(0.7 J_m, 4)$.
- `s` is the $A_{\tau}^{(m)}$ from the second-order random walk prior. Defaults to 1.
- `along` is used to identify "along" and "by" variables


## SVD {#sec:pr-svd}

### Model

**Age but no sex or gender**

Let $\pmb{\beta}_u$ be the age effect for the $u$th combination of the 'by' variables. With an SVD prior,
\begin{equation}
  \pmb{\beta}_u^{(m)} = \pmb{F}^{(m)} \pmb{\alpha}_u^{(m)} + \pmb{g}^{(m)}, \quad u = 1, \cdots, U_m
\end{equation}
where $\pmb{F}^{(m)}$ is a $V_m \times K_m$ matrix, and $\pmb{g}^{(m)}$ is a vector with $V_m$ elements, both derived from a singular value decomposition (SVD) of an external dataset of age-specific values for all sexes/genders combined. The construction of $\pmb{F}^{(m)}$ and $\pmb{g}^{(m)}$ is described in Appendix \@ref(app:svd). The centering and scaling used in the construction allow use of the simple prior
\begin{equation}
  \alpha_{u,k}^{(m)} \sim \text{N}(0, 1), \quad u = 1, \cdots, U_m, k = 1, \cdots, K_m.
\end{equation}

**Joint model of age and sex/gender**

In the joint model, vector $\pmb{\beta}_u$ represents the interaction between age and sex/gender for the $u$th combination of the 'by' variables. Matrix $\pmb{F}^{(m)}$ and vector $\pmb{g}^{(m)}$ are calculated from data that separate sexes/genders. The model is otherwise unchanged.


**Independent models for each sex/gender**

In the independent model, vector $\pmb{\beta}_{s,u}$ represents age effects for sex/gender $s$ and the $u$th combination of the 'by' variables, and we have
\begin{equation}
  \pmb{\beta}_{s,u}^{(m)} = \pmb{F}_s^{(m)} \pmb{\alpha}_{s,u}^{(m)} + \pmb{g}_s^{(m)}, \quad s = 1, \cdots, S; \quad u = 1, \cdots, U_m
\end{equation}
Matrix $\pmb{F}_s^{(m)}$ and vector $\pmb{g}_s^{(m)}$ are calculated from data that separate sexes/genders. The prior is
\begin{equation}
  \alpha_{s,u,k}^{(m)} \sim \text{N}(0, 1), \quad s = 1, \cdots, S; \quad u = 1, \cdots, U_m; \quad k = 1, \cdots, K_m.
\end{equation}


### Contribution to posterior density

\begin{equation}
  \prod_{u=1}^{U_m}\prod_{k=1}^{K_m} \text{N}\left(\alpha_{uk}^{(m)} \mid 0, 1 \right) 
\end{equation}
for the age-only and joint models, and
\begin{equation}
  \prod_{s=1}^S \prod_{u=1}^{U_m}\prod_{k=1}^{K_m} \text{N}\left(\alpha_{s,u,k}^{(m)} \mid 0, 1 \right) 
\end{equation}
for the independent model

### Forecasting

Terms with an SVD prior cannot be forecasted.


### Code

```
SVD(ssvd, n_comp = NULL, indep = TRUE)
```
where
- `ssvd` is an object containing $\pmb{F}$ and $\pmb{g}$
- `n_comp` is the number of components to be used (which defaults to `ceiling(n/2)`, where `n` is the number of components in `ssvd`
- `indep` determines whether and independent or joint model will be used if the term being modelled contains a sex or gender variable.


## SVD_RW {#sec:pr-svd-rw}

### Model

The `SVD_RW()` prior is identical to the `SVD()` prior except that the coefficients evolve over time, following independent random walks. For instance, in the combined-sex/gender and joint models,

\begin{align}
  \pmb{\beta}_{u,t}^{(m)} & = \pmb{F}^{(m)} \pmb{\alpha}_{u,t}^{(m)} + \pmb{g}^{(m)}, \quad u = 1, \cdots, U_m; \quad t = 1, \cdots, T \\
  \alpha_{u,k,1}^{(m)} & \sim \text{N}(0, 1), \quad u = 1, \cdots, U_m, k = 1, \cdots, K_m \\
  \alpha_{u,k,t}^{(m)} & \sim \text{N}(\alpha_{u,k,t-1}^{(m)}, \tau_m^2), \quad u = 1, \cdots, U_m, k = 1, \cdots, K_m; t = 2, \cdots, T \\
  \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right)
\end{align}

### Contribution to posterior density

In the combined-sex/gender and joint models,

\begin{equation}
  \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \prod_{u=1}^{U_m} \prod_{k=1}^{K_m} \text{N}(\alpha_{u,k,1}^{(m)} \mid 0, 1)  \prod_{t=2}^{T} \text{N}\left(\alpha_{u,k,t}^{(m)} \mid \alpha_{u,k,t-1}^{(m)}, \tau_m^2 \right),
\end{equation}

and in the independent model,

\begin{equation}
  \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \prod_{u=1}^{U_m} \prod_{s=1}^{S} \prod_{k=1}^{K_m} \text{N}(\alpha_{u,s,k,1}^{(m)} \mid 0, 1)  \prod_{t=2}^{T} \text{N}\left(\alpha_{u,s,k,t}^{(m)} \mid \alpha_{u,s,k,t-1}^{(m)}, \tau_m^2 \right) 
\end{equation}


### Simulation

TODO - write

### Forecasting

\begin{align}
  \alpha_{u,k,T+h}^{(m)} & \sim \text{N}(\alpha_{u,k,T+h-1}^{(m)}, \tau_m^2), \quad u = 1, \cdots, U_m; \quad k = 1, \cdots, K_m \\
  \pmb{\beta}_{u,T+h}^{(m)} & = \pmb{F}^{(m)} \pmb{\alpha}_{u,T+h}^{(m)} + \pmb{g}^{(m)}, \quad u = 1, \cdots, U_m 
\end{align}


### Code

```
SVD_RW(ssvd, n_comp = NULL, s = 1, indep = TRUE)
```
where
- `ssvd` is an object containing $\pmb{F}$ and $\pmb{g}$
- `n_comp` is $K_m$ (which defaults to `ceiling(n/2)`, where `n` is the number of components in `ssvd`
- `s` is $A_{\tau}^{(m)}$
- `indep` determines whether and independent or joint model will be used if the term being modelled contains a sex or gender variable.


## SVD_RW2 {#sec:pr-svd-rw2}

Same structure as `SVD_RW()`. TODO - write details.


## SVD_AR {#sec:pr-svd-ar}

Same structure as `SVD_RW()`. TODO - write details.

## SVD_AR1 {#sec:pr-svd-ar1}

Same structure as `SVD_RW()`. TODO - write details.


## Known {#sec:pr-known}

### Model

Elements of $\pmb{\beta}^{(m)}$ are treated as known with certainty.

### Contribution to posterior density

Known priors make no contribution to the posterior density.

### Forecasting

Main effects with a known prior cannot be forecasted.


### Code

```
Known(values)
```

- `values` is a vector containing the $\beta_j^{(m)}$.


# Covariates {#sec:pr-cov}

## Model

The columns of matrix $\pmb{Z}$ are assumed to be standardised to have mean 0 and standard deviation 1.  $\pmb{Z}$ does not contain a column for an intercept.

We implement two priors for coefficient vector $\pmb{\zeta}$. The first prior is designed for the case where $P$, the number of colums of $\pmb{Z}$, is small, and most $\zeta_p$ are likely to distinguishable from zero. The second prior is designed for the case where $P$ is large, and only a few $\zeta_p$ are likely to be distinguishable from zero.


### Standard prior

\begin{align}
  \zeta_p \mid \varphi & \sim \text{N}(0, \varphi^2) \\
  \varphi & \sim \text{N}^+(0, 1)
\end{align}


### Shrinkage prior

Regularized horseshoe prior [@piironen2017hyperprior]

\begin{align}
  \zeta_p \mid \vartheta_p, \varphi & \sim \text{N}(0, \vartheta_p^2 \varphi^2) \\
  \vartheta_p & \sim \text{Cauchy}^+(0, 1) \\
  \varphi & \sim \text{Cauchy}^+(0, A_{\varphi}^2) \\
  A_{\varphi} & = \frac{p_0}{p_0 + P} \frac{\hat{\sigma}}{\sqrt{n}}
\end{align}
where $p_0$ is an initial guess at the number of $\zeta_p$ that are non-zero, and $\hat{\sigma}$ is obtained as follows:

- Poisson. Using maximum likelihood, fit the GLM
\begin{align}
  y_i & \sim \text{Poisson}(w_i \gamma_i) \\
  \log \gamma_i & = \sum_{m=0}^{M} \pmb{X}^{(m)} \pmb{\beta}^{(m)},
\end{align}
and set
\begin{equation}
  \hat{\sigma} = \frac{1}{n}\sum_{i=1}^n \frac{1}{w_i \hat{\gamma}_i}.
\end{equation}
- Binomial. Using maximum likelihood, fit the GLM
\begin{align}
  y_i & \sim \text{binomial}(w_i, \gamma_i) \\
  \text{logit} \gamma_i & = \sum_{m=0}^{M} \pmb{X}^{(m)} \pmb{\beta}^{(m)},
\end{align}
and set
\begin{equation}
  \hat{\sigma} = \frac{1}{n}\sum_{i=1}^n \frac{1}{w_i \hat{\gamma}_i (1 - \hat{\gamma}_i)}.
\end{equation}
- Normal. Using maximum likelihood, fit the linear model
\begin{equation}
  y_i \sim \text{N}\left(\sum_{m=0}^{M} \pmb{X}^{(m)} \pmb{\beta}^{(m)}, w_i^{-1}\xi^2 \right)
\end{equation}
and set $\hat{\sigma} = \hat{\xi}$.


The quantities used for Poisson and binomial likelihoods are derived from normal approximations to GLMs [@piironen2017hyperprior; @gelman2014bayesian, section 16.2].

## Contribution to posterior density

TODO - write

## Forecasting

TODO - write

## Code

```
set_covariates(formula, data = NULL, n_coef = NULL)
```

- `formula` is a one-sided R formula describing the covariates to be used
- `data` A data frame. If a value for `data` is supplied, then `formula` is interpreted in the context of this data frame. If a value for `data` is not supplied, then `formula` is interpreted in the context of the data frame used for the original call to `mod_pois()`, `mod_binom()`, or `mod_norm()`.
- `n_coef` is the effective number of non-zero coefficients. If a value is supplied, the shrinkage prior is used; otherwise the standard prior is used.


Examples:
```
set_covariates(~ mean_income + distance * employment)
set_covariates(~ ., data = cov_data, n_coef = 5)
```


# Prior for dispersion terms

## Model

Use exponential distribution, parameterised using mean,
\begin{equation}
  \xi \sim \text{Exp}(\mu_{\xi})
\end{equation}


## Contribution to prior density

\begin{equation}
  p(\xi) = \frac{1}{\mu_{\xi}} \exp\left(\frac{-\xi}{\mu_{\xi}}\right)
\end{equation}

## Code

```
set_disp(mean = 1)
```

- `mean` is $\mu_{\xi}$


# Data models

## Data models for outcome

### Random Rounding to Base 3

Random rounding to base 3 (RR3) is a confidentialization method used by some statistical agencies. It is applied to counts data. Each count $x$ is rounded randomly as follows:

- If $x \mod 3 = 0$, then $x$ is left unchanged;
- if $x \mod 3 = 1$ then $x$ is changed to $x-1$ with probability 2/3, and is changed to $x + 2$ with probability 1/3; and
- if $x \mod 3 = 2$ then $x$ is changed to $x-2$ with probability 1/3, and is changed to $x + 1$ with probability 2/3.

RR3 data models can be used with Poisson or binomial likelihoods. Let $y_i$ denote the observed value for the outcome, and $y_i^*$  the true value. The likelihood with a RR3 data model is then

\begin{align}
  p(y_i | \gamma_i, w_i) & = \sum_{y_i^*} p(y_i | y_i^*) p(y_i^* | \gamma_i, w_i) \\
   & =  \sum_{k = -2, -1, 0, 1, 2} p(y_i | y_i + k) p(y_i + k | \gamma_i, w_i) \\
   & = \tfrac{1}{3} p(y_i - 2 | \gamma_i, w_i) + \tfrac{2}{3} p(y_i - 1 | \gamma_i, w_i) + p(y_i | \gamma_i, w_i) + \tfrac{2}{3} p(y_i + 1 | \gamma_i, w_i) + \tfrac{1}{3} p(y_i + 2 | \gamma_i, w_i) 
\end{align}

## Data models for exposure or size


# Deriving outputs

Running TMB yields a set of means $\pmb{m}$, and a precision matrix $\pmb{Q}^{-1}$, which together define the approximate joint posterior distribution of

- $\pmb{\beta}^{(m)}$ for terms with independent normal, fixed normal, multivariate normal, random walk, second-order random walk, AR1, Linear, and Linear-AR1 priors,
- $\pmb{\alpha}$ for terms with Spline and SVD priors,
- hyper-parameters for $\pmb{\beta}^{(m)}$ and $\pmb{\alpha}^{(m)}$ typically transformed to another scale, such as a log scale,
- dispersion term $\xi$, and
- seasonal effects $\pmb{\lambda}$, together with associated hyper-parameters $\tau_{\lambda}$ (on a log scale).

We use $\tilde{\pmb{\theta}}$ to denote a vector containing all these quantities.

We perform a Cholesky decomposition of $\pmb{Q}^{-1}$, to obtain $\pmb{R}$ such that
\begin{equation}
  \pmb{R}^{\top} \pmb{R} = \pmb{Q}^{-1}
\end{equation}
We store $\pmb{R}$ as part of the model object.

We draw generate values for $\tilde{\pmb{\theta}}$ by generating a vector of standard normal variates $\pmb{z}$, back-solving the equation
\begin{equation}
  \pmb{R} \pmb{v} = \pmb{z}
\end{equation}
and setting
\begin{equation}
  \tilde{\pmb{\theta}} = \pmb{v} + \pmb{m}.
\end{equation}

Next we convert any transformed hyper-parameters back to the original units, and insert values for $\pmb{\beta}^{(m)}$ for terms that have Known priors. We denote the resulting vector $\pmb{\theta}$.

Finally we draw from the distribution of $\pmb{\gamma} \mid \pmb{y}, \pmb{\theta}$ using the methods described in Section \@ref(sec:lik).


## Standardization {#sec:standard}

### The need for standardization

NOTE - We are ignoring covariates for the moment. We can probably just subtract the covariates term from $\pmb{\mu}$ and then proceed as before.

Although the sum $\pmb{\mu} = \sum_{m=0}^{M} \pmb{X}^{(m)} \pmb{\beta}^{(m)}$ is well identified by the data, the individual $\pmb{\beta}^{(m)}$ are not. For instance, adding $\Delta$ to $\pmb{\beta}^{(0)}$ and subtracting it from every term in $\pmb{\beta}^{(m)}$ for some $m > 0$ leaves the likelihood unchanged. The use of proper priors for the $\pmb{\beta}^{(m)}$ mean that posterior distributions for the $\pmb{\beta}^{(m)}$, and for related terms such as trend, cyclical, and seasonal effects, are proper. However, they can be diffuse and hard to interpret.

To help with interpretation of the $\pmb{\beta}^{(m)}$ and related terms, we standardize. We in fact appy two forms of standardization, each designed for a different purpose.

### Standardization of estimates

#### ANOVA-style standardization

The first type of standardization is applied only to the $\pmb{\beta}^{(m)}$. The aim is to partition the total variation in $\mu_i$ into variation associated with each main effect or interaction, in the style, for instance, of Section 15.6 of @gelman2014bayesian. Let $\tilde{\pmb{\beta}}^{(m)}$ be the standardized version of $\pmb{\beta}^{(m)}$. We obtain the $\tilde{\pmb{\beta}}^{(m)}$ through the following algorithm:


Set
\begin{equation}
  \pmb{r}^{(0)} = \pmb{\mu}
\end{equation}

For $m = 0, \cdots, M$:

- Set $\tilde{\pmb{\beta}}^{(m)} = (\pmb{X}^{(m)\top} \pmb{X}^{(m)})^{-1} \pmb{X}^{(m)\top} \pmb{r}^{(m)}$
- Set $\pmb{r}^{(m+1)} = \pmb{r}^{(m)} - \pmb{X}^{(m)} \tilde{\pmb{\beta}}^{(m)}$


#### Term-level standardization

Term-level standardization aims to clarify the behaviour of the terms making up the model, including trend, cyclical, and seasonal effects. The intercept term is left untouched. All other $\pmb{\beta}^{(m)}$ are centered across the "along" dimension, and each of the "by" dimensions. 


### Standardization of forecasts

#### ANOVA-style standardization

The forecasted values for the time-varying $\pmb{\beta}^{(m)}$ are shifted up or down so that they line up with the estimated values. The standardization algorithm is then applied to these shifted values.

#### Term-level standardization

The forecasted values for the time-varying $\pmb{\beta}^{(m)}$, and for SVD coefficients, and trend, cyclical, and seasonal components, are shifted up or down so that they line up with the estimated values. All terms are then centered along all dimensions other than time.

# Simulation

To generate one set of simulated values, we start with values for exposure, trials, or weights,  $\pmb{w}$, and possibly covariates $\pmb{Z}, then go through the following steps:

1. Draw values for any parameters in the priors for the $\pmb{\beta}^{(m)}$, $m = 1, \cdots, M$.
1. Conditional on the values drawn in Step 1, draw values the $\pmb{\beta}^{(m)}$, $m = 0, \cdots, M$.
1. If the model contains seasonal effects, draw the standard deviation $\kappa_m$, and then the effects $\pmb{\lambda}^{(m)}$.
1. If the model contains covariates, draw $\varphi$ and $\vartheta_p$ where necessary, draw coefficient vector $\pmb{\zeta}$.
1. Use values from steps 2--4 to form the linear predictor $\sum_{m=0}^{M} \pmb{X}^{(m)} (\pmb{\beta}^{(m)} + \pmb{\lambda}^{(m)}) + \pmb{Z} \pmb{\zeta}$.
1. Back-transform the linear predictor, to obtain vector of cell-specific parameters $\pmb{\mu}$.
1. If the model contains a dispersion parameter $\xi$, draw values from the prior for $\xi$.
1. In Poisson and binomial models, use $\pmb{\mu}$ and, if present, $\xi$ to draw $\pmb{\gamma}$.
1. In Poisson and binomial models, use $\pmb{\gamma}$ and $\pmb{w}$ to draw $\pmb{y}$; in normal models, use $\pmb{\mu}$, $\xi$, and $\pmb{w}$ to draw $\pmb{y}$.


# Replicate data

## Model

### Poisson likelihood

#### Condition on $\pmb{\gamma}$

\begin{equation}
  y_i^{\text{rep}} \sim \text{Poisson}(\gamma_i w_i)
\end{equation}

#### Condition on $(\pmb{\mu}, \xi)$

\begin{align}
  y_i^{\text{rep}} & \sim \text{Poisson}(\gamma_i^{\text{rep}} w_i) \\
  \gamma_i^{\text{rep}} & \sim \text{Gamma}(\xi^{-1}, (\xi \mu_i)^{-1})
\end{align}
which is equivalent to
\begin{equation}
  y_i^{\text{rep}} \sim \text{NegBinom}\left(\xi^{-1}, (1 + \mu_i w_i \xi)^{-1}\right)
\end{equation}

### Binomial likelihood

#### Condition on $\pmb{\gamma}$

\begin{equation}
  y_i^{\text{rep}} \sim \text{Binomial}(w_i, \gamma_i)
\end{equation}

#### Condition on $(\pmb{\mu}, \xi)$

\begin{align}
  y_i^{\text{rep}} & \sim \text{Binomial}(w_im \gamma_i^{\text{rep}}) \\
  \gamma_i^{\text{rep}} & \sim \text{Beta}\left(\xi^{-1} \mu_i, \xi^{-1}(1 - \mu_i)\right)
\end{align}

### Normal likelihood

\begin{equation}
  y_i^{\text{rep}} \sim \text{N}(\gamma_i, \xi^2 / w_i)
\end{equation}

### Data models for outcomes

If the overall model includes a data model for the outcome, then a further set of draws is made, deriving values for the observed outcomes, given values for the true outcomes.

## Code

```
replicate_data(x, condition_on = c("fitted", "expected"), n = 20)
```

# Appendices

## Definitions {#app:defn}


| Quantity | Definition                                          |
|:---------|:----------------------------------------------------|
| $i$      | Index for cell, $i = 1, \cdots, n$.
| $y_i$    | Value for outcome variable.              |
| $w_i$    | Exposure, number of trials, or weight. |
| $\gamma_i$ | Super-population rate, probability, or mean. |
| $\mu_i$ | Cell-specific mean. |
| $\xi$   | Dispersion parameter. |
| $g()$ | Log, logit, or identity function. |
| $m$ | Index for intercept, main effect, or interaction. $m = 0, \cdots, M$. |
| $j$ | Index for element of a main effect or interaction. |
| $u$ | Index for combination of 'by' variables for an interaction. $u = 1, \cdots U_m$. $U_m V_m = J_m$ |
| $v$ | Index for the 'along' dimension of an interaction. $v = 1, \cdots V_m$. $U_m V_m = J_m$ |
| $\beta^{(0)}$ | Intercept. |
| $\pmb{\beta}^{(m)}$ | Main effect or interaction. $m = 1, \cdots, M$.  |
| $\beta_j^{(m)}$ | $j$th element of $\pmb{\beta}^{(m)}$. $j = 1, \cdots, J_m$. |
| $\pmb{X}^{(m)}$ | Matrix mapping $\pmb{\beta}^{(m)}$ to $\pmb{y}$. |
| $\pmb{Z}$ | Matrix of covariates. |
| $\pmb{\zeta}$ | Parameter vector for covariates $\pmb{Z}^{(m)}$. |
| $A_0$ | Scale parameter in prior for intercept $\beta^{(0)}$. |
| $\tau_m$ | Standard deviation parameter for main effect or interaction. |
| $A_{\tau}^{(m)}$ | Scale parameter in prior for $\tau_m$. |
| $\pmb{\alpha}^{(m)}$ | Parameter vector for P-spline and SVD priors. |
| $\alpha_k^{(m)}$ | $k$th element of $\pmb{\alpha}^{(m)}$. $k = 1, \cdots, K_m$. |
| $\pmb{V}^{(m)}$ | Covariance matrix for multivariate normal prior. |
| $h_j^{(m)}$ | Linear covariate |
| $\eta^{(m)}$ | Parameter specific to main effect or interaction $\pmb{\beta}^{(m)}$. |
| $\eta_u^{(m)}$ | Parameter specific to $u$th combination of 'by' variables in interaction $\pmb{\beta}^{(m)}$. |
| $A_{\eta}^{(m)}$ | Standard deviation in normal prior for $\eta_m$. |
| $\omega_m$ | Standard deviation of parameter $\eta_c$ in multivariate priors. |
| $\phi_m$ | Correlation coefficient in AR1 densities. |
| $a_{0m}$, $a_{1m}$ | Minimum and maximum values for $\phi_m$. |
| $\pmb{B}^{(m)}$ | B-spline matrix in P-spline prior. |
| $\pmb{b}_k^{(m)}$ | B-spline. $k = 1, \cdots, K_m$. |
| $\pmb{F}^{(m)}$ | Matrix in SVD prior. |
| $\pmb{g}^{(m)}$ | Offset in SVD prior. |
| $\pmb{\beta}_{\text{trend}}$ | Trend effect. |
| $\pmb{\beta}_{\text{cyc}}$ | Cyclical effect. |
| $\pmb{\beta}_{\text{seas}}$ | Seasonal effect. |
| $\varphi$ | Global shrinkage parameter in shrinkage prior. |
| $A_{\varphi}$ | Scale term in prior for $\varphi$. |
| $\vartheta_p$ | Local shrinkage parameter in shrinkage prior. |
| $p_0$ | Expected number of non-zero coefficients in $\pmb{\zeta}$. |
| $\hat{\sigma}$ | Empirical scale estimate in prior for $\varphi$. |
| $\pi$ | Vector of hyper-parameters |

 
## SVD prior for age  {#app:svd}

Let $\pmb{A}$ be a matrix of age-specific estimates from an international database, transformed to take values in the range $(-\infty, \infty)$. Each column of $\pmb{A}$ represents one set of age-specific estimates, such as log mortality rates in Japan in 2010, or logit labour participation rates in Germany in 1980.

Let $\pmb{U}$, $\pmb{D}$, $\pmb{V}$ be the matrices from a singular value decomposition of $\pmb{A}$, where we have retained the first $K$ components. Then
\begin{equation}
  \pmb{A} \approx \pmb{U} \pmb{D} \pmb{V}. (\#eq:svd1)
\end{equation}

Let $m_k$ and $s_k$ be the mean and sample standard deviation of the elements of the $k$th row of $\pmb{V}$, with $\pmb{m} = (m_1, \cdots, m_k)^{\top}$ and $\pmb{s} = (s_1, \cdots, s_k)^{\top}$. Then
\begin{equation}
  \tilde{\pmb{V}} = (\text{diag}(\pmb{s}))^{-1} (\pmb{V} - \pmb{m} \pmb{1}^{\top})
\end{equation}
is a standardized version of $\pmb{V}$.

We can rewrite \@ref(eq:svd1) as
\begin{align}
  \pmb{A} & \approx \pmb{U} \pmb{D} (\text{diag}(\pmb{s}) \tilde{\pmb{V}} + \pmb{m} \pmb{1}^{\top}) \\
  & = \pmb{F} \tilde{\pmb{V}} + \pmb{g} \pmb{1}^{\top}, (\#eq:svd2)
\end{align}
where $\pmb{F} = \pmb{U} \pmb{D} \text{diag}(\pmb{s})$ and $\pmb{g} = \pmb{U} \pmb{D} \pmb{m}$.

Let $\tilde{\pmb{v}}_l$ be a randomly-selected column from $\tilde{\pmb{V}}$. From the construction of $\tilde{\pmb{V}}$, and the orthogonality of the rows of $\pmb{V}$, we have $\text{E}[\tilde{\pmb{v}}_l] = \pmb{0}$ and $\text{var}[\tilde{\pmb{v}}_l] = \pmb{I}$. This implies that if $\pmb{z}$ is a vector of standard normal variables, then
\begin{equation}
  \pmb{F} \pmb{z} + \pmb{g}
\end{equation}
should look approximately like a randomly-selected column from the original data matrix $\pmb{A}$.


# References