---
title: "Power-scaling sensitivity analysis"
vignette: >
  %\VignetteIndexEntry{Power-scaling sensitivity analysis}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

```{r}
#| include: false
ggplot2::theme_set(bayesplot::theme_default(base_family = "sans"))
```

# Introduction

priorsense is an R package for prior (and likelihood) diagnostics in
Bayesian models. It currently implements power-scaling sensitivity
analysis and may be extended with other diagnostics in the future.

# Power-scaling sensitivity analysis

Power-scaling sensitivity analysis assesses how small changes to the
prior or likelihood affect the posterior. This is done by
power-scaling the prior or likelihood, that is, raising it to some
power $\alpha > 0$:

- For prior power-scaling: $p(\theta \mid y) \propto p(\theta)^\alpha
  p(y \mid \theta)$
- For likelihood power-scaling: $p(\theta \mid y) \propto
  p(\theta) p(y \mid \theta)^\alpha$

In priorsense, this is done in a computationally efficient manner
using Pareto-smoothed importance sampling (and optionally importance
weighted moment matching) to estimate properties of these perturbed
posteriors. Sensitivity can then be quantified by considering how much
the perturbed posteriors differ from the base posterior.
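
To make this concrete, the following is a minimal sketch of the
importance-sampling identity the package builds on: draws from the
base posterior are reweighted by $p(\theta)^{\alpha - 1}$ to target
the prior power-scaled posterior. Here `lprior` stands in for the
joint log prior evaluated at each draw, and the Pareto-smoothing step
is omitted.

```{r}
alpha <- 0.8
lprior <- rnorm(4000) # placeholder: joint log prior at each posterior draw
log_w <- (alpha - 1) * lprior # log importance weights for prior scaling
w <- exp(log_w - max(log_w)) # subtract the max for numerical stability
w <- w / sum(w) # normalised weights for estimating the perturbed posterior
```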

## Example power-scaling sensitivity analysis
```{r}
#| warning: false
library(priorsense)
library(rstan)
```

Consider the following model (available via
`example_powerscale_model("univariate_normal")`):

$$y \sim \text{normal}(\mu, \sigma)$$
$$\mu \sim \text{normal}(0, 1)$$
$$\sigma \sim \text{normal}^+(0, 2.5)$$

We have 100 data points for $y$. We first fit the model using Stan;
note the `log_lik` and `lprior` variables in the generated quantities
block, which priorsense uses to compute the power-scaling importance
weights:

```stan
data {
  int<lower=1> N;
  array[N] real y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // priors
  target += normal_lpdf(mu | 0, 1);
  target += normal_lpdf(sigma | 0, 2.5);
  // likelihood
  target += normal_lpdf(y | mu, sigma);
}
generated quantities {
  vector[N] log_lik;
  real lprior;
  // log likelihood
  for (n in 1:N) log_lik[n] = normal_lpdf(y[n] | mu, sigma);
  // joint log prior
  lprior = normal_lpdf(mu | 0, 1) + normal_lpdf(sigma | 0, 2.5);
}
```

```{r}
#| warning: false
#| eval: false
#| message: false
normal_model <- example_powerscale_model("univariate_normal")

fit <- stan(
  model_code = normal_model$model_code,
  data = normal_model$data,
  refresh = FALSE,
  seed = 123
)

```

```{r}
#| echo: false
#| warning: false
#| message: false
normal_model <- example_powerscale_model("univariate_normal")
fit <- normal_model$draws

```

Next, we check how sensitive the posterior is to power-scaling of the
prior and the likelihood. The sensitivity values shown below indicate
how much the posterior changes with respect to power-scaling; larger
values indicate greater sensitivity. By default, these values are
derived from the gradient of the cumulative Jensen-Shannon distance
between the base posterior and the power-scaled posteriors, and
values above 0.05 (the default threshold) are flagged as indicating
sensitivity.

```{r}
#| message: false
#| warning: false
powerscale_sensitivity(fit, variable = c("mu", "sigma"))
```

Here, the pattern of sensitivity, with the posterior of $\mu$
sensitive to power-scaling both the prior and the likelihood,
indicates prior-data conflict for $\mu$.
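
A single perturbed posterior can also be extracted directly with
`powerscale()`; the sketch below weakens the prior by power-scaling
it with $\alpha = 0.5$ (the `component` and `alpha` argument names
are assumed from the current priorsense API):

```{r}
#| eval: false
# draws from the posterior with the prior raised to the power 0.5
ps_prior <- powerscale(fit, component = "prior", alpha = 0.5)
```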

We follow up with visualisation. When given a model fit, the plotting
functions internally estimate the posterior for a range of
power-scaling amounts.
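
If several plots will be made, this sequence of power-scaled
posteriors can also be precomputed once with `powerscale_sequence()`
and passed to the plotting functions in place of the fit (a sketch,
assuming the default range of $\alpha$ values):

```{r}
#| eval: false
# precompute power-scaled posteriors over a range of alpha values
pss <- powerscale_sequence(fit)
```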

There are three plots currently available:

- Kernel density estimates:
```{r}
#| message: false
#| warning: false
#| fig-width: 6
#| fig-height: 4
powerscale_plot_dens(fit, variable = "mu", facet_rows = "variable")
```

- Empirical cumulative distribution functions:
```{r}
#| message: false
#| warning: false
#| fig-width: 6
#| fig-height: 4
powerscale_plot_ecdf(fit, variable = "mu", facet_rows = "variable")
```

- Posterior quantities (such as the mean and standard deviation) as functions of $\alpha$:
```{r}
#| message: false
#| warning: false
#| fig-width: 12
#| fig-height: 4
powerscale_plot_quantities(fit, variable = "mu")
```

As can be seen in the plots, power-scaling the prior and
power-scaling the likelihood shift the posterior in opposite
directions. This is further evidence of prior-data conflict.

Indeed, if we inspect the raw data, we see that the prior on $\mu$,
$\text{normal}(0, 1)$, does not match the mean of the data well,
whereas the prior on $\sigma$, $\text{normal}^+(0, 2.5)$, is
reasonable:

```{r}
mean(normal_model$data$y)
sd(normal_model$data$y)
```
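
As a rough check (treating the sample mean as fixed), we can ask how
far this mean lies in the tails of the $\text{normal}(0, 1)$ prior on
$\mu$; a small tail probability is consistent with the prior-data
conflict diagnosed above:

```{r}
# two-sided tail probability of the sample mean under the normal(0, 1) prior
2 * pnorm(-abs(mean(normal_model$data$y)))
```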