---
title: "Empirical Application in Gunsilius (2023)"
author: "David Van Dijcke"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Empirical Application in Gunsilius (2023)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: ../inst/REFERENCES.bib
editor_options:
  markdown:
    wrap: 72
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    comment = "#",
    echo=FALSE,
  error = FALSE,
  tidy = FALSE,
  cache = FALSE,
  collapse = TRUE,
  eval=FALSE, # dont rerun vignette when building package
  out.width = '100%',
  dpi = 144
)
```

```{r, include=FALSE}

library(pracma)
library(parallel)
library(stats)
library(maps)
library(data.table)
library(CVXR)
library(DiSCos)
library(ggplot2)
```

# Introduction

This vignette demonstrates how to use the DiSCo package by way of the
empirical application in @gunsilius2023distributional, which is based on
@dube2019minimum. We illustrate the use of the two main functions: 1)
`DiSCo`, which estimates the raw distributional counterfactuals,
computes confidence intervals, and optionally performs a permutation
test; and 2) `DiSCoTEA`, the "Treatment Effect Aggregator", which takes
in the distributional counterfactuals and computes aggregate treatment
effects using a user-specified aggregation statistic.

# Distributional Synthetic Controls

We briefly review the main idea behind Distributional Synthetic
Controls. Denote $Y_{jt,N}$ the outcome of group $j$ in time period $t$
in the absence of an intervention. Also denote $Y_{jt,I}$ the outcome in
the presence of an intervention at time $t > T_0$. Denote the quantile
function, $$
F^{-1}(q):=\inf _{y \in \mathbb{R}}\{F(y) \geq q\}, \quad q \in(0,1),
$$ where $F(y)$ is the corresponding cumulative distribution function.
One unit $j=1$ has received treatment, while the other units
$j=2,\ldots,J$ have not. Then the goal is to estimate the counterfactual
quantile function $F_{Y_{1 t}, N}^{-1}$ of the treated unit had it not
received treatment by an optimally weighted average of the control
units' quantile functions, $$
F_{Y_{1 t}, N}^{-1}(q)=\sum_{j=2}^{J+1} \lambda_j^* F_{Y_{j t}}^{-1}(q) \quad \text { for all } q \in(0,1)
$$\
In practice, we do this by solving the following optimization problem:
$$
\vec{\lambda}_t^*=\underset{\vec{\lambda} \in \Delta^J}{\operatorname{argmin}} \int_0^1\left|\sum_{j=2}^{J+1} \lambda_j F_{Y_{j t}}^{-1}(q)-F_{Y_{1 t}}^{-1}(q)\right|^2 d q
$$which gives an optimal weight for each unit-period combination. This
problem can be solved by a simple weighted regression, which is
implemented in the `DiSCo_weights_reg` function. To obtain the overall
optimal weights $\vec{\lambda}^*$, we take the average of the optimal
weights across all periods.

# Basic Usage

To get coding, we load the data from @dube2019minimum, which is
available in the package.

```{r}
data("dube")
head(dube)
```

To learn more about the data, just type `?dube` in the console. We have
already renamed the outcome, id, and time variables to `y_col`,
`id_col`, and `time_col`, respectively, which is required before passing
the dataframe to the DiSCo command. We also need to set the two
following parameters:

```{r}
id_col.target <- 2
t0 <- 2003
```

which indicate the id of the treated unit and the time period of the
intervention, respectively. We can now run the DiSCo command:

```{r, fig.width=8,fig.height=5}
df <- copy(dube)
disco <- DiSCo(df, id_col.target, t0, G = 1000, num.cores = 1, permutation = TRUE, CI = TRUE, boots = 1000, graph = TRUE, simplex=TRUE, seed=1, q_max=0.9)
```

where we have chosen a grid (`G`) of 1000 quantiles and opted for
parallel computation with 5 cores to speed up the permutation test and
confidence intervals (`CI` is set to `TRUE`), which we calculate using
1000 resamples (`boots`). We also set the `seed` explicitly in the
function call, which ensures reproducibility across the parallel cores.
We followed the "classical" approach of @abadie2010synthetic by
restricting the weights to be between 0 and 1 (`simplex=TRUE`). Finally,
we restricted the data to the 0-90th quantile to account for the fact
that the CPS data used in @dube2019minimum is imputed for the top income
quantiles (more detail in the section below).

The returned `disco` object contains a host of information produced by
the command. Typing `?DiSCoT` in the console pulls up the help files
which lay out the precise structure of the returned object. For now, we
will just have a look at the top 10 estimated weights,

```{r}
# retrieve the weights
weights <- disco$weights

# retrieve the control unit IDs 
controls <- disco$control_ids

# store in a dataframe
weights_df <- data.frame(weights = weights, fips = controls)

# merge with state fips codes (built into the maps package)
state.fips <- as.data.table(maps::state.fips)
state.fips <- state.fips[!duplicated(state.fips$abb), c("fips", "abb")]
weights_df <- merge(weights_df, state.fips, by = "fips")

setorder(weights_df, -weights)

print(weights_df[1:10,])

```

When we ran the `DiSCo` command, we set `permutation` to `TRUE`, which
runs the permutation test described in the paper (see `?DiSCo_per` for
more details). This will allow us to inspect the permutation inference
results below. Already, by setting `graph` to `TRUE`, the function
displayed a plot of the full distribution of permutation tests. The
black solid line shows the fit of the "true" synthetic control. The fact
that it does not diverge stronger than the other lines in gray after
treatment suggests that the treatment had no effect.

For context, the y-axis is the squared Wasserstein distance between the
counterfactual and observed quantile functions, $$
d_{t t}^2:=\int_0^1\left|F_{Y_{u t, N}}^{-1}(q)-F_{Y_{u t}}^{-1}(q)\right|^2 d q
$$and, as in @abadie2010synthetic we take the ratio of post- to
pre-intervention Wasserstein distance to account for variation in the
pre-treatment fit across placebo tests, $$
r_j=\frac{R_j\left(T_0+1, T\right)}{R_j\left(1, T_0\right)}
$$ and calculate the p-value for the permutation test as, $$
p=\frac{1}{J+1} \sum_{j=1}^{J+1} H\left(r_j-r_1\right),
$$where $H(x)$ is the Heaviside function which is 1 if $x\geq 0$ and 0
otherwise. This p-value gives the probability of observing a placebo
test with a larger ratio than the true treatment effect. It can be
retrieved by calling

```{r}
summary(disco$perm)
```

The p-value is larger than 0.05, which confirms the visual result from
the plot above, while accounting for potential differences in
pre-treatment fit across placebo units.

Finally, we can use the DiSCo Treatment Effect Aggregator (`DiSCoTEA`)
function to aggregate the resulting counterfactual quantile functions
into various treatment effect measures. For example, we can calculate
the difference between the counterfactual and observed quantile
functions and cumulative distribution functions (CDF) as follows.

```{r, fig.width=5,fig.height=8, fig.align='center'}
discot <- DiSCoTEA(disco,  agg="quantileDiff", graph=TRUE)
summary(discot)
```

```{r, fig.width=5,fig.height=8, fig.align='center'}
discot <- DiSCoTEA(disco,  agg="cdfDiff", graph=TRUE, ylim=c(-0.05, 0.05))
summary(discot)
```

Calling `summary` on the returned object prints a table summarizing the
effects across the distribution. You can choose the intervals of
quantiles over which it aggregates using the `samples` parameter. If you
calculated the permutation test and confidence intervals in the `DiSCo`
function, these are reported as well.

Setting `graph` to `TRUE` also prints a plot of the distribution
differences over time You can focus on specific years using the `t_plot`
parameter. You can use the other parameters of the `DiSCOTEA` function
to adjust the basic appearance of the plots, or directly alter the
ggplot object that is stored in the returned `DiSCoT` object.

Looking at the plot of the quantile and CDF differences, we can see two
things: 1) the pre-treatment fit for the years 1998-2002 is decent but
not perfect; 2) there do not appear to be any notable effects of the
minimum wage inrease, asides from fluctuations in the function
differences that are similar to those pre-treatment.

# Robustness tests

The package offers various ways to further test suspected effects. For
example, we can focus on a specific part of the distribution, and
construct a separate synthetic control for it. Mathematically, this
comes down to,

$$
\vec{\lambda}_t^*=\underset{\vec{\lambda} \in \Delta^J}{\operatorname{argmin}} \int_{\text{q_min}}^{\text{q_max}}\left|\sum_{j=2}^{J+1} \lambda_j F_{Y_{j t}}^{-1}(q)-F_{Y_{1 t}}^{-1}(q)\right|^2 d q,
$$

where $\text{q_min} < \text{q_max}$ are the bounds of the quantile range
we're interested in. For example, following @dube2019minimum, we can
focus on the lower end of the distribution, up until observations that
earn 3.5 times the poverty income threshold. This corresponds to around
the 0.65th quantile in our data:

```{r}
stats::ecdf(disco$results.periods$`2000`$target$quantiles)(3.5)
```

We can simply run the following code,

```{r}
disco <- DiSCo(dube, id_col.target=id_col.target, t0=t0, G = 1000, num.cores = 1, permutation = TRUE, CI = TRUE, boots = 1000, graph = FALSE, q_min = 0, q_max=0.65, seed=1, simplex=TRUE)
```

One could also try to deal with irregularity in the estimated quantile
functions is by using the `qmethod` option in the `DiSCo` command, which
allows for the use of alternative quantile estimation methods that can
account for non-smoothness and extreme values. As above, we plot the
results using the `DiScoTEA` function:

```{r, fig.width=5,fig.height=8, fig.align='center'}
discot <- DiSCoTEA(disco, agg="quantileDiff", graph=TRUE)
summary(discot) 

```

```{r, fig.width=5,fig.height=8, fig.align='center'}
discot <- DiSCoTEA(disco, agg="cdfDiff", graph=TRUE, ylim=c(-0.05,0.05))
summary(discot) 
```

Again, we do not observe any subtantial changes in the quantile
functions or CDFs. Also, the p-value of the permutation test is now
larger, losing the marginal significance it had before. Overall, we
cannot reject the null hypothesis of there being no effect of the
treatment.

# Conclusion

In this vignette, we have demonstrated the DiSCo package using the data
from the empirical application in @gunsilius2023distributional. We
estimated the difference in between the quantile functions of the
distributional synthetic control constructed with the `DiSco` command
and the observed quantile function. To test for the presence of
treatment effects, we inspected the pre-treatment distributional fit,
carried out a permutation test, and re-estimated the synthetic control
for a restricted quantile range; all these robustness tests are
available in the DiSCo package, together with handy graphing and
aggregation tools.