---
title: "Probabilistic Reconciliation via Conditioning with `bayesRecon`"
author: "Nicolò Rubattu, Giorgio Corani, Dario Azzimonti, Lorenzo Zambon"
date: "2023-11-26"
lang: "en"
output: html_vignette
bibliography: references.bib
cite:
- '@zambon2022efficient'
- '@zambon2023properties'
- '@corani2023probabilistic'
- '@corani2021probabilistic'
vignette: >
  %\VignetteIndexEntry{Probabilistic Reconciliation via Conditioning with `bayesRecon`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval=TRUE ### !!!! set to FALSE here to render only the text !!!!
)
set.seed(42)
```

```{r klippy, echo=FALSE, include=TRUE, eval=FALSE}
klippy::klippy(position = c('top', 'right'), tooltip_message = 'Copy', tooltip_success = 'Done', color="black")
```

# Introduction

This vignette shows how to perform  *probabilistic reconciliation* with
the `bayesRecon` package. We provide three examples:

1. *Temporal hierarchy for a count time series*: we build a temporal hierarchy over a count time series,  produce the base forecasts using `glarma` and  reconcile them via Bottom-Up Importance Sampling (BUIS).

2.  *Temporal hierarchy for a smooth time series*: we build a temporal hierarchy over a smooth time series,  compute the base forecasts using `ets` and  we reconcile them in closed form using Gaussian reconciliation. The covariance matrix is diagonal.

3.   *Hierarchical of smooth time series*: this is an example of a cross-sectional hierarchy. We generate the base forecasts using `ets` and we reconcile them via Gaussian reconciliation.
The covariance matrix is full and estimated via shrinkage.



# Installation

The package, available at this 
[CRAN page](https://cran.r-project.org/package=bayesRecon), can be installed and loaded with the usual commands:
```{r install, eval=FALSE}
install.packages('bayesRecon', dependencies = TRUE)
```

Load the package:

```{r load}
library(bayesRecon)
```

# Temporal hierarchy over a count time series
We select a monthly time series of counts from  the *carparts* dataset, available from
the expsmooth package [@expsmooth_pkg].
The data set contains time series of sales of cars part from Jan. 1998 to Mar. 2002. 
For this example we select time series #2655, which we make  available  as `bayesRecon::carparts_example`.

This time series  has a skewed distribution of  values. 
```{r carpart-plot, dpi=300, out.width = "100%", fig.align='center', fig.cap="**Figure 1**: Carpart - monthly car part sales.", fig.dim = c(6, 3)}
layout(mat = matrix(c(1, 2), nrow = 1, ncol = 2), widths = c(2, 1))
plot(carparts_example, xlab = "Time", ylab = "Car part sales", main = NULL)
hist(carparts_example, xlab = "Car part sales", main = NULL)
```
<br><br>
We divide the time series into train and test; the test set contains  the last 12 months.
```{r train-test}
train <- window(carparts_example, end = c(2001, 3))
test  <- window(carparts_example, start = c(2001, 4))
```

<!-- In order to do hierarchical forecasting: -->

<!-- 1.    we build a temporal hierarchy; -->
<!-- 2.    we compute the *base forecasts*) at each temporal scale. -->

We build the temporal hierarchy using the `temporal aggregation` function.
We specify the aggregation levels using the 
`agg_levels` argument; in this case they are
*2-Monthly*, *Quarterly*, *4-Monthly*, *Biannual*, and *Annual*.

``` {r temp-agg}
train.agg <- bayesRecon::temporal_aggregation(train, agg_levels = c(2, 3, 4, 6, 12))
levels <- c("Annual", "Biannual", "4-Monthly", "Quarterly", "2-Monthly", "Monthly")
names(train.agg) <- levels
```

The function returns a list of aggregated time series, ordered from the most aggregated (top of the hierarchy) to the most disagreggated (bottom of the hierarchy). We  plot them below.
``` {r temp-agg-plot, dpi=300, fig.show="hold", out.width="100%", out.heigth="100%", fig.align='center', fig.cap="**Figure 2**: The aggregated time series of the temporal hierarchy.", fig.dim=c(6,3.5)}
par(mfrow = c(2, 3), mai = c(0.6, 0.6, 0.5, 0.5))
for (l in levels) {
  plot(train.agg[[l]], xlab = "Time", ylab = "Car part sales", main = l)
}
```
<br><br>
We  compute the  *base forecasts* using the package  [`glarma`](https://cran.r-project.org/package=glarma),
a package specific for forecasting count time series.
We forecast 12 steps ahead at the monthly level,  4 steps ahead  at the quarterly level, etc. by iterating over the levels of the  hierarchy,
At each level, we fit a `glarma` model
with Poisson predictive distribution and we forecast;
each forecast is constituted by 20000  integer samples.


Eventually we collect the samples of the  28 predictive distributions (one at the  *Annual* level, two at the *Biannual* level, etc.) in a list.
The  code below takes about 30 seconds on a standard computer.

``` {r hier-fore, cache=TRUE}
# install.packages("glarma", dependencies = TRUE)
#library(glarma)

fc.samples <- list()
D <- 20000
fc.count <- 1

# iterating over the temporal aggregation levels
for (l in seq_along(train.agg)) {
  f.level <- frequency(train.agg[[l]])
  print(paste("Forecasting at ", levels[l], "...", sep = ""))
  # fit an independent model for each aggregation level
  model <- glarma::glarma(
    train.agg[[l]],
    phiLags = if (f.level == 1) 1 else 1:(min(6, f.level - 1)),
    thetaLags = if (f.level == 1) NULL else f.level,
    X = cbind(intercept = rep(1, length(train.agg[[l]]))),
    offset = cbind(intercept = rep(0, length(train.agg[[l]]))),
    type = "Poi"
  )
  # forecast 1 year ahead
  h <- f.level
  tmp <- matrix(data = NA, nrow = h, ncol = D)
  for (s in (1:D)) {
    # each call to 'forecast.glarma' returns a simulation path
    tmp[, s] <- glarma::forecast(
      model,
      n.ahead = h,
      newdata = cbind(intercept = rep(1, h)),
      newoffset = rep(0, h)
    )$Y
  }
  # collect the forecasted samples
  for (i in 1:h) {
    fc.samples[[fc.count]] <- tmp[i, ]
    fc.count <- fc.count + 1
  }
}

```


Reconciliation requires the aggregation matrix $\mathbf{A}$, which we obtain using the function `get_reconc_matrices`. 
It requires: 

*   the aggregation factors of the hierarchy, which in this example are $\{2, 3, 4, 6, 12\}$;
*   the length of the forecasting horizon at the  bottom level, which is 12 in this example.

``` {r aggregationMatrix}
recon.matrices <- bayesRecon::get_reconc_matrices(agg_levels = c(2, 3, 4, 6, 12), h = 12)
# Aggregation matrix
A <- recon.matrices$A 
```

To reconcile using Bottom-Up Important Sampling (BUIS) we
we use the function `reconc_BUIS`, passing to it
the $\mathbf{A}$ matrix, the *base forecasts*, the  type  of the base forecasts (`in_type`="samples") and whether the samples are discrete or integer (`distr` = "discrete").


``` {r reconc}
recon.res <- bayesRecon::reconc_BUIS(
  A,
  base_forecasts = fc.samples,
  in_type = "samples",
  distr = "discrete",
  seed = 42
)
```

Here we obtain samples from the reconciled forecast distribution.
```{r res}
reconciled_samples <- recon.res$reconciled_samples
dim(reconciled_samples)

```

We now compute the Mean Absolute Error (MAE) and the Continuous Ranked Probability Score (CRPS) for the bottom (i.e., *monthly*) time series.  For computing CRPS, we use the package [`scoringRules`](https://cran.r-project.org/package=scoringRules).

```{r metrics}
# install.packages("scoringRules", dependencies = TRUE)
library(scoringRules)

ae.fc <- list()
ae.reconc <- list()
crps.fc <- list()
crps.reconc <- list()
for (h in 1:length(test)) {
  y.hat_ <- median(fc.samples[[nrow(A) + h]])
  y.reconc_ <- median(recon.res$bottom_reconciled_samples[, h])
  # Compute Absolute Errors
  ae.fc[[h]] <- abs(test[h] - y.hat_)
  ae.reconc[[h]] <- abs(test[h] - y.reconc_)
  # Compute Continuous Ranked Probability Score (CRPS)
  crps.fc[[h]] <-
    scoringRules::crps_sample(y = test[h], dat = fc.samples[[nrow(A) + h]])
  crps.reconc[[h]] <-
    scoringRules::crps_sample(y = test[h], dat = recon.res$bottom_reconciled_samples[, h])
}

mae.fc <- mean(unlist(ae.fc))
mae.reconc <- mean(unlist(ae.reconc))
crps.fc <- mean(unlist(crps.fc))
crps.reconc <- mean(unlist(crps.reconc))
metrics <- data.frame(
  row.names = c("MAE", "CRPS"),
  base.forecasts = c(mae.fc, crps.fc),
  reconciled.forecasts = c(mae.reconc, crps.reconc)
)
metrics

```


# Temporal hierarchy over a smooth time series

In this second example, we select a smooth monthly time series (N1485) from the M3 forecasting competition [@makridakis2000m3]. The data set is available in
the Mcomp package [@mcomp_pkg] and we make available this time series as `bayesRecon::m3_example`.


```{r m3-plot, dpi=300, out.width = "100%", fig.align='center', fig.cap="**Figure 3**: M3 - N1485 time series.", fig.dim = c(6, 3)}
plot(M3_example$train, xlab = "Time", ylab = "y", main = "N1485")
```
<br>
We build the temporal hierarchy using the `temporal_aggregation` function. 

Without specifying  `agg_levels`, the function  generates by default all the feasible aggregation: 2-Monthly, Quarterly, 4-Monthly, Biannual, and Annual.

```{r m3-agg} 
train.agg <- bayesRecon::temporal_aggregation(M3_example$train)
levels <- c("Annual", "Biannual", "4-Monthly", "Quarterly", "2-Monthly", "Monthly")
names(train.agg) <- levels
```


We generate the base forecasts using `ets` from the   [forecast](https://cran.r-project.org/package=forecast) package [@pkg_forecast].
At every level we predict 18 months ahead.
All the  predictive distributions are Gaussian.
```{r m3-fore}
# install.packages("forecast", dependencies = TRUE)
library(forecast)

H <- length(M3_example$test)
H

fc <- list()
level.idx <- 1
fc.idx <- 1
for (level in train.agg) {
  level.name <- names(train.agg)[level.idx]
  # fit an ETS model for each temporal level
  model <- ets(level)
  # generate forecasts for each level within 18 months
  h <- floor(H / (12 / frequency(level)))
  print(paste("Forecasting at ", level.name, ", h=", h, "...", sep = ""))
  level.fc <- forecast(model, h = h)
  # save mean and sd of the gaussian predictive distribution
  for (i in 1:h) {
    fc[[fc.idx]] <- list(mean = level.fc$mean[[i]],
                         sd = (level.fc$upper[, "95%"][[i]] - level.fc$mean[[i]]) / qnorm(0.975))
    fc.idx <- fc.idx + 1
  }
  level.idx <- level.idx + 1
}
```

Using the function `get_reconc_matrices`,  we get  matrix $\mathbf{A}$.

```{r m3-rmat, dpi=300, out.width = '70%', fig.align='center', fig.cap="**Figure 4**: M3 - The aggregation matrix A (red=1, yellow=0).", fig.dim = c(8, 8)}
rmat <- get_reconc_matrices(agg_levels = c(2, 3, 4, 6, 12), h = 18)

par(mai = c(1,1,0.5,0.5))
image(1:ncol(rmat$A), 1:nrow(rmat$A), 
      t(apply(t(rmat$A),1,rev)), 
      xaxt='n', yaxt='n', ylab = "", xlab = levels[6])
axis(1, at=1:ncol(rmat$A), label=1:ncol(rmat$A), las=1)
axis(2, at=c(23,22,19,15,9), label=levels[1:5], las=2)
```
<br>
The function `reconc_gaussian`  implements the exact Gaussian reconciliation.
We also run `reconc_BUIS`, to check the consistency 
between the two approaches.


```{r m3-reco} 
recon.gauss <- bayesRecon::reconc_gaussian(
  A = rmat$A,
  base_forecasts.mu = sapply(fc, "[[", 1),
  base_forecasts.Sigma = diag(sapply(fc, "[[", 2)) ^ 2
)

reconc.buis <- bayesRecon::reconc_BUIS(
  A = rmat$A,
  base_forecasts = fc,
  in_type = "params",
  distr = "gaussian",
  num_samples = 20000,
  seed = 42
)

# check that the algorithms return consistent results
round(rbind(
  c(rmat$S %*% recon.gauss$bottom_reconciled_mean),
  rowMeans(reconc.buis$reconciled_samples)
))
```

We now compare  *base forecasts* and  *reconciled forecasts*:

```{r m3-plotfore, dpi=300, out.width = "100%", fig.align='center', fig.cap="**Figure 5**: M3 - Base and reconciled forecasts. The black line shows the actual data (dashed in the test). The orange line is the  mean of the base forecasts, the blu line is the reconciled mean. We also show the 95% prediction intervals.", fig.dim = c(6, 4)}
yhat.mu <- tail(sapply(fc, "[[", 1), 18)
yhat.sigma <- tail(sapply(fc, "[[", 2), 18)
yhat.hi95 <- qnorm(0.975, mean = yhat.mu, sd = yhat.sigma)
yhat.lo95 <- qnorm(0.025, mean = yhat.mu, sd = yhat.sigma)
yreconc.mu <- rowMeans(reconc.buis$bottom_reconciled_samples)
yreconc.hi95 <- apply(reconc.buis$bottom_reconciled_samples, 1, 
                      function(x) quantile(x, 0.975))
yreconc.lo95 <- apply(reconc.buis$bottom_reconciled_samples, 1, 
                      function(x) quantile(x, 0.025))

ylim_ <- c(min(M3_example$train, M3_example$test, yhat.lo95, yreconc.lo95) - 1, 
           max(M3_example$train, M3_example$test, yhat.hi95, yreconc.hi95) + 1)

plot(M3_example$train, xlim = c(1990, 1995.6), ylim = ylim_, 
     ylab = "y", main = "N1485 Forecasts")
lines(M3_example$test, lty = "dashed")
lines(ts(yhat.mu, start = start(M3_example$test), frequency = 12), 
      col = "coral", lwd = 2)
lines(ts(yreconc.mu, start = start(M3_example$test), frequency = 12), 
      col = "blue2", lwd = 2)
xtest <- time(M3_example$test)
polygon(c(xtest, rev(xtest)), c(yhat.mu, rev(yhat.hi95)), 
        col = "#FF7F5066", border = "#FF7F5066")
polygon(c(xtest, rev(xtest)), c(yhat.mu, rev(yhat.lo95)), 
        col = "#FF7F5066", border = "#FF7F5066")
polygon(c(xtest, rev(xtest)), c(yreconc.mu, rev(yreconc.hi95)), 
        col = "#0000EE4D", border = "#0000EE4D")
polygon(c(xtest, rev(xtest)), c(yreconc.mu, rev(yreconc.lo95)), 
        col = "#0000EE4D", border = "#0000EE4D")
```



#   Gaussian reconciliation of a cross-sectional hierarchy

In this example, we consider the hierarchical time series *infantgts*, which is available 
from the `hts` package [@hts_pkg]
We make it available also in our package as `bayesRecon::infantMortality`.

It contains counts of infant mortality (deaths) in Australia, disaggregated by state and sex (male and female). 

<!-- State codes refer to: New South Wales (NSW), Victoria (VIC), Queensland (QLD), South Australia (SA), Western Australia (WA), Northern Territory (NT), Australian Capital Territory (ACT), and Tasmania (TAS). -->

We forecast one year ahead using `auto.arima`  from the [`forecast` ](https://cran.r-project.org/package=forecast) package. 
We collect the residuals, which we will later use to compute the covariance matrix. 

```{r infants-forecasts}
# install.packages("forecast", dependencies = TRUE)
library(forecast)

fc <- list()
residuals <- matrix(NA,
                    nrow = length(infantMortality$total),
                    ncol = length(infantMortality))
fc.idx <- 1
for (s in infantMortality) {
  s.name <- names(infantMortality)[fc.idx]
  print(paste("Forecasting at ", s.name, "...", sep = ""))
  # fit an auto.arima model and forecast with h=1
  model <- auto.arima(s)
  s.fc <- forecast(model, h = 1)
  # save mean and sd of the gaussian predictive distribution
  fc[[s.name]] <- c(s.fc$mean,
                    (s.fc$upper[, "95%"][[1]] - s.fc$mean) / qnorm(0.975))
  residuals[, fc.idx] <- s.fc$residuals
  fc.idx <- fc.idx + 1
}
```

Now we build the $\mathbf{A}$ matrix.

```{r infants-s, dpi=300, out.width = '70%', fig.align='center', fig.cap="**Figure 6**: Infants mortality - The aggregation matrix A (red=1, yellow=0).", fig.dim = c(8, 8)}
# we have 16 bottom time series, and 11 upper time series
A <- matrix(data = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
                     1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                     0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,
                     0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,
                     0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,
                     0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,
                     0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,
                     0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,
                     0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,
                     1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,
                     0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1), byrow=TRUE, ncol = 16)

# plot of A
par(mai = c(1.5,1,0.5,0.5))
image(1:ncol(A), 1:nrow(A), 
      t(apply(t(A),1,rev)), 
      xaxt='n', yaxt='n', ann=FALSE)
axis(1, at=1:ncol(A), label=names(infantMortality)[12:27], las=2)
axis(2, at=c(1:11), label=rev(names(infantMortality)[1:11]), las=2)
```
<br>
We use  `bayesRecon::schaferStrimmer_cov` to estimate the covariance matrix of the residuals with shrinkage [@schafer2005shrinkage].

```{r infants reconc}
# means
mu <- sapply(fc, "[[", 1)
# Shrinkage covariance
shrink.res <- bayesRecon::schaferStrimmer_cov(residuals)
print(paste("The estimated shrinkage intensity is", round(shrink.res$lambda_star, 3)))
Sigma <- shrink.res$shrink_cov
```

We now perform Gaussian reconciliation:

```{r infants-recon}
recon.gauss <- bayesRecon::reconc_gaussian(A,
                                           base_forecasts.mu = mu,
                                           base_forecasts.Sigma = Sigma)

bottom_mu_reconc <- recon.gauss$bottom_reconciled_mean
bottom_Sigma_reconc <- recon.gauss$bottom_reconciled_covariance

# Obtain reconciled mu and Sigma for the upper variable
upper_mu_reconc <- A %*% bottom_mu_reconc
upper_Sigma_reconc <- A %*% bottom_Sigma_reconc %*% t(A)

upper_mu_reconc
```

# References
<div id="refs"></div>