---
title: "Rolling calculations in tibbletime"
author: "Davis Vaughan"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Rolling calculations in tibbletime}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introducing rollify()

A common task in financial analyses is to perform a rolling calculation. This
might be a single value like a rolling mean or standard deviation, or it 
might be more complicated like a rolling linear regression. To account for this
flexibility, `tibbletime` has the `rollify()` function. This function allows 
you to turn _any_ function into a rolling version of itself. 

In the `tidyverse`, this type of function is known as an _adverb_
because it _modifies_ an existing function, and functions are
typically given _verb_ names.
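To make the adverb idea concrete, here is a minimal base-R sketch of a function that takes a function and returns a rolling version of it. `toy_rollify()` is a hypothetical helper written only for illustration. It is not how `tibbletime` implements `rollify()`, and it only handles functions of a single vector that return a single number.

```{r}
# Hypothetical helper for illustration only: NOT tibbletime's rollify().
# An "adverb": it takes a function `.f` and returns a new function that
# applies `.f` to each trailing window of `window` values.
toy_rollify <- function(.f, window) {
  force(.f)
  force(window)
  function(x) {
    vapply(seq_along(x), function(i) {
      # Positions before the first full window are filled with NA
      if (i < window) NA_real_ else .f(x[(i - window + 1):i])
    }, numeric(1))
  }
}

rolling_sum_3 <- toy_rollify(sum, window = 3)
rolling_sum_3(1:6)
```

The returned function remembers `.f` and `window` from the call that created it, which is what lets a single call produce a reusable "rolling" verb.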

## Datasets required

```{r, message=FALSE, warning=FALSE}
library(tibbletime)
library(dplyr)
library(tidyr)

# Facebook stock prices.
data(FB)

# Only a few columns
FB <- select(FB, symbol, date, open, close, adjusted)

```


## A rolling average

To calculate a rolling average, picture a column in a data frame where you take
the average of the values in rows 1-5, then in rows 2-6, then in 3-7, and so on
until you reach the end of the dataset. This type of 5-period moving window is 
a rolling calculation, and is often used to smooth out noise in a dataset.
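The same 5-period window can be written out by hand in base R, which makes the "rows 1-5, then rows 2-6" description concrete. This sketch uses made-up numbers rather than the `FB` data, and cross-checks the result against `stats::filter()`, which computes a trailing moving average when given equal weights and `sides = 1`.

```{r}
# Made-up prices, not the FB data
x <- c(10, 12, 11, 15, 14, 13, 16, 18)
window <- 5

# Element i (once i >= window) is the mean of x[(i - window + 1):i],
# i.e. rows 1-5, then rows 2-6, and so on. Earlier elements are NA.
rolling <- c(rep(NA, window - 1),
             sapply(window:length(x), function(i) mean(x[(i - window + 1):i])))
rolling

# Cross-check: equal weights with sides = 1 give a trailing moving average
filtered <- as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
all.equal(rolling, filtered)
```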

Let's see how to do this with `rollify()`.

```{r}

# The function to use at each step is `mean`.
# The window size is 5.
rolling_mean <- rollify(mean, window = 5)

rolling_mean
```

We now have a rolling version of `mean()`, and you call it just as you would
call `mean()`. Because the first four rows do not yet have a full 5-row window,
they are filled with `NA`.

```{r}
mutate(FB, mean_5 = rolling_mean(adjusted))
```

You can create multiple versions of the rolling function if you need to 
calculate the mean at multiple window lengths.

```{r}
rolling_mean_2 <- rollify(mean, window = 2)
rolling_mean_3 <- rollify(mean, window = 3)
rolling_mean_4 <- rollify(mean, window = 4)

FB %>% mutate(
  rm2 = rolling_mean_2(adjusted),
  rm3 = rolling_mean_3(adjusted),
  rm4 = rolling_mean_4(adjusted)
)
```
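When you need many window lengths, the rolling functions can also be generated programmatically instead of one at a time. This base-R sketch reuses the idea of a hypothetical `toy_rollify()` helper (illustration only, not part of `tibbletime`) and builds one rolling mean per window with `lapply()`, naming each result after its window length.

```{r}
# Hypothetical helper (illustration only, not tibbletime's implementation):
# returns a function that applies `.f` over trailing windows of size `window`.
toy_rollify <- function(.f, window) {
  force(.f)
  force(window)
  function(x) {
    vapply(seq_along(x), function(i) {
      if (i < window) NA_real_ else .f(x[(i - window + 1):i])
    }, numeric(1))
  }
}

# Build one rolling mean per window length, named rm2, rm3, rm4
windows <- c(2, 3, 4)
rollers <- setNames(lapply(windows, function(w) toy_rollify(mean, w)),
                    paste0("rm", windows))

# Apply every rolling function to the same made-up prices
prices <- c(1, 2, 3, 4, 5)
rolled <- as.data.frame(lapply(rollers, function(f) f(prices)))
rolled
```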

## Purrr functional syntax

`rollify()` is built using pieces from the `purrr` package. One of those is 
the ability to accept an anonymous function using the `~` function syntax.

The documentation, `?rollify`, gives a thorough walkthrough of the different
forms you can pass to `rollify()`, but let's see a few more examples.

```{r}
# Rolling mean, but with function syntax
rolling_mean <- rollify(.f = ~mean(.x), window = 5)

mutate(FB, mean_5 = rolling_mean(adjusted))
```

You can create anonymous functions (functions without a name) on the fly.

```{r}
# 5-period average of the sum of 2 columns (open + close)
rolling_avg_sum <- rollify(~ mean(.x + .y), window = 5)

mutate(FB, avg_sum = rolling_avg_sum(open, close))
```
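Conceptually, a two-argument rolling function slides the same window over both inputs at once. Here is a base-R sketch of the calculation above with made-up numbers; `open_px` and `close_px` are plain vectors standing in for the `open` and `close` columns.

```{r}
open_px  <- c(1, 2, 3, 4, 5, 6)
close_px <- c(2, 3, 4, 5, 6, 7)
window <- 5

# For each row i (once a full window exists), take the same rows from both
# vectors, add them element-wise, then average the sums.
avg_sum <- vapply(seq_along(open_px), function(i) {
  if (i < window) return(NA_real_)
  idx <- (i - window + 1):i
  mean(open_px[idx] + close_px[idx])
}, numeric(1))
avg_sum
```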

## Optional arguments

To pass optional arguments (anything other than the data, such as `.x` or
`.y`) to your rolling function, specify them inside the function you give to
`rollify()`, not in the call to the rolling function it returns.

For instance, say our dataset had `NA` values, but we still wanted to calculate
an average. We need to specify `na.rm = TRUE` as an argument to `mean()`.

```{r}
FB$adjusted[1] <- NA

# Do this
rolling_mean_na <- rollify(~mean(.x, na.rm = TRUE), window = 5)

FB %>% mutate(mean_na = rolling_mean_na(adjusted))

# Don't try this!
# rolling_mean_na <- rollify(~mean(.x), window = 5)
# FB %>% mutate(mean_na = rolling_mean_na(adjusted, na.rm = TRUE))

# Reset FB
data(FB)
FB <- select(FB, symbol, date, adjusted)
```
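One way to see why extra arguments belong inside the function: a rolling function only ever hands each window of data to the function it wraps. The base-R sketch below makes that explicit with a hypothetical `roll_apply()` helper (not part of `tibbletime`), so `na.rm = TRUE` only takes effect when it is baked into `.f` itself.

```{r}
# Hypothetical helper (not part of tibbletime): applies `.f` to each
# trailing window of `x`. It passes only the window to `.f`, so any extra
# settings such as na.rm must already live inside `.f`.
roll_apply <- function(x, window, .f) {
  vapply(seq_along(x), function(i) {
    if (i < window) NA_real_ else .f(x[(i - window + 1):i])
  }, numeric(1))
}

x <- c(NA, 2, 4, 6, 8, 10)

roll_apply(x, 3, mean)                               # first full window contains NA
roll_apply(x, 3, function(v) mean(v, na.rm = TRUE))  # na.rm baked into .f
```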

## Returning more than 1 value per call

Say our rolling function calls a custom `summary_df()` function, which
calculates five summary statistics (mean, standard deviation, minimum, maximum
and median) and returns them as a tidy data frame.

We won't be able to use the rolling version of this out of the box.
`rollify()` attempts to unlist the result of each call, but each call returns a
whole data frame rather than a single value, so `dplyr::mutate()` will complain
that an incorrect number of values was returned. What we need instead is a
list-column, holding one data frame per row. To get one, specify
`unlist = FALSE` in the call to `rollify()`.

```{r}
# Our data frame summary
summary_df <- function(x) {
  data.frame(  
    rolled_summary_type = c("mean", "sd",  "min",  "max",  "median"),
    rolled_summary_val  = c(mean(x), sd(x), min(x), max(x), median(x))
  )
}

# A rolling version, with unlist = FALSE
rolling_summary <- rollify(~summary_df(.x), window = 5, 
                           unlist = FALSE)

FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted))
FB_summarised
```
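The same one-value-versus-many distinction shows up in base R. In this sketch, `range_df()` is a hypothetical stand-in for `summary_df()`, and `vapply()` and `lapply()` loosely play the roles of unlisting and not unlisting. This is an analogy only, not how `rollify()` is implemented.

```{r}
# Hypothetical summary function (illustration only): 2 rows per call
range_df <- function(x) data.frame(stat = c("min", "max"), value = range(x))

windows <- list(c(1, 2, 3), c(2, 3, 4))

# Forcing exactly one number per window fails: each call returns a data frame
one_value <- tryCatch(vapply(windows, range_df, numeric(1)),
                      error = function(e) "error")
one_value

# A plain list can hold one data frame per window, which is what a list-column is
list_col <- lapply(windows, range_df)
list_col[[2]]
```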

The neat thing is that after removing the `NA` values at the beginning, the
list-column can be unnested using `tidyr::unnest()`, giving us a tidy
5-period rolling summary.

```{r}
FB_summarised %>% 
  filter(!is.na(summary_list_col)) %>%
  unnest(cols = summary_list_col)
```

## Custom missing values

The last example was a little clunky because we had to remove the first few
missing rows manually before unnesting. If those missing values were empty data
frames instead, `unnest()` would simply drop them. Luckily, the `na_value`
argument lets us specify the value used to fill the `NA` spots at the beginning
of the roll.

```{r}
rolling_summary <- rollify(~summary_df(.x), window = 5, 
                           unlist = FALSE, na_value = data.frame())

FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted))
FB_summarised
```

Now unnesting directly:

```{r}
FB_summarised %>% 
  unnest(cols = summary_list_col)
```
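Why does an empty data frame work as a fill value? When per-row data frames are stacked, an empty one contributes no rows. This base-R sketch uses `do.call(rbind, ...)` as a rough stand-in for `unnest()`; the two are not identical, since `unnest()` also repeats the other columns of each row.

```{r}
# Made-up per-row results, standing in for a list-column
pieces <- list(
  data.frame(),                                        # fill value for an early row
  data.frame(stat = c("min", "max"), value = c(1, 3)),
  data.frame(stat = c("min", "max"), value = c(2, 4))
)

# Stacking the pieces: the empty data frame contributes no rows
stacked <- do.call(rbind, pieces)
nrow(stacked)
```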

Finally, if you want to keep those first few rows (as `NA`s) in the unnested
result, pass a one-row data frame of `NA` values that has the same column names
as the rest of the summaries.

```{r}
rolling_summary <- rollify(~summary_df(.x), window = 5, 
                           unlist = FALSE, 
                           na_value = data.frame(rolled_summary_type = NA,
                                                 rolled_summary_val  = NA))

FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted))
FB_summarised %>% unnest(cols = summary_list_col)
```
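A one-row placeholder behaves differently from an empty one: it survives stacking as a row of `NA`s. A base-R sketch with made-up values, again using `do.call(rbind, ...)` only as a rough stand-in for `unnest()`:

```{r}
pieces <- list(
  data.frame(stat = NA, value = NA),                   # one-row NA placeholder
  data.frame(stat = c("min", "max"), value = c(1, 3))
)

# The placeholder contributes one row of NAs to the stacked result
stacked <- do.call(rbind, pieces)
stacked
```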

## Rolling regressions

A final use of this flexible function is to calculate rolling regressions.

A purely illustrative example is to perform a rolling regression on the `FB`
dataset of the form `close ~ high + low + volume`. Notice that we have 4 columns
to pass here. This is more complicated than a `.x` and `.y` example, but have no
fear. The arguments can be referred to in order as `..1`, `..2`, ... for as many
as are required, or you can pass a freshly created anonymous function. We will
do the latter so that the regression keeps the names of its variables.
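The `..1`, `..2` notation comes from base R, where it refers to the first, second, ... arguments passed through `...`. The sketch below shows that, and also shows why a named anonymous function is easier to read: the model's coefficients take the argument names. The simulated `sim_*` vectors are made up and stand in for the `FB` columns; `volume` is dropped for brevity.

```{r}
# Base R: ..1, ..2, ... refer to the 1st, 2nd, ... arguments passed via `...`
first_minus_second <- function(...) ..1 - ..2
first_minus_second(10, 3)

# With a named anonymous function, the model's terms take the argument names
fit_named <- function(close, high, low) lm(close ~ high + low)

set.seed(42)
sim_high  <- rnorm(10)
sim_low   <- rnorm(10)
sim_close <- (sim_high + sim_low) / 2 + rnorm(10, sd = 0.1)

names(coef(fit_named(sim_close, sim_high, sim_low)))
```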

Again, since each call to the rolling function returns a linear model object,
we will specify `unlist = FALSE`. Unfortunately, there is no easy default `NA`
value to pass here, so the first few rows are left as `NA`.

```{r}
# Reset FB
data(FB)

rolling_lm <- rollify(
  .f = function(close, high, low, volume) {
    lm(close ~ high + low + volume)
  },
  window = 5,
  unlist = FALSE
)

FB_reg <- mutate(FB, roll_lm = rolling_lm(close, high, low, volume))
FB_reg
```

To get some useful information about the regressions, we will use `broom::tidy()`
and apply it to each regression using a `mutate()` + `purrr::map()` combination.

```{r}
FB_reg %>%
  filter(!is.na(roll_lm)) %>%
  mutate(tidied = purrr::map(roll_lm, broom::tidy)) %>%
  unnest(cols = tidied) %>%
  select(symbol, date, term, estimate, std.error, statistic, p.value)
```
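If `broom` is not installed, base R's `coef()` can pull out the same estimates. The sketch below is a simplified stand-in: it uses simulated data in place of `FB`, a model without `volume`, and `lapply()` over window positions in place of `rollify()`, and it extracts only the coefficient on `high` from each window's fit.

```{r}
set.seed(123)
n <- 12
window <- 5

# Simulated data, not the FB prices
sim <- data.frame(high = 100 + cumsum(rnorm(n)))
sim$low   <- sim$high - runif(n, 1, 2)
sim$close <- (sim$high + sim$low) / 2 + rnorm(n, sd = 0.2)

# One regression per trailing window; rows without a full window get NULL
fits <- lapply(seq_len(n), function(i) {
  if (i < window) return(NULL)
  lm(close ~ high + low, data = sim[(i - window + 1):i, ])
})

# Pull out the coefficient on `high` from each fit with base R's coef()
slope_high <- vapply(fits, function(fit) {
  if (is.null(fit)) NA_real_ else unname(coef(fit)["high"])
}, numeric(1))
slope_high
```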