---
title: "Efficient Estimation of Bid-Ask Spreads from Open, High, Low, and Close Prices"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{bidask}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
data.table::setDTthreads(1)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  out.width="100%",
  dpi = 300,
  warning = FALSE,
  message = FALSE
)
```

This vignette illustrates how to estimate bid-ask spreads from open, high, low, and close prices using the efficient estimator described in Ardia, Guidotti, & Kroencke (JFE, 2024): [https://doi.org/10.1016/j.jfineco.2024.103916](https://doi.org/10.1016/j.jfineco.2024.103916). 

```{r setup}
library(bidask)
```

The function `edge` computes a single bid-ask spread estimate from vectors of open, high, low, and close prices. The functions `edge_rolling` and `edge_expanding` are optimized for fast calculations over rolling and expanding windows, respectively. The function `spread` provides additional functionalities for `xts` objects and implements additional estimators. For all functions, an output value of 0.01 corresponds to a spread estimate of 1%.

## Functions `edge`, `edge_rolling`, `edge_expanding`

These functions can be easily used with tidy data. For instance, download daily prices for Bitcoin and Ethereum using the [crypto2](https://cran.r-project.org/package=crypto2) package:

```{r, results='hide'}
library(dplyr)
library(crypto2)
df <- crypto_list(only_active=TRUE) %>%
  filter(symbol %in% c("BTC", "ETH")) %>%
  crypto_history(start_date = "20200101", end_date = "20221231")
```

```{r}
head(df)
```

Estimate the spread for each coin in each year:

```{r}
df %>%
  mutate(yyyy = format(timestamp, "%Y")) %>%
  group_by(symbol, yyyy) %>%
  arrange(timestamp) %>%
  summarise("EDGE" = edge(open, high, low, close))
```

Estimate the spread using a rolling window of 30 days for each coin and plot the results:

```{r}
library(ggplot2)
df %>%
  group_by(symbol) %>%
  arrange(timestamp) %>%
  mutate("EDGE (rolling)" = edge_rolling(open, high, low, close, width = 30)) %>%
  ggplot(aes(x = timestamp, y = `EDGE (rolling)`, color = symbol)) +
  geom_line() +
  theme_minimal()
```

Estimate the spread using an expanding window for each coin and plot the results:
```{r}
df %>%
  group_by(symbol) %>%
  arrange(timestamp) %>%
  mutate("EDGE (expanding)" = edge_expanding(open, high, low, close)) %>%
  ggplot(aes(x = timestamp, y = `EDGE (expanding)`, color = symbol)) +
  geom_line() +
  theme_minimal()
```

Notice that, generally, using intraday data (instead of daily) improves the estimation accuracy, especially when the spread is expected to be small (see example below).

## Function `spread`

The function `spread()` provides additional functionalities for [xts](https://cran.r-project.org/package=xts) objects and implements additional estimators. For instance, download daily data for Microsoft (MSFT) using the [quantmod](https://cran.r-project.org/package=quantmod) package which returns an `xts` object:

```{r}
library(quantmod)
x <- getSymbols("MSFT", auto.assign = FALSE, start = "2019-01-01", end = "2022-12-31")
head(x)
class(x)
```

Estimate the spread with:

```{r}
spread(x)
```

or, equivalently:

```{r}
edge(open = x[,1], high = x[,2], low = x[,3], close = x[,4])
```

Estimate the spread for each month and plot the estimates:

```{r}
sp <- spread(x, width = endpoints(x, on = "months"))
plot(sp)
```

Estimate the spread using a rolling window of 21 obervations:

```{r}
sp <- spread(x, width = 21)
plot(sp)
```

To illustrate higher-frequency estimates, download intraday data from Alpha Vantage. You must register with Alpha Vantage in order to download their data, but the one-time registration is fast and free. Register at https://www.alphavantage.co/ to receive your key. You can set the API key globally as follows:

```{r}
setDefaults(getSymbols.av, api.key = "<API-KEY>")
```

Download minute data for Microsoft:

```r
x <- getSymbols(
  Symbols = "MSFT", 
  auto.assign = FALSE, 
  src = "av", 
  periodicity = "intraday", 
  interval = "1min", 
  output.size = "full")
```

```{r, include=FALSE}
x <- read.csv(system.file("extdata", "msft.csv", package = "bidask"))
x <- xts(x[,-1], order.by = as.POSIXct(x[,1]))
```

Keep only prices during regular market hours:

```{r}
x <- x["T09:30/T16:00"]
head(x)
```

Estimate the spread for each day and plot the estimates:

```{r}
sp <- spread(x, width = endpoints(x, on = "day"))
plot(sp, type = "b")
```

Use multiple estimators and plot the estimates:

```{r}
sp <- spread(x, width = endpoints(x, on = "day"), method = c("EDGE", "AR", "CS", "ROLL"))
plot(sp, type = "b", legend.loc = "topright")
```

## GitHub 

If you find this package useful, please [star the repo](https://github.com/eguidotti/bidask)! The repository also contains implementations for Python, C++, MATLAB, and more; as well as open data containing bid-ask spread estimates for crypto pairs in Binance and for U.S. stocks in CRSP.

## Cite as

> Ardia, D., Guidotti, E., Kroencke, T.A. (2024). Efficient Estimation of Bid-Ask Spreads from Open, High, Low, and Close Prices. *Journal of Financial Economics*, 161, 103916. [doi: 10.1016/j.jfineco.2024.103916](https://doi.org/10.1016/j.jfineco.2024.103916)

A BibTex  entry for LaTeX users is:

```bibtex
@article{edge,
  title = {Efficient estimation of bid–ask spreads from open, high, low, and close prices},
  journal = {Journal of Financial Economics},
  volume = {161},
  pages = {103916},
  year = {2024},
  doi = {https://doi.org/10.1016/j.jfineco.2024.103916},
  author = {David Ardia and Emanuele Guidotti and Tim A. Kroencke},
}
```