---
title: Getting Started with gptools in R
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with gptools in R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
params:
  EVAL: !r identical(Sys.getenv("NOT_CRAN"), "true")
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = params$EVAL
)
```

```{r, include = FALSE}
# Make sure cmdstanr is all set up. But we don't need to show the reader this.
cmdstanr::check_cmdstan_toolchain(fix = TRUE)
cmdstanr::install_cmdstan(version = "2.36.0")
```

`gptoolsStan` is a minimal package to publish Stan code for efficient Gaussian process inference. The package can be used with the [`cmdstanr`](https://mc-stan.org/cmdstanr/) interface for Stan in R. Unfortunately, [`Rstan`](https://mc-stan.org/rstan/) is not supported because it [does not provide an option to specify include paths](https://discourse.mc-stan.org/t/specifying-include-paths-in-rstan/32182/2). If you're already familiar with `cmdstanr`, dive in below. If not, have a look at the [getting started guide](https://mc-stan.org/cmdstanr/articles/cmdstanr.html) for the `cmdstanr` interface.

This vignette demonstrates the package by sampling from a simple Gaussian process using Fourier methods (see the accompanying publication ["Scalable Gaussian Process Inference with Stan"](https://arxiv.org/abs/2301.08836) for background on the approach). Here is the model definition in Stan.

```{r, results='markup', comment='', echo=FALSE}
cat(readLines("getting_started.stan"), sep = "\n")
```

Here, we assume that `cmdstanr` is installed and that the `cmdstan` compiler is available. See [here](https://mc-stan.org/cmdstanr/#installation) if you need help getting set up with `cmdstanr`. Let's compile the model.

```{r}
library(cmdstanr)
library(gptoolsStan)

model <- cmdstan_model(
  stan_file="getting_started.stan",
  include_paths=gptools_include_path(),
)
```

As indicated in the `data` section of the Stan program, we need to define the number of elements `n` of the Gaussian process and the [real fast Fourier transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform#FFT_algorithms_specialized_for_real_or_symmetric_data) (RFFT) of the covariance kernel `cov_rfft`. We'll use 100 elements and a [squared exponential covariance kernel](https://en.wikipedia.org/wiki/Gaussian_process#Usual_covariance_functions) which allows us to evaluate the RFFT directly.

```{r}
n <- 100
length_scale <- n / 10
freq <- 1:(n %/% 2 + 1) - 1
# See appendix B of https://arxiv.org/abs/2301.08836 for details on the expression.
cov_rfft <- exp(- 2 * (pi * freq * length_scale / n) ^ 2) + 1e-9
```

Let's obtain prior realization by sampling from the model.

```{r}
fit <- model$sample(
  data=list(n=n, cov_rfft=cov_rfft),
  seed=123,
  chains=1,
  iter_warmup=1000,
  iter_sampling=5,
)
```

We extract the draws from the `fit` object and plot a realization of the process.

```{r, fig.width=6, fig.height=5}
f <- fit$draws("f", format="draws_matrix")
plot(1:n, f[1,], type="l", xlab="covariate x", ylab="Gaussian process f(x)")
```

This vignette illustrates the use of gptools with an elementary example. Further details can be found in the [more comprehensive documentation](http://gptools-stan.readthedocs.io/) although using the [`cmdstanpy`](https://mc-stan.org/cmdstanpy/) interface.