<!--
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Example: sparse matrices}
-->

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(width=120)
```

In this vignette, we compare the computation time and memory usage of
dense `matrix` and sparse `Matrix` objects.

## Allocation and length

We begin with an analysis of the time/memory it takes to create these
objects.  In the `atime` code below, we allocate a `vector` for
comparison, and we specify a `result` function which computes the
`length` of the object `x` created by each expression. This means
`atime` will save `length` as a function of data size `N` (in addition
to time and memory).

```{r}
library(Matrix)
N_seq <- unique(as.integer(10^seq(0,7,by=0.25)))
vec.mat.result <- atime::atime(
  N=N_seq,
  vector=numeric(N),
  matrix=matrix(0, N, N),
  Matrix=Matrix(0, N, N),
  result=function(x)data.frame(length=length(x)))
plot(vec.mat.result)
```

The plot above shows three panels, one for each unit.

* `kilobytes` is the amount of memory used. We see that `Matrix` and
  `vector` use the same amount of memory asymptotically, whereas
  `matrix` uses more (larger slope on the log-log plot implies larger
  asymptotic complexity class).
* `length` is the value returned by the `length` function. We see that
  `matrix` and `Matrix` have the same value, whereas `vector` has
  asymptotically smaller length (smaller slope on log-log plot).
* `seconds` is the amount of time taken. We see that `Matrix` is
  slower than `vector` and `matrix` by a small constant overhead,
  which can be seen for small `N`. We also see that for large `N`,
  `Matrix` and `vector` have the same asymptotic time complexity,
  which is much faster than `matrix`.
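
The numbers behind these panels can also be inspected directly, via
the `measurements` data table. Below is a quick look at the raw
measurements for the largest `N`; the column names used here
(`kilobytes`, `length`, and `median` seconds) are assumed to match the
installed version of `atime`.

```{r}
library(data.table)
## Raw measurements for the largest N (column names assumed: kilobytes
## of memory, length from the result function, median seconds).
vec.mat.result$measurements[N==max(N), .(expr.name, kilobytes, length, median)]
```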
  
### Comparison with `bench::press`

An alternative method to compute asymptotic timings is via
`bench::press`, which provides functionality for parameterized
benchmarking (similar to `atime_grid`). Because `atime()` has special
treatment of the `N` parameter, the code required for asymptotic
measurement is relatively simple; compare the `atime` code above to
the `bench::press` code below, which measures the same asymptotic
quantities (seconds, kilobytes, length).

```{r}
seconds.limit <- 0.01
done.vec <- NULL
measure.vars <- c("seconds","kilobytes","length")
press_result <- bench::press(N = N_seq, {
  exprs <- function(...)as.list(match.call()[-1])
  elist <- exprs(
    vector=numeric(N),
    matrix=matrix(0, N, N),
    Matrix=Matrix(0, N, N))
  elist[names(done.vec)] <- NA ## Don't run exprs which already exceeded the limit.
  mark.args <- c(elist, list(iterations=10, check=FALSE))
  mark.result <- do.call(bench::mark, mark.args)
  ## Rename some columns for easier interpretation.
  desc.vec <- attr(mark.result$expression, "description")
  mark.result$description <- desc.vec
  mark.result$seconds <- as.numeric(mark.result$median)
  mark.result$kilobytes <- as.numeric(mark.result$mem_alloc/1024)
  ## Compute length column to measure in addition to time/memory.
  mark.result$length <- NA
  for(desc.i in seq_along(desc.vec)){
    description <- desc.vec[[desc.i]]
    result <- eval(elist[[description]])
    mark.result$length[desc.i] <- length(result)
  }
  ## Set NA time/memory/length for exprs which were not run.
  mark.result[desc.vec %in% names(done.vec), measure.vars] <- NA
  ## If expr went over time limit, indicate it is done.
  over.limit <- mark.result$seconds > seconds.limit
  over.desc <- desc.vec[is.finite(mark.result$seconds) & over.limit]
  done.vec[over.desc] <<- TRUE
  mark.result
})
```

The `bench::press` code above is relatively complicated, because it re-implements two features that `atime` provides:

* If an expression takes longer than the time limit of 0.01 seconds,
  then it will not be run for any larger `N` values. This keeps overall computation reasonable, even when comparing expressions which have different asymptotic time complexity (such as quadratic for `matrix` and linear for `Matrix` in this example).
* If you want to measure quantities other than `seconds` and `kilobytes` as a function of `N` (such as `length` in this example), then `atime` makes that easy (just provide a `result` function), whereas it is more complex to implement in `bench::press` (a `for` loop is required). The equivalent `atime` call is sketched below.
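
Both features correspond to simple arguments in `atime`. Below is a
minimal sketch (not evaluated) of a call equivalent to the first
`atime` chunk above, with the time limit made explicit via the
`seconds.limit` argument:

```{r, eval=FALSE}
## seconds.limit stops running expressions that exceed the limit at
## larger N; result computes extra quantities (length) per N.
atime::atime(
  N=N_seq,
  seconds.limit=0.01,
  result=function(x)data.frame(length=length(x)),
  vector=numeric(N),
  matrix=matrix(0, N, N),
  Matrix=Matrix(0, N, N))
```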

Below we visualize the results from `bench::press`:

```{r}
library(data.table)
(press_long <- melt(
  data.table(press_result),
  measure.vars=measure.vars,
  id.vars=c("N","description"),
  na.rm=TRUE))
if(require(ggplot2)){
  gg <- ggplot()+
    ggtitle("bench::press results for comparison")+
    facet_grid(variable ~ ., labeller=label_both, scales="free")+
    geom_line(aes(
      N, value,
      color=description),
      data=press_long)+
    scale_x_log10(limits=c(NA, max(press_long$N*2)))+
    scale_y_log10("")
  if(requireNamespace("directlabels")){
    directlabels::direct.label(gg,"right.polygons")
  }else gg
}
```

We can see that the plots from `atime` and `bench::press` are consistent.

### Complexity class estimation with atime

Below we estimate the best asymptotic complexity classes:

```{r}
vec.mat.best <- atime::references_best(vec.mat.result)
plot(vec.mat.best)
```

The plot above shows that

* `matrix` has time, memory, and `length` which are all quadratic `O(N^2)`.
* `Matrix` has linear `O(N)` time and memory, but `O(N^2)` values for
  `length`.
* `vector` has time, memory, and `length` which are all linear `O(N)`.
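
The selected complexity classes can also be inspected
programmatically, via the `references` element of the
`references_best` result. The sketch below assumes that element is a
data table with a `fun.name` column giving the complexity class (see
`?atime::references_best`).

```{r}
## Peek at the selected reference curves; fun.name (assumed present)
## names the complexity class for each expression and unit.
head(vec.mat.best$references)
```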

Below we estimate the throughput for some given limits:

```{r}
vec.mat.pred <- predict(
  vec.mat.best,
  seconds=vec.mat.result$seconds.limit,
  ## kilobytes=1000, # not available on CRAN.
  length=100)
plot(vec.mat.pred)
```

In the plot above we can see the throughput `N` for a given limit of
`kilobytes`, `length`, or `seconds`. Below we use `Matrix` as a
reference, and compute the ratio of its throughput to the throughput
of each other expression.

```{r}
library(data.table)
dcast(vec.mat.pred$prediction[
, ratio := N[expr.name=="Matrix"]/N, by=unit
], unit + unit.value ~ expr.name, value.var="ratio")
```

From the table above (`matrix` column), we can see that the throughput
of `Matrix` is 100-1000x larger than that of `matrix`, for the given
limits.

## Matrix multiplication, 90% sparsity

First we show the difference between sparse and dense matrix
multiplication, when the matrix has 90% zeros (10% non-zeros).

```{r}
library(Matrix)
sparse.prop <- 0.9
dense.prop <- 1-sparse.prop
mult.result <- atime::atime(
  N=as.integer(10^seq(1,4,by=0.25)),
  setup={
    m <- matrix(0, N, N)
    set.seed(1)
    w <- rnorm(N)
    N.not.zero <- as.integer(dense.prop*N^2)
    m[sample(N^2, N.not.zero)] <- rnorm(N.not.zero)
    M <- Matrix(m)
  },
  sparse = M %*% w,
  dense = m %*% w,
  result=TRUE)
plot(mult.result)
```

Above we see that `sparse` is faster than `dense`, but only by
constant factors (the two methods have the same asymptotic slope).
Below we estimate the best asymptotic complexity classes:

```{r}
mult.best <- atime::references_best(mult.result)
plot(mult.best)
```

Above we see that both `sparse` and `dense` matrix multiplication are
quadratic `O(N^2)` time: the cost of the sparse method is proportional
to the number of stored non-zero entries, which is `0.1*N^2` here (a
quadratic number of non-zero entries).
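
That non-zero count is easy to check directly with `Matrix::nnzero`;
below is a minimal sketch using an illustrative size `N.check`.

```{r}
## With 10% density, the number of stored non-zeros grows like
## 0.1*N^2, so sparse multiplication remains quadratic in N.
N.check <- 1000
m.check <- matrix(0, N.check, N.check)
not.zero <- as.integer(dense.prop*N.check^2)
m.check[sample(N.check^2, not.zero)] <- rnorm(not.zero)
Matrix::nnzero(Matrix(m.check))  ## about 0.1 * N.check^2
```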

Finally, we verify below that both methods yield the same result:

```{r}
library(data.table)
mult.compare <- dcast(
  mult.result$measurements, N ~ expr.name, value.var="result"
)[
, equal := paste(all.equal(as.numeric(dense[[1]]), as.numeric(sparse[[1]])))
, by=N
][]
tibble::tibble(mult.compare)
```

## Matrix multiplication, linear number of non-zeros

Next we show the difference between sparse and dense matrix
multiplication, when the matrix has a linear number of non-zeros
(asymptotically fewer than in the previous section).

```{r}
library(Matrix)
mult.result <- atime::atime(
  N=as.integer(10^seq(1,4,by=0.25)),
  setup={
    m <- matrix(0, N, N)
    set.seed(1)
    w <- rnorm(N)
    N.not.zero <- N
    m[sample(N^2, N.not.zero)] <- rnorm(N.not.zero)
    M <- Matrix(m)
  },
  sparse = M %*% w,
  dense = m %*% w,
  result=TRUE)
plot(mult.result)
```

Above we see that `sparse` is asymptotically faster than `dense` (different asymptotic slopes).
Below we estimate the best asymptotic complexity classes:

```{r}
mult.best <- atime::references_best(mult.result)
plot(mult.best)
```

Above we see that `sparse` is linear `O(N)` time whereas `dense` is
quadratic `O(N^2)` time: the sparse method only touches the `N` stored
non-zero entries, whereas the dense method must touch all `N^2`
entries.

Finally, we verify below that both methods yield the same result:

```{r}
library(data.table)
mult.compare <- dcast(
  mult.result$measurements, N ~ expr.name, value.var="result"
)[
, equal := paste(all.equal(as.numeric(dense[[1]]), as.numeric(sparse[[1]])))
, by=N
][]
tibble::tibble(mult.compare)
```

## Matrix multiplication, linear and quadratic number of non-zeros

In this section we show how you can code both comparisons at the same
time, without repetition. The trick is to first define a list of parameters to vary:

```{r}
param.list <- list(
  non_zeros=c("N","N^2/10"),
  fun=c("matrix","Matrix")
)
```

After that we create a grid of expressions to evaluate, by expanding the parameter grid:

```{r}
(expr.list <- atime::atime_grid(
  param.list,
  Mw=L[[fun]][[non_zeros]]%*%w,
  collapse="\n"))
```

Finally we pass the list of expressions to `atime`, along with a
`setup` argument which creates the required list `L` of input data,
based on the parameters:

```{r}
mult.result <- atime::atime(
  N=as.integer(10^seq(1,3.5,by=0.25)),
  setup={
    L <- list()
    set.seed(1)
    w <- rnorm(N)
    for(non_zeros in param.list$non_zeros){
      N.not.zero <- as.integer(eval(str2lang(non_zeros)))
      m <- matrix(0, N, N)
      m[sample(N^2, N.not.zero)] <- rnorm(N.not.zero)
      for(fun in param.list$fun){
        L[[fun]][[non_zeros]] <- get(fun)(as.numeric(m), N, N)
      }
    }
  },
  expr.list=expr.list)
plot(mult.result)
```

Below we estimate the best asymptotic complexity classes:

```{r}
mult.best <- atime::references_best(mult.result)
plot(mult.best)
```

Below we show an alternative visualization:

```{r}
only.seconds <- mult.best
only.seconds$measurements <- mult.best$measurements[unit=="seconds"]
only.seconds$plot.references <- mult.best$plot.references[unit=="seconds"]
if(require(ggplot2)){
  plot(only.seconds)+
    facet_grid(non_zeros ~ fun, labeller=label_both)
}
```

## Conclusion

Overall in this vignette we have shown how `atime` can be used to
demonstrate when sparse matrices make computations more efficient:

* Sparse matrices have linear rather than quadratic time/memory for creation.
* Sparse matrix-vector multiplication is asymptotically faster (linear
  rather than quadratic time) if there is a linear number of non-zero
  elements.

We also showed a comparison between `atime` and `bench::press`, which
highlighted two areas where `atime` is more convenient (stopping after
exceeding a time limit, and measuring quantities other than
time/memory as a function of data size `N`).