---
title: "Cache Management"
author: "Thomas Bergamaschi"
date: "2018-10-18"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Cache Management}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

When building a web service, it is desirable to save commonly requested products
in a cache directory to avoid time wasted reproducing them unnecessarily.
Because the cache has finite disk space allocated to it, the cache should be 
routinely purged of old or outdated files to make room new ones. The 
```manageCache()``` utility function simplifies this process.

## A Product Cache Example

Lets first make a cache directory and put some data products in it.

```{r createCache}
# Create a cache directory
CACHE_DIR <- file.path(tempdir(), 'cache')
if ( file.exists(CACHE_DIR) == FALSE ) {
  dir.create(CACHE_DIR)
}

# Add a few files to the cache
write.csv(matrix(1,400,500), file=file.path(CACHE_DIR,'m1.csv'))
Sys.sleep(1) # wait a bit between each to give them different mtimes
write.csv(matrix(2,400,500), file=file.path(CACHE_DIR,'m2.csv'))
Sys.sleep(1)
write.csv(matrix(3,400,500), file=file.path(CACHE_DIR,'m3.csv'))
Sys.sleep(1)
write.csv(matrix(4,400,500), file=file.path(CACHE_DIR,'m4.csv'))
```

We can look in our new cache directory and see the four files we just added. 
The directory contains about 1.5 MB of data.

```{r checkCache}
cachedFiles <- list.files(CACHE_DIR, full.names = TRUE)
infoDF <- file.info(cachedFiles)
cacheSize = (sum(infoDF$size) / 1e6) # in MB
print(list.files(CACHE_DIR))
sprintf("Cache size = %s MB", cacheSize)
```

In order to simulate file requests, lets read two of them to update their access 
time.

```{r accessFiles, echo=TRUE}
# Access two of the files, updating their atime
invisible( read.csv(file.path(CACHE_DIR, 'm1.csv')) )
invisible( read.csv(file.path(CACHE_DIR, 'm2.csv')) )
```

Now, lets use ```manageCache()``` to get our cache down to 1 MB.

```{r manageCache}
# Use manageCache() to get cache to 1 MB
library(MazamaCoreUtils)
manageCache(CACHE_DIR, extensions = 'csv', maxCacheSize = 1)
```

When we check our cache again, we will see that the two files with the oldest
access times are gone and the cache size is now under 1 MB.

```{r checkCacheAgain}
# Check cache contents and total size again
cachedFiles <- list.files(CACHE_DIR, full.names = TRUE)
infoDF <- file.info(cachedFiles)
cacheSize = (sum(infoDF$size) / 1e6) # in MB
print(list.files(CACHE_DIR))
sprintf("Cache size = %s MB", cacheSize)
```

## Removing 'Stale' Products

Web services that provide access to real-time data often generate products that
have an expiration date. Files older than a specific number of days or hours 
should be removed from the cache because they no longer represent the current
status. Removing stale files can also help to keep the cache much smaller than 
the absolute maximum cache size, enhancing overall performance.

Stale files -- files that haven't been modified in a while -- can be removed 
regardless of  cache size with the `maxFileAge` parameter. When this is set, 
files with an  `mtime` older than `maxFileAge` will  be removed before any test 
of the  `maxCacheSize`. Fractional days are allowed.

You can remove standard products in the cache that haven't been modified in the 
last 3 hours with:

```{r maxFileAge}
manageCache(CACHE_DIR, maxFileAge = 3/24)
```

## Other Use Cases

When used to manage a product cache, the most typical behavior will be to sort
files based on last access time. The `manageCache()` function uses
`sortBy = "atime"` as the default. It is also possible to sort based on 
modification time `mtime` or change time `ctime`.

The use case scenario for `sortBy = "mtime"` might involve files that are
considered *stale* if the contents aren't updated.

A use case scenario for `sortBy = "ctime"` is not clear.