---
title: "Date Parsing"
author: "Jonathan Callahan"
date: "2019-08-20"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Date Parsing}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
  
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Environmental Time Series Data

Dates, times and timezones can be frustrating, especially when working with
environmental time series such as those collected by air and water quality
sensors.

Environmental time series data often have a strong diurnal signal and are
typically plotted with a time axis displaying local time. However, when 
data are aggregated into larger collections, it is typical to store data with 
a universal time axis -- UTC.

Problems can arise when parsing and formatting dates and times because R
defaults to the system timezone available with `Sys.timezone()`. Imagine
an agency scientist based in Washington, DC, using their laptop to display 
recent air quality data from Los Angeles while at a conference in Tasmania. The 
data center processing  the data might be in Boulder but the data processing 
machine  might be set to use UTC. Potential timezones 
(available with `OlsonNames()`) relevant to this scenario include:

* `America/New_York`
* `America/Los_Angeles`
* `Australia/Tasmania`
* `America/Denver`
* `UTC`

Which timezone should be used to convert a request for data from "2019-08-08"" to 
"2018-08-15"" into `POSIXct` datetimes?

To enforce specification of timezones and to help with the common user interface
need to specify a range of dates or times, the **MazamaCoreUtils** package
provides the following functions:

* `dateRange()` -- parses and returns `POSIXct` start and end dates 
representing full days in the specified timezone
* `timeRange()` -- parses and returns `POSixct` start and end times in the 
specified timezone
* `parseDatetime()` -- parses and returns a vector of `POSIXct` values in the
specified timezone

The `parseDatetime()` function is intended as a timezone-requiring replacement 
for `lubridate::parse_date_time()`.

# Linting for timezones

Enforcing the specification of timezones throughout a body of code is the most
robust way to remove timezone-related errors from your software. To help with this
this type of code review, the package also includes functions for testing whether
specific named arguments are used with certain function calls:

* `lintFunctionArgs_file()` -- check a single file
* `lintFunctionArgs_dir()` -- check an entire directory

To use these functions you must define a set of `function:argument` rules to be 
applied such as:

```
timezoneLintRules <- list(
  "parse_date_time" = "tz",
  "with_tz" = "tzone",
  "now" = "tzone",
  "strftime" = "tz"
)
```

This is interpreted as:

* Every use of the `parse_date_time()` function must use the `tz` argument
explicitly.
* Every use of the `with_tz()` function must use the `tzone` argument explicitly
* ...

While these functions could be used to test for explicit use in any 
`function:argument` pair, our concern here is primarily with specification of
timezones. The package includes a detailed list of `timezoneLintRules` to help 
with this. As an example, here is the result of linting the `dateRange.R`
function in this package:

```
> lintFunctionArgs_file("R/dateRange.R", timezoneLintRules)
# A tibble: 7 x 6
  file        line_number column_number function_name   named_args includes_required
  <chr>             <int>         <int> <chr>           <list>     <lgl>            
1 dateRange.R         125            29 with_tz         <chr [1]>  TRUE             
2 dateRange.R         128            27 with_tz         <chr [1]>  TRUE             
3 dateRange.R         141            18 parse_date_time <chr [2]>  TRUE             
4 dateRange.R         142            18 parse_date_time <chr [2]>  TRUE             
5 dateRange.R         159            18 parse_date_time <chr [2]>  TRUE             
6 dateRange.R         176            18 parse_date_time <chr [2]>  TRUE             
7 dateRange.R         188            18 now             <chr [1]>  TRUE             
```

The result shows that the `dateRange.R` source code is consistent in always
explicitly specifying a timezone.

Hopefully, this attention to timezones will help your code avoid 
misunderstandings when it comes to date and time requests.