---
title: "Daily Weather Data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Daily Weather Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
# rmarkdown::render("vignettes/daily.Rmd")
# rmarkdown::render("vignettes/daily.Rmd", rmarkdown::github_document())
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(GHCNr)
library(terra)  # for handling countries geometries
```

# Select GHCNd stations

The station inventory file of GHCNd is stored at <https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily>.
The function `stations()` can read from this source or from a local file, specified with `filename`.
The inventory can also be downloaded to a file using `download_inventory()`.

```{r download-inventory, eval=FALSE}
inventory_file <- download_inventory("~/Downloads/ghcn-inventory.txt")
s <- stations(
  inventory_file,
  variables = "TMAX",
  first_year = 1990,
  last_year = 2000
)
```

```{r read-inventory, eval=FALSE}
s <- stations(variables = "TMAX", first_year = 1990, last_year = 2000)
s
# A tibble: 16,763 × 6
   station     latitude longitude variable firstYear lastYear
   <chr>          <dbl>     <dbl> <chr>        <dbl>    <dbl>
 1 AE000041196     25.3     55.5  TMAX          1944     2024
 2 AEM00041194     25.3     55.4  TMAX          1983     2024
 3 AEM00041217     24.4     54.7  TMAX          1983     2024
 4 AFM00040938     34.2     62.2  TMAX          1973     2020
 5 AFM00040948     34.6     69.2  TMAX          1966     2021
 6 AFM00040990     31.5     65.8  TMAX          1973     2020
 7 AG000060390     36.7      3.25 TMAX          1940     2024
 8 AG000060590     30.6      2.87 TMAX          1940     2024
 9 AG000060611     28.0      9.63 TMAX          1958     2024
10 AG000060680     22.8      5.43 TMAX          1940     2004
# ℹ 16,753 more rows
# ℹ Use `print(n = ...)` to see more rows
```

By specifying `variables = "TMAX"` only the stations that recorded that variable are kept.
Available variables implemented at the moment are precipitation ("PRCP"), minimum temperature ("TMIN"), and maximum temperature ("TMAX").
The arguments `first_year` and `last_year` specify the minimum time period required for the stations.
Here, stations that are not sampled at least from 1990 until at least 2000 are dropped.
```{r filter-by-year, eval=FALSE}

Spatial filters can also be easily applied.
Spatial boundaries of countries can be downloaded from <https://www.geoboundaries.org/> using the `get_countr(couuntry_code = ...)` function, where `country_code` is the ISO3 code.
```{r get-country, eval=FALSE}
italy <- get_country("ITA")
```

`get_countries()` can take several ISO3 codes to return a geometry of multiple countries.

```{r spatial-filter, eval=FALSE}
s <- filter_stations(s, italy)
s
# A tibble: 41 × 6
   station     latitude longitude variable firstYear lastYear
   <chr>          <dbl>     <dbl> <chr>        <dbl>    <dbl>
 1 IT000016090     45.4     10.9  TMAX          1951     2024
 2 IT000016134     44.2     10.7  TMAX          1951     2024
 3 IT000016232     42       15    TMAX          1975     2024
 4 IT000016239     41.8     12.6  TMAX          1951     2024
 5 IT000016320     40.6     17.9  TMAX          1951     2024
 6 IT000016560     39.2      9.05 TMAX          1951     2024
 7 IT000160220     46.2     11.0  TMAX          1951     2024
 8 IT000162240     42.1     12.2  TMAX          1954     2024
 9 IT000162580     41.7     16.0  TMAX          1951     2024
10 ITE00100554     45.5      9.19 TMAX          1763     2008
# ℹ 31 more rows
# ℹ Use `print(n = ...)` to see more rows
```

# Download daily timeseries

Daily timeseries for a station can be downloaded using the `daily()` function.
In addition to the station ID, `daily()` needs start and end dates of the timeseries.
These should be provided as strings with the format "YYYY-mm-dd", e.g., "1990-01-01".

```{r daily, eval=FALSE}
daily_ts <- daily(
  station_id = "CA003076680",
  start_date = paste("2002", "11", "01", sep = "-"),
  end_date = paste("2024", "04", "22", sep = "-"),
  variables = "tmax"
)
daily_ts
```

```{r daily-saved, echo=FALSE}
daily_ts <- CA003076680[, c("date", "station", "tmax", "tmax_flag")]
daily_ts
```

Multiple stations can also be downloaded at once.
Too many stations will cause the API to fail.
```{r multidaily, eval=FALSE}
daily_ts <- daily(
  station_id = c("CA003076680", "USC00010655"),
  start_date = paste("2002", "11", "01", sep = "-"),
  end_date = paste("2024", "04", "22", sep = "-"),
  variables = "tmax"
)
plot(daily_ts, "tmax")
```
```{r multidaily-saved, echo=FALSE}
daily_ts <- rbind(CA003076680, USC00010655)[, c("date", "station", "tmax", "tmax_flag")]
plot(daily_ts, "tmax")
```

Implmented variables are "tmin", "tmax", and "prcp".
`daily()` returns a table with the value of the variable chosen and associated flags.

## Remove flagged records

Flagged records can be removed using `remove_flagged()`.
In `remove_flagged()` the argument `strict` (dafault = `TRUE`) specifies which flags to include.
The flags removed are:
```{r flags, echo=FALSE}
as.list(GHCNr:::.flags(strict = TRUE))
```
Setting `strict = FALSE` will only remove the flags:
```{r flags-strict, echo=FALSE}
as.list(GHCNr:::.flags(strict = FALSE))
```

This will also remove the "*_flag=" column.
```{r remove-flagged}
daily_ts <- remove_flagged(daily_ts)
plot(daily_ts, "tmax")
```

# Temporal coverage
Coverage of the timeseries can be calculated using `coverage()`.
```{r daily-coverage}
station_coverage <- coverage(daily_ts)
station_coverage
```
`period_coverage_*` calculates the coverage across the whole period, including missing years.

The output is a table with coverage by month and year (`monthly_coverage`), by year (`annual_coverage`), and for the whole time period (`period_coverage`).
`annual_coverage` is constant within the same year and `year` is always a constant.
This table is useful to inspect stations that may have problematic timeseries, such as 
```{r low-coverage}
unique(station_coverage[
  station_coverage$annual_coverage_tmax < .95,
  c("station", "year", "annual_coverage_tmax")
])
```

# Monthly and annual timeseries, climatological normals

The functions `monthly()`, `quarterly()`, and `annual()` summarized the weather time series to monthly, quarterly, and annual time series, respectively.
Summaries are calculated as follows:

  - $T_{min}$ is the minimum daily temperature recorded in the month or the year.
  - $T_{max}$ is the maximum daily temperature recorded in the month or the year.
  - $Prcp$ is the cumulative precipitation during the month or year.

`NA`s are removed during calculation.


```{r monthly}
monthly_ts <- monthly(daily_ts)
monthly_ts
plot(monthly_ts, "tmax")
```

```{r quarterly}
quarterly_ts <- quarterly(daily_ts)
quarterly_ts
plot(quarterly_ts, "tmax")
```

```{r annual}
annual_ts <- annual(daily_ts)
annual_ts
plot(annual_ts, "tmax")
```