---
title: "Datasets in geofi-package"
author: "Markus Kainu, Leo Lahti & Joona Lehtomäki"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Datasets in geofi-package}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE, 
  warning = FALSE,
  fig.height = 7, 
  fig.width = 7,
  dpi = 75
)
```

**`geofi`-package provides access to multiple dataset of different types and for different use. In this vignette we introduce the different datas and explain their use cases. Vignette *Making maps using `geofi`-package* provides multiple real-world examples of their usage.**

**Packages installation**

`geofi` can be installed from CRAN using

```{r, eval = FALSE}
# install from CRAN
install.packages("geofi")

# Install development version from GitHub
remotes::install_github("ropengov/geofi")
```

```{r include = FALSE, eval = TRUE}
# Let's first create a function that checks if the suggested 
# packages are available
check_namespaces <- function(pkgs){
  return(all(unlist(sapply(pkgs, requireNamespace,quietly = TRUE))))
}
apiacc <- geofi::check_api_access()
pkginst <- check_namespaces(c("geofacet","ggplot2","dplyr"))
apiacc_pkginst <- all(apiacc,pkginst)
```

## Municipality keys

Official administrative regions in Finland are based on municipalities. In 2021 there are 309 municipalities in Finland and the number is decreasing over time through mergers.\
\
Each municipality belongs to a higher level regional classifications such as regions (maakunta) or health care districts (sairaanhoitopiiri). `municipality_key_`-datasets are based on Statistics Finland [Statistical classification](https://data.stat.fi/api/classifications/v2/) -api with few modification and provided on yearly basis.

```{r municipality_keys, eval = pkginst}
library(geofi)
library(dplyr)
d <- data(package = "geofi")
as_tibble(d$results) |> 
  select(Item,Title) |> 
    filter(grepl("municipality_key", Item))
```

Looking at the names of \`municipality_key_2023\` there is 69 different variables from each municipality.

```{r municipality_key_names, eval = TRUE}
names(geofi::municipality_key_2023)
```

With these municipality keys you can easily aggregate municipalities for plotting or you can list different regional breakdowns.\

```{r municipality_key_maakunta, eval = pkginst}
geofi::municipality_key_2023 |> 
  count(maakunta_code,maakunta_name_fi,maakunta_name_sv,maakunta_name_en)
```

Municipality keys are joined with the municipality spatial data by default, meaning that data returned by `get_municipality()` can be aggregated as it is.

## Spatial data

Spatial data is provided as administrative regions (polygons), population and statistical grids (polygons) and municipality centers (points).

### Municipality borders

Municipality borders are provided yearly from 2013 and in two scales 1: 1 000 000 and 1:4 500 000. Use `1000` or `4500` as value for `scale`-argument, respectively.

```{r municipality_map, fig.height = 7, fig.width = 4, eval = apiacc_pkginst}
municipalities <- get_municipalities(year = 2023, scale = 4500)
plot(municipalities["municipality_name_fi"], border = NA)
```

### Municipality borders with population

In 2022 a new data source is introduced that provides you municipality borders with municipality population data. Spatial data is provided in 1:4 500 000 scale.

Calling the function with year = 2019 returns population data from 2019-12-31 with spatial data on borders from 2020.

The statistical variables in the data are: total population (vaesto), share of the total population (vaesto_p), number of men (miehet), men's share of the population in an area (miehet_p) and women (naiset), women's share (naiset_p), those aged under 15: number (ika_0_14), share (ika_0_14p), those aged 15 to 64: number (ika_15_64), share (ika_15_64p), and aged 65 or over: number (ika_65_), share (ika_65_p).

To plot men's share at the municipality level in 2020 (2021 municipality borders) you can simply to this. 

```{r muni_pop_map1, fig.height = 7, fig.width = 4, eval = apiacc_pkginst}
get_municipality_pop(year = 2022) |>  
  subset(select = miehet_p) |> 
  plot()
```

Aggregating the absolute population numbers is straightforward: to plot population at Wellbeing service county level you can do. 


```{r muni_pop_map2, fig.height = 7, fig.width = 4, eval = apiacc_pkginst}
get_municipality_pop(year = 2022) |>  
  group_by(hyvinvointialue_name_fi) |>  
  summarise(vaesto = sum(vaesto)) |>  
  select(vaesto) |> 
  plot()
```

To plot the men's share at wellbeing service country level you have to add one more step

```{r muni_pop_map3, fig.height = 7, fig.width = 4, eval = apiacc_pkginst}
get_municipality_pop(year = 2022) |>  
  dplyr::group_by(hyvinvointialue_name_fi) |> 
  summarise(vaesto = sum(vaesto),
            miehet = sum(miehet)) |> 
  mutate(share = miehet/vaesto*100) |> 
  select(share) |> 
  plot()
```



### Zipcodes

Zipcodes are provided in a single resolution from 2015.

```{r zipcode_map, fig.height = 7, fig.width = 4, eval = apiacc_pkginst}
zipcodes <- get_zipcodes(year = 2023) 
plot(zipcodes["nimi"], border = NA)
```

### Statistical grid

[Grid net for statistics](https://stat.fi/org/avoindata/paikkatietoaineistot/tilastoruudukko_1km_en.html) both in 1 km x 1 km and 5 km x 5km covers whole of Finland. The grid net includes all grid squares in Finland.

Statistics Finland [proprietary grid database](https://stat.fi/tup/ruututietokanta/index_en.html) provides the attribute statistical data for these grid nets.

```{r statisticsl_grid_data, fig.height = 7, fig.width = 4, eval = apiacc_pkginst}
stat_grid <- get_statistical_grid(resolution = 5, auxiliary_data = TRUE)
plot(stat_grid["euref_x"], border = NA)
```

### Population grid

Number of population by both 1 km x 1 km and 5 km x 5 km grids. The number of population on the last day of the reference year (31 December) by age group. Data includes only inhabited grids. The statistical variables of the data are:

Total population (`vaesto`), number of men (`miehet`) and women (`naiset`), under 15 year olds (`ika_0_14`), 15-64 year olds (`ika_15_64`), and aged over 65 (`ika_65_`). Only the number of population is reported for grids of under 10 inhabitants. See [Population grid data](https://stat.fi/org/avoindata/paikkatietoaineistot/vaestoruutuaineisto_5km_en.html).

The data describes the population distribution independent of administrative areas (such as municipal borders). The data is suitable for examination of population distribution and making various spatial analysis.

```{r population_grid_data, fig.height = 7, fig.width = 4, eval = apiacc_pkginst}
pop_grid <- get_population_grid(year = 2018, resolution = 5)
plot(pop_grid["kunta"], border = NA)
```

### Central localities of municipalities

[National Land Survey of Finland](https://maanmittauslaitos.fi) maintains [Topological Database](https://www.maanmittauslaitos.fi/en/maps-and-spatial-data/datasets-and-interfaces/product-descriptions/topographic-database) that contains a wide range of layers from which you can access the locations of central localities of each municipality in Finland.

```{r central_localities, fig.height = 7, fig.width = 4, eval = apiacc_pkginst}
plot(municipality_central_localities()["teksti"])
```

## Custom geofacet grid data

From Ryan Hafen's [blog](https://ryanhafen.com/blog/geofacet/):

> The [geofacet](https://hafen.github.io/geofacet/) package extends [ggplot2](https://ggplot2.tidyverse.org/) in a way that makes it easy to create geographically faceted visualizations in R. To geofacet is to take data representing different geographic entities and apply a visualization method to the data for each entity, with the resulting set of visualizations being laid out in a grid that mimics the original geographic topology as closely as possible.

`geofi`-package contains custom grids to be used with various Finnish administrative breakdowns as listed below.

```{r geofacets, eval = apiacc_pkginst}
d <- data(package = "geofi")
as_tibble(d$results) |> 
  select(Item,Title) |> 
    filter(grepl("grid", Item)) |> 
  print(n = 100)
```

Here is an example where population data at municipality level is pulled from THL from 2000 to 2022, then aggregated at the levels of regions (`maakunta`) and then plotted with ggplot2 using grid `geofi::grid_maakunta`. Population data is provided as part of geofi package as `geofi::sotkadata_population`.

```{r geofacet, fig.height = 8, fig.width = 10, eval = apiacc_pkginst}
# Let pull population data from THL
sotkadata <- geofi::sotkadata_population

# lets aggregate population data
dat <- left_join(geofi::municipality_key_2023 |> select(-year),
                 sotkadata) |> 
  group_by(maakunta_code, maakunta_name_fi,year) |> 
  summarise(population = sum(primary.value, na.rm = TRUE)) |> 
  na.omit() |> 
  ungroup() |> 
  rename(code = maakunta_code, name = maakunta_name_fi)

library(geofacet)
library(ggplot2)

ggplot(dat, aes(x = year, y = population/1000, group = name)) + 
  geom_line() + 
  facet_geo(facets = ~name, grid = grid_maakunta, scales = "free_y") +
  theme(axis.text.x = element_text(size = 6)) +
  scale_x_discrete(breaks = seq.int(from = 2000, to = 2023, by = 5)) +
  labs(title = unique(sotkadata$indicator.title.fi), y = "%")

```