---
title: "Accessing Spatial and Population Data from Statistics Finland OGC API"
author: "Markus Kainu"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Accessing Spatial and Population Data from Statistics Finland OGC API}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
library(geofi)
library(sf)
library(dplyr)
library(ggplot2)
library(tidyr)
```


## Introduction

The `geofi` package provides tools to access spatial data from **Statistics Finland's OGC API**, including administrative boundaries, population data by administrative units, and population data by statistical grid cells. This vignette demonstrates how to use the package's core functions to:

* Retrieve Finnish administrative area polygons (e.g., municipalities, regions).
* Fetch population data linked to administrative units.
* Access population data for statistical grid cells.

Unlike some other spatial data APIs, Statistics Finland's OGC API *does not require an API key*, so getting started is straightforward. The package handles pagination, spatial filtering, and coordinate reference system (CRS) transformations, and returns data as `sf` objects ready for spatial analysis and visualization.

## Package Overview

The `geofi` package includes the following key functions for accessing Statistics Finland data:

* `ogc_get_statfi_area()`: Retrieves administrative area polygons (e.g., municipalities, wellbeing areas) for specified years, scales, and tessellation types.
* `ogc_get_statfi_area_pop()`: Fetches administrative area polygons with associated population data, pivoted into a wide format.
* `ogc_get_statfi_statistical_grid()`: Retrieves population data for statistical grid cells at different resolutions (1km or 5km).
* `fetch_ogc_api_statfi()`: An internal function that handles low-level API requests and pagination (not typically called directly by users).

All functions return spatial data as `sf` objects, making it easy to integrate with spatial analysis workflows in R.

## Step 1: Retrieving Administrative Area Polygons

The `ogc_get_statfi_area()` function retrieves polygons for Finnish administrative units, such as municipalities (`kunta`), wellbeing areas (`hyvinvointialue`), or regions (`maakunta`). You can customize the output with parameters like:

* `year`: The year of the boundaries (2020–2022).
* `scale`: Map resolution, `1000` (1:1,000,000) or `4500` (1:4,500,000).
* `tessellation`: Type of administrative unit (e.g., `kunta`, `hyvinvointialue`, `maakunta`).
* `crs`: Coordinate reference system, `3067` (EPSG:3067) or `4326` (EPSG:4326).
* `limit`: Maximum number of features to return, or `NULL` for all.
* `bbox`: Bounding box for spatial filtering, given as an `"xmin,ymin,xmax,ymax"` string in the coordinates of `crs`.

### Example: Downloading Municipalities

Fetch all municipalities for 2022 at the 1:4,500,000 scale:

```{r}
munis <- ogc_get_statfi_area(year = 2022, scale = 4500, tessellation = "kunta")
print(munis)
```

Visualize the municipalities using `ggplot2`:

```{r}
ggplot(munis) +
  geom_sf() +
  theme_minimal() +
  labs(title = "Finnish Municipalities (2022)")
```

### Example: Spatial Filtering with a Bounding Box

To retrieve municipalities within a specific area (e.g., southern Finland), use the `bbox` parameter. Its coordinates must be in the CRS given by `crs`.

```{r}
bbox <- "200000,6600000,500000,6900000"  # In EPSG:3067
munis_south <- ogc_get_statfi_area(
  year = 2022,
  scale = 4500,
  tessellation = "kunta",
  bbox = bbox,
  crs = 3067
)
```

Visualize the filtered results:

```{r}
ggplot(munis_south) +
  geom_sf() +
  theme_minimal() +
  labs(title = "Municipalities in Southern Finland (2022)")
```

### Example: Fetching Wellbeing Areas

Retrieve wellbeing areas (`hyvinvointialue`) for 2022:

```{r}
wellbeing <- ogc_get_statfi_area(
  year = 2022,
  tessellation = "hyvinvointialue",
  scale = 4500
)
```

## Step 2: Retrieving Population Data by Administrative Area

The `ogc_get_statfi_area_pop()` function fetches administrative area polygons with associated population data, pivoted into a wide format where each population variable is a column. Parameters include:

* `year`: The year of the data (2019–2021).
* `crs`: Coordinate reference system (EPSG:3067 or EPSG:4326).
* `limit`: Maximum number of features (or `NULL` for all).
* `bbox`: Bounding box for spatial filtering.
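To make "pivoted into a wide format" concrete, the chunk below reshapes a small toy table from long format (one row per area and variable) to wide format (one row per area, one column per variable) using base R's `reshape()`. The area codes, the variable names `population_total` and `population_male`, and the values are all made up for illustration. This is a sketch of the idea, not the package's internal code, and the columns returned by the API may be named differently.

```{r}
# Toy long-format table: one row per (area, variable) pair.
# Area codes, variable names and values are hypothetical.
pop_long <- data.frame(
  area_code = c("A", "A", "B", "B"),
  variable  = c("population_total", "population_male",
                "population_total", "population_male"),
  value     = c(100, 48, 50, 25)
)

# Long -> wide: one row per area, one column per variable
pop_wide <- reshape(
  pop_long,
  idvar = "area_code",
  timevar = "variable",
  direction = "wide"
)

# reshape() names the new columns "value.<variable>"; drop that prefix
names(pop_wide) <- sub("^value\\.", "", names(pop_wide))
rownames(pop_wide) <- NULL
pop_wide
```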

### Example: Fetching Population Data

Retrieve population data for 2021:

```{r}
pop_data <- ogc_get_statfi_area_pop(year = 2021, crs = 3067)
print(pop_data)
```

Map the total population of each area. The column name `population_total` is an assumption; check `names(pop_data)` for the variables actually returned:

```{r}
ggplot(pop_data) +
  geom_sf(aes(fill = population_total)) +
  scale_fill_viridis_c(option = "plasma") +
  theme_minimal() +
  labs(title = "Population by Administrative Area (2021)", fill = "Population")
```

### Example: Population Data with Bounding Box

Fetch population data within a bounding box:

```{r}
bbox <- "200000,6600000,500000,6900000"
pop_south <- ogc_get_statfi_area_pop(year = 2021, bbox = bbox, crs = 3067)
```

## Step 3: Retrieving Population Data by Statistical Grid

The `ogc_get_statfi_statistical_grid()` function retrieves population data for statistical grid cells at 1km or 5km resolution. Data is returned in EPSG:3067 (ETRS89 / TM35FIN). Parameters include:

* `year`: The year of the data (2019–2021).
* `resolution`: Grid cell size in metres, `1000` (1 km) or `5000` (5 km).
* `limit`: Maximum number of features (or `NULL` for all).
* `bbox`: Bounding box for spatial filtering.

### Example: Fetching 5km Grid Data

Retrieve population data for a 5km grid in 2021:

```{r}
grid_data <- ogc_get_statfi_statistical_grid(year = 2021, resolution = 5000)
print(grid_data)
```

Visualize the grid data (again, `population_total` is an assumed column name; check `names(grid_data)` first):

```{r}
ggplot(grid_data) +
  geom_sf(aes(fill = population_total), color = NA) +
  scale_fill_viridis_c(option = "magma") +
  theme_minimal() +
  labs(title = "Population by 5km Grid Cells (2021)", fill = "Population")
```

### Example: 1km Grid with Bounding Box

Fetch 1km grid data within a bounding box:

```{r}
bbox <- "200000,6600000,500000,6900000"
grid_south <- ogc_get_statfi_statistical_grid(
  year = 2021,
  resolution = 1000,
  bbox = bbox
)
```
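Before requesting every 1 km cell, it can help to estimate how large a request might get. The chunk below computes a rough count of the grid cells covering a bounding box from its extent and the cell size, assuming the box edges line up with the grid. `max_cells()` is a hypothetical helper, not a geofi function, and the service may return fewer features than this count (for example, if cells without data are left out).

```{r}
# Hypothetical helper (not part of geofi): rough number of grid cells
# covering a bounding box c(xmin, ymin, xmax, ymax) given in metres
max_cells <- function(bbox, resolution) {
  width  <- bbox[3] - bbox[1]
  height <- bbox[4] - bbox[2]
  ceiling(width / resolution) * ceiling(height / resolution)
}

bb <- c(200000, 6600000, 500000, 6900000)  # EPSG:3067, as in the examples
max_cells(bb, 1000)  # 300 x 300 = 90000 cells at 1 km
max_cells(bb, 5000)  # 60 x 60 = 3600 cells at 5 km
```

With 25 times as many cells as the 5 km grid over the same box, a 1 km request is a good candidate for trying a small `limit` first.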

## Advanced Features

### Pagination

When `limit = NULL`, the `fetch_ogc_api_statfi()` function automatically paginates through large datasets, fetching up to 10,000 features per request. This ensures all available data is retrieved, even for large administrative or grid datasets.
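The paging idea can be sketched in a few lines of base R. The chunk below is a generic offset-based loop run against a fake data source, not the code of `fetch_ogc_api_statfi()`: `fake_page()` and `fetch_all()` are hypothetical, and the page size is tiny so the loop is easy to follow. The loop stops when a page comes back shorter than the page size; a real OGC API client may instead follow the `next` links the server includes in each response.

```{r}
# Hypothetical stand-in for one API request: returns at most `page_size`
# record IDs starting after `offset`, from a fake dataset of `n_total` records
fake_page <- function(offset, page_size, n_total = 25) {
  if (offset >= n_total) return(integer(0))
  seq(offset + 1, min(offset + page_size, n_total))
}

# Generic offset-based pagination: keep requesting pages until a page
# is shorter than `page_size`, then combine all pages
fetch_all <- function(page_fun, page_size = 10) {
  pages <- list()
  offset <- 0
  repeat {
    page <- page_fun(offset, page_size)
    pages[[length(pages) + 1]] <- page
    if (length(page) < page_size) break
    offset <- offset + page_size
  }
  unlist(pages)
}

length(fetch_all(fake_page))  # 25 records, fetched in three pages
```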

### Error Handling

The functions check their inputs and handle failures as follows:

* Validates inputs (e.g., year, scale, tessellation, CRS, bounding box format).
* Provides informative error messages for API failures or invalid responses.
* Returns `NULL` with a warning if no data is retrieved, helping users diagnose issues.
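If a script should carry on when a request fails or returns nothing, one option is to wrap calls so that errors become `NULL` and warnings become messages. `fetch_or_null()` below is a hypothetical helper, not part of geofi, and `mock_empty()` only imitates the "`NULL` with a warning" behaviour described above so that the chunk runs offline.

```{r}
# Hypothetical wrapper (not part of geofi): evaluate a fetch call, report
# warnings and errors as messages, and return NULL if an error occurs
fetch_or_null <- function(expr) {
  tryCatch(
    withCallingHandlers(
      expr,  # the call is evaluated lazily here, inside the handlers
      warning = function(w) {
        message("Warning: ", conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) {
      message("Error: ", conditionMessage(e))
      NULL
    }
  )
}

# Offline stand-in that mimics "returns NULL with a warning"
mock_empty <- function() {
  warning("no features returned")
  NULL
}

res <- fetch_or_null(mock_empty())
is.null(res)

# With the real package, the same pattern would look like:
# munis <- fetch_or_null(ogc_get_statfi_area(year = 2022, tessellation = "kunta"))
```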

### Coordinate Reference Systems

The functions support two CRS options:

* **EPSG:3067** (ETRS89 / TM35FIN): The standard projected CRS for Finnish spatial data. Coordinates are in metres, which suits distance and area calculations within Finland.
* **EPSG:4326** (WGS84): Useful for global compatibility or web mapping.

Note that `ogc_get_statfi_statistical_grid()` is fixed to EPSG:3067, as per the API's design.
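Because `ogc_get_statfi_statistical_grid()` works in EPSG:3067, a box known in longitude/latitude has to be reprojected before it is used as `bbox`. The chunk below does this with `sf` (`st_bbox()`, `st_as_sfc()`, `st_transform()`). The coordinates are illustrative values roughly around the Helsinki region, and the reprojected corners are rounded to whole metres.

```{r}
library(sf)

# Illustrative lon/lat box, roughly around the Helsinki region (EPSG:4326)
bb_wgs84 <- st_bbox(
  c(xmin = 24.5, ymin = 60.1, xmax = 25.3, ymax = 60.4),
  crs = st_crs(4326)
)

# Turn the box into a polygon, reproject it to ETRS89 / TM35FIN,
# and take the bounding box of the reprojected polygon
bb_3067 <- st_bbox(st_transform(st_as_sfc(bb_wgs84), 3067))

# Round to whole metres and format as "xmin,ymin,xmax,ymax";
# scientific = FALSE keeps values out of scientific notation
bbox_3067 <- paste(
  format(round(as.numeric(bb_3067)), scientific = FALSE, trim = TRUE),
  collapse = ","
)
bbox_3067
```

The resulting string can then be passed as `bbox` to `ogc_get_statfi_statistical_grid()`.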

### Bounding Box Filtering

The `bbox` parameter restricts results to a rectangular area. Its coordinates must be in the CRS given by `crs` (always EPSG:3067 for grid data) and are passed as a single `"xmin,ymin,xmax,ymax"` string, for example `"200000,6600000,500000,6900000"`.
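Building the `bbox` string with `paste()` has a pitfall in base R: `paste(200000)` returns `"2e+05"`, so round coordinates can end up in scientific notation. The hypothetical helper below (not part of geofi) checks the order of the four numbers and formats each one without scientific notation.

```{r}
# Hypothetical helper (not part of geofi): turn a numeric
# c(xmin, ymin, xmax, ymax) vector into a "xmin,ymin,xmax,ymax" string
bbox_to_string <- function(x) {
  stopifnot(is.numeric(x), length(x) == 4, !anyNA(x))
  if (x[1] >= x[3] || x[2] >= x[4]) {
    stop("Expected c(xmin, ymin, xmax, ymax) with xmin < xmax and ymin < ymax")
  }
  # Format each number separately, avoiding strings such as "2e+05"
  parts <- vapply(x, format, character(1), scientific = FALSE, trim = TRUE)
  paste(parts, collapse = ",")
}

bbox_to_string(c(200000, 6600000, 500000, 6900000))
# "200000,6600000,500000,6900000"
```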

## Best Practices

* **Test with Limits**: For large datasets (e.g., 1km grids), start with a small `limit` or `bbox` to estimate runtime before fetching all features.
* **CRS Selection**: Use `EPSG:3067` for Finnish data unless you need `EPSG:4326` for compatibility with other systems.
* **Check Tessellation Types**: Verify that the `tessellation` value is valid (`kunta`, `hyvinvointialue`, etc.) when using `ogc_get_statfi_area()`.
* **Inspect Output**: Population data from `ogc_get_statfi_area_pop()` and `ogc_get_statfi_statistical_grid()` is pivoted into wide format. Check the column names to see which variables are available.
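Because the returned objects are `sf` data frames, their attribute columns can be listed without the geometry column using `st_drop_geometry()`. The chunk below uses a small toy `sf` object in place of real API output; its column names and values are hypothetical. The same two steps apply to `pop_data` or `grid_data` from the examples above.

```{r}
library(sf)

# Toy sf object standing in for API output (hypothetical columns and values)
toy <- st_sf(
  area_code = c("A", "B"),
  population_total = c(100, 50),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 3067)
)

# Attribute column names, without the geometry column
attr_names <- names(st_drop_geometry(toy))
attr_names

# Numeric attribute columns, i.e. candidate variables to map
numeric_vars <- names(Filter(is.numeric, st_drop_geometry(toy)))
numeric_vars
```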

## Additional Resources

* [Statistics Finland Geoserver](https://geo.stat.fi/inspire/): Documentation for the OGC API.
* [geofi GitHub Repository](https://github.com/rOpenGov/geofi): Source code and issue tracker.
* [sf Package Documentation](https://r-spatial.github.io/sf/): For working with sf objects.
* [ggplot2 Documentation](https://ggplot2.tidyverse.org/): For visualizing spatial data.

## Conclusion

The `geofi` package simplifies access to Statistics Finland's spatial and population data, enabling analyses of administrative boundaries, population distributions, and grid-based statistics. With no API key required, users can quickly retrieve and visualize data using `sf` and `ggplot2`. Try the examples above to explore Finland's spatial and demographic datasets!