--- title: "Introduction to geobr (R)" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to geobr (R)} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = identical(tolower(Sys.getenv("NOT_CRAN")), "true"), out.width = "100%" ) ``` The [**geobr**](https://github.com/ipeaGIT/geobr) package provides quick and easy access to official spatial data sets of Brazil. The syntax of all **geobr** functions operate on a simple logic that allows users to easily download a wide variety of data sets with updated geometries and harmonized attributes and geographic projections across geographies and years. This vignette presents a quick intro to **geobr**. ## Installation You can install geobr from CRAN or the development version to use the latest features. ```{r eval=FALSE, message=FALSE, warning=FALSE} # From CRAN install.packages("geobr") # Development version utils::remove.packages('geobr') devtools::install_github("ipeaGIT/geobr", subdir = "r-package") ``` Now let's load the libraries we'll use in this vignette. ```{r message=FALSE, warning=FALSE, results='hide'} library(geobr) library(sf) library(dplyr) library(ggplot2) ``` ## General usage ### Available data sets The geobr package covers 27 spatial data sets, including a variety of political-administrative and statistical areas used in Brazil. You can view what data sets are available using the `list_geobr()` function. ```{r message=FALSE, warning=FALSE} # Available data sets datasets <- list_geobr() head(datasets) ``` ## Download spatial data as `sf` objects The syntax of all *geobr* functions operate one the same logic, so the code to download the data becomes intuitive for the user. Here are a few examples. Download an specific geographic area at a given year ```{r message=FALSE, warning=FALSE} # State of Sergige state <- read_state( code_state="SE", year=2018, showProgress = FALSE ) # Municipality of Sao Paulo muni <- read_municipality( code_muni = 3550308, year=2010, showProgress = FALSE ) ggplot() + geom_sf(data = muni, color=NA, fill = '#1ba185') + theme_void() ``` Download all geographic areas within a state at a given year ```{r message=FALSE, warning=FALSE, results='hide'} # All municipalities in the state of Minas Gerais muni <- read_municipality(code_muni = "MG", year = 2007, showProgress = FALSE) # All census tracts in the state of Rio de Janeiro cntr <- read_census_tract( code_tract = "RJ", year = 2010, showProgress = FALSE ) head(muni) ``` If the parameter `code_` is not passed to the function, geobr returns the data for the whole country by default. ```{r message=FALSE, warning=FALSE} # read all intermediate regions inter <- read_intermediate_region( year = 2017, showProgress = FALSE ) # read all states states <- read_state( year = 2019, showProgress = FALSE ) head(states) ``` ## Important note about data resolution All functions to download polygon data such as states, municipalities etc. have a `simplified` argument. When `simplified = FALSE`, geobr will return the original data set with high resolution at detailed geographic scale (see documentation). By default, however, `simplified = TRUE` and geobr returns data set geometries with simplified borders to improve speed of downloading and plotting the data. ## Plot the data Once you've downloaded the data, it is really simple to plot maps using `ggplot2`. ```{r message=FALSE, warning=FALSE, fig.height = 8, fig.width = 8, fig.align = "center"} # Remove plot axis no_axis <- theme(axis.title=element_blank(), axis.text=element_blank(), axis.ticks=element_blank()) # Plot all Brazilian states ggplot() + geom_sf(data=states, fill="#2D3E50", color="#FEBF57", size=.15, show.legend = FALSE) + labs(subtitle="States", size=8) + theme_minimal() + no_axis ``` Plot all the municipalities of a particular state, such as Rio de Janeiro: ```{r message=FALSE, warning=FALSE, fig.height = 8, fig.width = 8, fig.align = "center"} # Download all municipalities of Rio all_muni <- read_municipality( code_muni = "RJ", year= 2010, showProgress = FALSE ) # plot ggplot() + geom_sf(data=all_muni, fill="#2D3E50", color="#FEBF57", size=.15, show.legend = FALSE) + labs(subtitle="Municipalities of Rio de Janeiro, 2000", size=8) + theme_minimal() + no_axis ``` ## Thematic maps The next step is to combine data from ***geobr*** package with other data sets to create thematic maps. In this first example, we will be using data from the (Atlas of Human Development (by Ipea/FJP and UNPD) to create a choropleth map showing the spatial variation of **Life Expectancy at birth** across Brazilian states. #### Merge external data First, we need a `data.frame` with estimates of Life Expectancy and merge it to our spatial database. The two-digit abbreviation of state name is our key column to join these two databases. ```{r message=FALSE, warning=FALSE, results='hide'} # Read data.frame with life expectancy data df <- utils::read.csv(system.file("extdata/br_states_lifexpect2017.csv", package = "geobr"), encoding = "UTF-8") states$name_state <- tolower(states$name_state) df$uf <- tolower(df$uf) # join the databases states <- dplyr::left_join(states, df, by = c("name_state" = "uf")) ``` #### Plot thematic map ```{r message=FALSE, warning=FALSE, fig.height = 8, fig.width = 8, fig.align = "center" } ggplot() + geom_sf(data=states, aes(fill=ESPVIDA2017), color= NA, size=.15) + labs(subtitle="Life Expectancy at birth, Brazilian States, 2014", size=8) + scale_fill_distiller(palette = "Blues", name="Life Expectancy", limits = c(65,80)) + theme_minimal() + no_axis ``` ### Using **geobr** together with **censobr** Following the same steps as above, we can use together **geobr** with our sister package [**censobr**](https://ipeagit.github.io/censobr/index.html) to map the proportion of households connected to a sewage network in Brazilian municipalities First, we need to download households data from the Brazilian census using the `read_households()` function. ```{r} library(censobr) library(arrow) hs <- read_households(year = 2010, showProgress = FALSE) ``` Now we're going to (a) group observations by municipality, (b) get the number of households connected to a sewage network, (c) calculate the proportion of households connected, and (d) collect the results. ```{r, warning = FALSE} esg <- hs |> collect() |> group_by(code_muni) |> # (a) summarize(rede = sum(V0010[which(V0207=='1')]), # (b) total = sum(V0010)) |> # (b) mutate(cobertura = rede / total) |> # (c) collect() # (d) head(esg) ``` Now we only need to download the geometries of Brazilian municipalities from **geobr**, merge the spatial data with our estimates and map the results. ```{r, warning = FALSE} # download municipality geometries muni_sf <- geobr::read_municipality(year = 2010, showProgress = FALSE) # merge data esg_sf <- left_join(muni_sf, esg, by = 'code_muni') # plot map ggplot() + geom_sf(data = esg_sf, aes(fill = cobertura), color=NA) + labs(title = "Share of households connected to a sewage network") + scale_fill_distiller(palette = "Greens", direction = 1, name='Share of\nhouseholds', labels = scales::percent) + theme_void() ```