---
title: "Collapsing"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    css: vignette.css
vignette: >
  %\VignetteIndexEntry{Collapsing}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{dplyr, haven, labelled}
---

In some situations, you may want to use `encodefrom()` to collapse
values, that is, to group unique raw values into a smaller set of clean
values and labels. For example, say you have the following data set,
which gives each state's census division number and name:

#### Data

|id|state|cendiv|cendiv_name|
|:-|:---:|:----:|:----------|
|1|AL|6|East South Central|
|2|AK|9|Pacific|
|3|AZ|8|Mountain|
|4|AR|7|West South Central|
|5|CA|9|Pacific|
|6|CO|8|Mountain|
|7|CT|1|New England|
|8|DE|5|South Atlantic|
|10|FL|5|South Atlantic|
|12|HI|9|Pacific|
|14|IL|3|East North Central|
|15|IN|3|East North Central|
|16|IA|4|West North Central|
|31|NJ|2|Middle Atlantic|
|33|NY|2|Middle Atlantic|

Instead of using the nine census divisions, suppose you want to group
states by the four census regions. You have the following crosswalk:

#### Crosswalk

|cendiv|cenreg|cenregnm|
|:----:|:----:|:-------|
|1|1|Northeast|
|2|1|Northeast|
|3|2|Midwest|
|4|2|Midwest|
|5|3|South|
|6|3|South|
|7|3|South|
|8|4|West|
|9|4|West|

You can use `encodefrom()` to collapse categories as you move from raw
to clean values, as long as:

1. `raw` values are unique in the crosswalk
2. `clean` and `label` columns have a 1:1 match
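
The two conditions above can be checked before encoding. The following
is a minimal sketch in base R: `check_crosswalk()` and `cw_check` are
hypothetical names used only for illustration and are not part of
crosswalkr.

```{r}
## hypothetical helper (not a crosswalkr function): test both conditions
check_crosswalk <- function(cw, raw, clean, label) {
    ## condition 1: each raw value appears only once
    raw_unique <- !anyDuplicated(cw[[raw]])
    ## condition 2: each clean value pairs with exactly one label
    ## and each label pairs with exactly one clean value
    combos <- unique(cw[, c(clean, label)])
    one_to_one <- !anyDuplicated(combos[[clean]]) &&
        !anyDuplicated(combos[[label]])
    c(raw_unique = raw_unique, one_to_one = one_to_one)
}

## the crosswalk from the table above, built as a base data frame
cw_check <- data.frame(cendiv = 1:9,
                       cenreg = c(1,1,2,2,3,3,3,4,4),
                       cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                                    'South','South','South','West','West'),
                       stringsAsFactors = FALSE)

check_crosswalk(cw_check, raw = 'cendiv', clean = 'cenreg', label = 'cenregnm')
```

Both elements should be `TRUE` for a crosswalk that can be used to
collapse values.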

```{r, message = FALSE}
library(crosswalkr)
library(dplyr)
library(haven)
```

```{r}
## data
df <- tibble(id = c(1:8,10,12,14:16,31,33),
             state = c('AL','AK','AZ','AR','CA','CO','CT','DE','FL','HI',
                       'IL','IN','IA','NJ','NY'),
             cendiv = c(6,9,8,7,9,8,1,5,5,9,3,3,4,2,2),
             cendiv_name = c('East South Central','Pacific','Mountain',
                             'West South Central','Pacific','Mountain','New England',
                             'South Atlantic','South Atlantic','Pacific',
                             'East North Central','East North Central',
                             'West North Central','Middle Atlantic','Middle Atlantic'))

## crosswalk
cw <- tibble(cendiv = 1:9,
             cenreg = c(1,1,2,2,3,3,3,4,4),
             cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                          'South','South','South','West','West'))
```
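
Conceptually, collapsing is a many-to-one lookup: each raw value is
matched to its row in the crosswalk, and the clean value and label are
read from that row. The sketch below shows the idea in base R only. It
is not a description of how `encodefrom()` is implemented, and
`cw_demo`, `raw_values`, `region`, and `region_name` are illustrative
names.

```{r}
## base R sketch of the lookup idea (not crosswalkr's implementation)
cw_demo <- data.frame(cendiv = 1:9,
                      cenreg = c(1,1,2,2,3,3,3,4,4),
                      cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                                   'South','South','South','West','West'),
                      stringsAsFactors = FALSE)

## census divisions for the first seven states (AL through CT)
raw_values <- c(6,9,8,7,9,8,1)

## find the crosswalk row for each raw value
idx <- match(raw_values, cw_demo$cendiv)

## several divisions share a region, so the raw values collapse
region <- cw_demo$cenreg[idx]
region_name <- cw_demo$cenregnm[idx]

data.frame(cendiv = raw_values, cenreg = region, cenregnm = region_name)
```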

```{r}
## encode new column
df <- df %>%
    mutate(cenreg = encodefrom(., var = cendiv, cw_file = cw, raw = cendiv,
                               clean = cenreg, label = cenregnm))
```

```{r}
df
```
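
If the new column is a haven labelled vector, which is what the
crosswalkr documentation describes for tibble input, you can view it
either as numeric codes or as region names. The sketch below assumes
that return type and uses a hand-built stand-in, `cenreg_demo`, rather
than the encoded column itself.

```{r}
library(haven)

## hand-built stand-in for an encoded column (illustration only)
cenreg_demo <- labelled(c(3, 4, 4, 3, 1),
                        labels = c(Northeast = 1, Midwest = 2,
                                   South = 3, West = 4))

## numeric codes with the labels removed
zap_labels(cenreg_demo)

## region names as a factor
as_factor(cenreg_demo)
```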