---
title: "Blur Example"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Blur Example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

One way to reduce the identifiability of a data set is to convert a categorical variable to a more aggregated taxonomy (i.e. a many-to-one mapping). Here we refer to such a method as a 'blur', as it joins categories together in a way that hides the underlying information.
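To make the many-to-one idea concrete, here is a small base R illustration on made-up data. The `city_to_country` mapping and the `cities` vector are hypothetical and not part of deident. After the mapping there are fewer distinct values, so a blurred value could have come from any of several originals.

``` {r}
# Hypothetical example data: a blur written as a named character vector,
# where the names are the original categories and the values are the new groups.
city_to_country <- c(
  "Leeds" = "UK", "York" = "UK",
  "Paris" = "France", "Lyon" = "France"
)
cities <- c("Leeds", "Paris", "York", "Lyon", "Leeds")

# Look up each value by name; unname() drops the original names.
countries <- unname(city_to_country[cities])
countries

# Four distinct cities collapse into two countries.
length(unique(cities))
length(unique(countries))
```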

As an example, consider the `ShiftsWorked` data:

```{r setup}
library(deident)
head(ShiftsWorked)
```

A simple blur might change the taxonomy of `Shift`: for example, by combining 'Day' and 'Night' into a new group, 'Working', while leaving the 'Rest' shifts unchanged. To do this, we define the values we wish to change as a named vector, build a pipeline, and apply it to the data:

``` {r}
# Map both working shift types to a single 'Working' group
shift_blur <- c("Day" = "Working", "Night" = "Working")
blur_pipe <- ShiftsWorked |>
  add_blur(Shift, blur = shift_blur)

# Apply the pipeline to the data
apply_deident(ShiftsWorked, blur_pipe)
```
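As a sketch of what this blur does conceptually, the hypothetical helper `apply_blur_sketch()` below performs the lookup in base R. It assumes, as the treatment of the 'Rest' shifts suggests, that values not named in the blur vector pass through unchanged. It is an illustration only, not deident's internal implementation.

``` {r}
# Hypothetical helper for illustration; not deident's implementation.
# Values named in `blur` are replaced; all other values (including NA) are kept.
apply_blur_sketch <- function(x, blur) {
  x <- as.character(x)
  hit <- x %in% names(blur)
  x[hit] <- blur[x[hit]]
  x
}

shifts <- c("Day", "Rest", "Night", "Day", "Rest")
apply_blur_sketch(shifts, c("Day" = "Working", "Night" = "Working"))
```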

### The `category_blur` utility

Applying a blur is relatively simple, but constructing the mapping by hand can be laborious when a variable has many categories. Consider the `starwars` data set supplied by dplyr:

``` {r}
starwars <- dplyr::starwars
head(starwars)
```

In particular, consider the `species` variable:

``` {r}
table(starwars$species)
```

Imagine we wanted to reduce identifiability by aggregating the data into Human vs Non-Human. We could write the mapping vector by hand, but with this many categories it is easy to make mistakes. To aid in designing complex blurs, we supply the `category_blur` utility, which uses regular expressions to define the groups.

``` {r}
human_blur <- category_blur(
  starwars$species,
  "NotHuman" = "^(?!Human)" # Doesn't start with "Human"
)
```
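To give a sense of how regex-defined groups can be turned into a plain mapping vector, the base R sketch below defines a hypothetical `regex_blur_sketch()` function. It is an illustration under stated assumptions, not `category_blur()`'s actual implementation: it tests each distinct category with `grepl(..., perl = TRUE)`, which base R needs for lookahead patterns such as `^(?!Human)`, and lets the first matching pattern win when groups overlap.

``` {r}
# Hypothetical sketch; not category_blur()'s implementation.
# Each named argument is a group label = regular expression.
regex_blur_sketch <- function(vec, ...) {
  patterns <- c(...)
  cats <- unique(as.character(vec))
  cats <- cats[!is.na(cats)]          # ignore missing values
  blur <- character(0)
  for (label in names(patterns)) {
    # perl = TRUE enables lookaheads such as "^(?!Human)"
    hit <- grepl(patterns[[label]], cats, perl = TRUE)
    # First matching pattern wins: skip categories already assigned
    new <- cats[hit & !(cats %in% names(blur))]
    blur[new] <- label
  }
  blur
}

species <- c("Human", "Droid", "Wookiee", "Human", NA, "Droid")
regex_blur_sketch(species, "NotHuman" = "^(?!Human)")
```

Because this sketch assigns each category to the first pattern it matches, more specific patterns would need to be listed first; how `category_blur` itself resolves overlapping patterns is not covered here.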

The returned vector can then be passed into a new pipeline as before.

``` {r}
species_pipe <- starwars |>
  add_blur(species, blur = human_blur)

new_starwars <- apply_deident(starwars, species_pipe)

table(new_starwars$species)
```