---
title: "Micro files"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Micro files}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Use case

`px_micro()` exists to support a specific use case for [Statistics Greenland](https://stat.gl/default.asp?lang=en). 

They use it to create small PX-files that showcase and present metadata from a larger data set which cannot be made publicly available. See an example on [Statistics Greenland's microdata for Research and Analysis](https://bank.stat.gl/pxweb/en/GSmicro/).

## `px_micro()`

Apart from `px_save()`, `px_micro()` is the only function that can save px objects as PX-files.

`px_micro()` turns a px object into many smaller PX-files, each containing a subset of the variables in the original px object.

## Input data

The basis of micro files is usually a data set that, unlike most PX-files, doesn't have a count variable. `px_micro()` will instead create a count for each individual variable.

In this example we will use the built-in data set `greenlanders`.

```{r, include = FALSE}
set.seed(0)
micro_dir <- file.path("micro_files")
unlink(micro_dir, recursive = TRUE)
```

```{r}
library(pxmake)

greenlanders |> dplyr::sample_n(10) |> dplyr::arrange_all()
```
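
To make "a count for each individual variable" concrete, here is a minimal base R sketch on a hypothetical toy data frame (`toy` is not part of pxmake). It counts the rows for each combination of one variable and a second variable playing the role of the heading. Whether `px_micro()` computes its counts in exactly this way is an assumption; the sketch only illustrates the idea.

```{r}
# Hypothetical toy data with one row per individual and no count variable
toy <- data.frame(
  gender = c("male", "female", "female", "male", "female"),
  cohort = c("A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Count the rows for each combination of 'gender' and 'cohort',
# storing the result in a column called 'n'
toy_counts <- as.data.frame(table(toy), responseName = "n")
toy_counts
```

The counts in `n` sum to the number of rows in `toy`, since every individual falls in exactly one combination.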

## How to create micro files

Create a px object with `px()`, and pass it to `px_micro()`.

```{r}
# Create px object
x <- px(greenlanders)

# Create folder for micro files
micro_dir <- file.path("micro_files")
dir.create(micro_dir)

# Write micro files to folder
px_micro(x, out_dir = micro_dir)
```

The folder now contains three PX-files, one for each variable except 'age'.

```{r}
list.files(micro_dir)
```

The reason 'age' didn't get a PX-file is that it is the HEADING variable in `x`, and `px_micro()` creates a file for each non-HEADING variable. Instead, the HEADING variable is used in all the created PX-files.

```{r}
# Print HEADING variables
px_heading(x)

# Print non-HEADING variables
c(px_stub(x), px_figures(x))
```

In this case, we want 'cohort' to be the HEADING variable, and to create a PX-file for each of 'gender', 'age' and 'municipality'.

```{r}
x2 <-
  x |>
  px_stub('age') |>    # Change age to STUB
  px_heading('cohort') # Change cohort to HEADING
```

```{r}
# Clear folder
unlink(file.path(micro_dir, "*.px"))

px_micro(x2, out_dir = micro_dir)
```

The folder now contains the files we wanted.

```{r}
list.files(micro_dir)
```

Each file contains one of the three variables as STUB, 'cohort' as HEADING, and a variable 'n', which is the count of each combination of the two variables.

```{r}
px(file.path(micro_dir, 'age.px'))$data

px(file.path(micro_dir, 'gender.px'))$data

px(file.path(micro_dir, 'municipality.px'))$data
```

## Keyword values

In general, the keyword values from the px object are carried over to the micro files. This is the case for keywords like 'MATRIX', 'SUBJECT-CODE', 'CONTACT', 'LANGUAGE', 'CODEPAGE', etc.

To change keywords across all the micro files, the easiest approach is to change them in the px object before calling `px_micro()`.

```{r, eval = FALSE}
# Change CONTACT in all micro files
x2 |>
  px_contact("Johan Ejstrud") |>
  px_micro(out_dir = micro_dir)
```

However, some keywords need to be changed individually for each micro file. To do so, create a data frame with the column 'variable' and a column for each px keyword to change.

```{r}
individual_keywords <- tibble::tribble(~variable     ,      ~px_description,
                                       "age"         ,    "Age count 18-99",
                                       "gender"      ,       "Gender count",
                                       "municipality",  "Municipality 2024"
                                       )
```

Supply this data frame to the `keyword_values` argument of `px_micro()`.

```{r}
px_micro(x2, out_dir = micro_dir, keyword_values = individual_keywords)
```

DESCRIPTION is changed in the micro files: 

```{r}
px(file.path(micro_dir, 'age.px')) |> px_description()
px(file.path(micro_dir, 'gender.px')) |> px_description()
px(file.path(micro_dir, 'municipality.px')) |> px_description()
```


### Multilingual files

For multilingual files add a 'language' column to `keyword_values`.

```{r}
x3 <-
  x2 |>
  px_language("en") |>
  px_languages(c("en", "kl"))


individual_keywords_ml <- 
  tibble::tribble(
       ~variable, ~language,     ~px_description, ~px_matrix,
           "age",      "en",   "Age count 18-99",      "AGE",
           "age",      "kl",       "Ukiut 18-99",         NA,
        "gender",      "en",      "Gender count",      "GEN",
        "gender",      "kl",       "Suiaassuseq",         NA,
  "municipality",      "en", "Municipality 2024",      "MUN",
  "municipality",      "kl",      "Kommuni 2024",         NA
  )

px_micro(x3, out_dir = micro_dir, keyword_values = individual_keywords_ml)
```

Here 'px_description' varies for each language, while 'px_matrix' is only set for one of the languages, since it is not a language-dependent keyword. For language-independent keywords it doesn't matter which language the value is set for.
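
One way to picture the rule for language-independent keywords is the base R sketch below. It uses a hypothetical table (`kw_sketch`, not part of pxmake) and picks, for each variable, the single non-missing 'px_matrix' value regardless of which language row holds it. This is only a conceptual illustration; it is an assumption, not a description of how `px_micro()` resolves these values internally.

```{r}
# Hypothetical keyword table: 'px_matrix' is set on one language row only
kw_sketch <- data.frame(
  variable  = c("age", "age", "gender", "gender"),
  language  = c("en", "kl", "en", "kl"),
  px_matrix = c("AGE", NA, NA, "GEN"),
  stringsAsFactors = FALSE
)

# For each variable, take the one non-missing value across all languages
matrix_by_variable <- vapply(
  split(kw_sketch$px_matrix, kw_sketch$variable),
  function(v) unique(v[!is.na(v)]),
  character(1)
)
matrix_by_variable
```

In the sketch, 'age' has its value on the "en" row and 'gender' on the "kl" row, and both are recovered the same way.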

### Filenames

By default, each micro file is named after its variable. The filenames can be changed by adding a 'filename' column to `keyword_values`.

```{r}
individual_keywords2 <- 
  individual_keywords |>
  dplyr::mutate(filename = paste0(variable, "_2024", ".px"))

# Clear folder
unlink(file.path(micro_dir, "*.px"))

px_micro(x2, out_dir = micro_dir, keyword_values = individual_keywords2)

list.files(micro_dir)
```