---
author: "Joseph Larmarange"
title: "Generate a data dictionary and search for variables with `look_for()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Generate a data dictionary and search for variables with `look_for()`}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

## Showing a summary of a data frame

### Default printing of tibbles

It is a common need to easily get a description of all variables in a data frame.

When a data frame is converted into a tibble (e.g. with `dplyr::as_tibble()`), it as a nice printing showing the first rows of the data frame as well as the type of column.

```{r message=FALSE}
library(dplyr)
```


```{r}
iris %>% as_tibble()
```

However, when you have too many variables, all of them cannot be printed and their are just listed.

```{r}
data(fertility, package = "questionr")
women
```

Note: in **R** console, value labels (if defined) are usually printed but they do not appear in a R markdown document like this vignette.

### `dplyr::glimpse()`
 
 The function `dplyr::glimpse()` allows you to have a quick look at all the variables in a data frame.
 
```{r}
glimpse(iris)
glimpse(women)
```
 
It will show you the first values of each variable as well as the type of each variable. However, some important informations are not displayed:

- variable labels, when defined;
- value labels for labelled vectors;
- the list of levels for factors;
- the range of values for numerical variables.

### `labelled::look_for()`

`look_for()` provided by the `labelled` package will print in the console a data dictionary of all variables, showing variable labels when available, the type of variable and a list of values corresponding to:
 
- levels for factors;
- value labels for labelled vectors;
- the range of observed values in the vector otherwise (if `details = "full"`).


```{r}
library(labelled)
look_for(iris)
look_for(women)
```

Note that `lookfor()` and `generate_dictionary()` are synonyms of `look_for()` and works exactly in the same way.

If there is not enough space to print full labels in the console, they will be truncated (truncation is indicated by a `~`).

## Searching variables by key

When a data frame has dozens or even hundreds of variables, it could become difficult to find a specific variable. In such case, you can provide an optional list of keywords, which can be simple character strings or regular expression, to search for specific variables.

```{r}
# Look for a single keyword.
look_for(iris, "petal")
look_for(iris, "s")

# Look for with a regular expression
look_for(iris, "petal|species")
look_for(iris, "s$")

# Look for with several keywords
look_for(iris, "pet", "sp")

# Look_for will take variable labels into account
look_for(women, "read", "level")
```

By default, `look_for()` will look through both variable names and variables labels. Use `labels = FALSE` to look only through variable names.

```{r}
look_for(women, "read")
look_for(women, "read", labels = FALSE)
```
Similarly, the search is by default case insensitive. To make the search case sensitive, use `ignore.case = FALSE`.

```{r}
look_for(iris, "sepal")
look_for(iris, "sepal", ignore.case = FALSE)
```

## Level of details


If you just want to use the search feature of `look_for()` without computing the details of each variable, simply indicate `details = "none"` or `details = FALSE`.

```{r}
look_for(women, "id", details = "none")
```

If you want more details (but can be time consuming for big data frames), indicate `details = "full"` or `details = TRUE`.


```{r}
look_for(women, details = "full")
look_for(women, details = "full") %>%
  dplyr::glimpse()
```


## Advanced usages of `look_for()`

`look_for()` returns a detailed tibble which is summarized before printing. To deactivate default printing and see full results, simply use `dplyr::as_tibble()`, `dplyr::glimpse()` or even `utils::View()`.

```{r, eval=FALSE}
look_for(women) %>% View()
```


```{r}
look_for(women) %>% as_tibble()
glimpse(look_for(women))
```

The tibble returned by `look_for()` could be easily manipulated for advanced programming.

When a column has several values for one variable (e.g. `levels` or `value_labels`), results as stored with nested named list. You can convert named lists into simpler character vectors, you can use `convert_list_columns_to_character()`.

```{r}
look_for(women) %>% convert_list_columns_to_character()
```

Alternatively, you can use `lookfor_to_long_format()` to transform results into a long format with one row per factor level and per value label.

```{r}
look_for(women) %>% lookfor_to_long_format()
```

Both can be combined:

```{r}
look_for(women) %>%
  lookfor_to_long_format() %>%
  convert_list_columns_to_character()
```