---
title: "Class dm and basic operations"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Technical: Class dm and basic operations}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---


``````{r setup, include = FALSE}
source("setup/setup.R")
``````

The goal of the {dm} package and the `dm` class that comes with it, is to make your life easier when you are dealing with data from several different tables.

Let's take a look at the `dm` class.

## Class `dm`

The `dm` class consists of a collection of tables and metadata about the tables, such as

- the names of the tables
- the names of the columns of the tables
- the primary and foreign keys of the tables to link the tables together
- the data (either as data frames or as references to database tables)

All tables in a `dm` must be obtained from the same data source;
csv files and spreadsheets would need to be imported to data frames in R.


## Examples of `dm` objects {#ex_dm}

There are currently three options available for creating a `dm` object.
The relevant functions for creating `dm` objects are:

1. `dm()`
2. `as_dm()`
3. `new_dm()`
4. `dm_from_con()`

To illustrate these options, we will now create the same `dm` in several different ways.
We can use the tables from the well-known {nycflights13} package.

### Pass the tables directly

Create a `dm` object directly by providing data frames to `dm()`:

```{r}
library(nycflights13)
library(dm)
dm(airlines, airports, flights, planes, weather)
```

### Start with an empty `dm`

Start with an empty `dm` object that has been created with `dm()` or `new_dm()`, and add tables to that object:

```{r}
library(nycflights13)
library(dm)
empty_dm <- dm()
empty_dm
dm(empty_dm, airlines, airports, flights, planes, weather)
```

### Coerce a list of tables

Turn a named list of tables into a `dm` with `as_dm()`:
```{r}
as_dm(list(
  airlines = airlines,
  airports = airports,
  flights = flights,
  planes = planes,
  weather = weather
))
```

### Turn tables from a `src` into a `dm`

Squeeze all (or a subset of) tables belonging to a `src` object into a `dm` using `dm_from_con()`:

```{r message=FALSE}
sqlite_con <- dbplyr::nycflights13_sqlite()

flights_dm <- dm_from_con(sqlite_con)
flights_dm
```


The function `dm_from_con(con, table_names = NULL)` includes all available tables on a source in the `dm` object.
This means that you can use this, for example, on a postgres database that you access via `DBI::dbConnect(RPostgres::Postgres())` (with the appropriate arguments `dbname`, `host`, `port`, ...), to produce a `dm` object with all the tables on the database.

### Low-level construction

Another way of creating a `dm` object is calling `new_dm()` on a list of `tbl` objects:

```{r}
base_dm <- new_dm(list(
  airlines = airlines,
  airports = airports,
  flights = flights,
  planes = planes,
  weather = weather
))
base_dm
```

This constructor is optimized for speed and does not perform integrity checks.
Use with caution, validate using `dm_validate()` if necessary.

```{r}
dm_validate(base_dm)
```

## Access tables

We can get the list of tables with `dm_get_tables()` and the `src` object with `dm_get_con()`.

In order to pull a specific table from a `dm`, use:

```{r}
flights_dm[["airports"]]
```

But how can we use {dm}-functions to manage the primary keys of the tables in a `dm` object?

## Primary keys of `dm` objects {#pk}

Some useful functions for managing primary key settings are:

1. `dm_add_pk()`
1. `dm_get_all_pks()`
1. `dm_rm_pk()`
1. `dm_enum_pk_candidates()`

If you created a `dm` object according to the examples in ["Examples of `dm` objects"](#ex_dm), your object does not yet have any primary keys set.
So let's add one.

We use the `nycflights13` tables, i.e. `flights_dm` from above.

```{r}
dm_has_pk(flights_dm, airports)
flights_dm_with_key <- dm_add_pk(flights_dm, airports, faa)
flights_dm_with_key
```

The `dm` now has a primary key:

```{r}
dm_has_pk(flights_dm_with_key, airports)
```

To get an overview over all tables with primary keys, use `dm_get_all_pks()`:

```{r}
dm_get_all_pks(flights_dm_with_key)
```

Remove a primary key:

```{r}
dm_rm_pk(flights_dm_with_key, airports) %>%
  dm_has_pk(airports)
```

If you still need to get to know your data better, and it is already available in the form of a `dm` object, you can use the  `dm_enum_pk_candidates()` function in order to get information about which columns of the table are unique keys:

```{r}
dm_enum_pk_candidates(flights_dm_with_key, airports)
```

The `flights` table does not have any one-column primary key candidates:

```{r}
dm_enum_pk_candidates(flights_dm_with_key, flights) %>% dplyr::count(candidate)
```

`dm_add_pk()` has a `check` argument.
If set to `TRUE`, the function checks if the column of the table given by the user is unique.
For performance reasons, the default is `check = FALSE`.
See also [dm_examine_constraints()] for checking all constraints in a `dm`.

```{r error = TRUE}
try(
  dm_add_pk(flights_dm, airports, tzone, check = TRUE)
)
```


## Foreign keys

Useful functions for managing foreign key relations include:

1. `dm_add_fk()`
1. `dm_get_all_fks()`
1. `dm_rm_fk()`
1. `dm_enum_fk_candidates()`

Now it gets (even more) interesting: we want to define relations between different tables.
With the `dm_add_fk()` function you can define which column of which table points to another table's column.

This is done by choosing a foreign key from one table that will point to a primary key of another table.
The primary key of the referred table must be set with `dm_add_pk()`.
`dm_add_fk()` will find the primary key column of the referenced table by itself and make the indicated column of the child table point to it.

```{r}
flights_dm_with_key %>% dm_add_fk(flights, origin, airports)
```

This will throw an error:

```{r error=TRUE}
try(
  flights_dm %>% dm_add_fk(flights, origin, airports)
)
```

Let's create a `dm` object with a foreign key relation to work with later on:

```{r}
flights_dm_with_fk <- dm_add_fk(flights_dm_with_key, flights, origin, airports)
```

What if we tried to add another foreign key relation from `flights` to `airports` to the object? Column `dest` might work, since it also contains airport codes:

```{r error=TRUE}
try(
  flights_dm_with_fk %>% dm_add_fk(flights, dest, airports, check = TRUE)
)
```

Checks are opt-in and executed only if `check = TRUE`.
You can still add a foreign key with the default `check = FALSE`.
See also `dm_examine_constraints()` for checking all constraints in a `dm`.

Get an overview of all foreign key relations with`dm_get_all_fks()`:

```{r}
dm_get_all_fks(dm_nycflights13(cycle = TRUE))
```

Remove foreign key relations with `dm_rm_fk()` (parameter `columns = NULL` means that all relations will be removed, with a message):

```{r error=TRUE}
try(
  flights_dm_with_fk %>%
    dm_rm_fk(table = flights, column = dest, ref_table = airports) %>%
    dm_get_all_fks(c(flights, airports))
)

flights_dm_with_fk %>%
  dm_rm_fk(flights, origin, airports) %>%
  dm_get_all_fks(c(flights, airports))

flights_dm_with_fk %>%
  dm_rm_fk(flights, columns = NULL, airports) %>%
  dm_get_all_fks(c(flights, airports))
```

Since the primary keys are defined in the `dm` object, you do not usually need to provide the referenced column name of `ref_table`.

Another function for getting to know your data better (cf. `dm_enum_pk_candidates()` in ["Primary keys of `dm` objects"](#pk)) is `dm_enum_fk_candidates()`. Use it to get an overview over foreign key candidates that point from one table to another:

```{r}
dm_enum_fk_candidates(flights_dm_with_key, weather, airports)
```