---
title: "Using DatabaseConnector through DBI and dbplyr"
author: "Martijn J. Schuemie"
date: "`r Sys.Date()`"
output:
  pdf_document:
    number_sections: yes
    toc: yes
  html_document:
    number_sections: yes
    toc: yes
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Using DatabaseConnector through DBI and dbplyr}
  %\VignetteEncoding{UTF-8}    
---

```{r, echo = FALSE, message = FALSE}
library(DatabaseConnector)
```

# Introduction

This vignette describes how to use the `DatabaseConnector` package through the `DBI` and `dbplyr` interfaces. It assumes you already know how to create a connection as described in the 'Connecting to a database' vignette.

All functions of the `DatabaseConnector` `DBI` interface apply SQL translation, thus making it an interface to a virtual database platform speaking OHDSISql as defined in `SqlRender`.

# Connecting

We can use the `dbConnect()` function, which is equivalent to the `connect()` function:

```{r eval=FALSE}
connection <- dbConnect(
  DatabaseConnectorDriver(), 
  dbms = "postgresql",
  server = "localhost/postgres",
  user = "joe",
  password = "secret"
)
```
```{r echo=FALSE}
writeLines("Connecting using PostgreSQL driver")
```
```{r eval=FALSE}
dbIsValid(conn)
```
```{r echo=FALSE}
TRUE
```

# Querying 

Querying and executing SQL can be done through the usual `DBI` functions. SQL statements are assumed to be written in 'OhdsiSql', a subset of SQL Server SQL (see the `SqlRender` package for details), and are automatically translated to the appropriate SQL dialect. For example:

```{r eval=FALSE}
dbGetQuery(connection, "SELECT TOP 3 * FROM cdmv5.person")
```
```{r echo=FALSE}
data.frame(person_id = c(1,2,3), gender_concept_id = c(8507, 8507, 8507), year_of_birth = c(1975, 1976, 1977))
```

Or:

```{r eval=FALSE}
res <- dbSendQuery(connection, "SELECT TOP 3 * FROM cdmv5.person")
dbFetch(res)
```
```{r echo=FALSE}
data.frame(person_id = c(1,2,3), gender_concept_id = c(8507, 8507, 8507), year_of_birth = c(1975, 1976, 1977))
```
```{r eval=FALSE}
dbHasCompleted(res)
```
```{r echo=FALSE}
TRUE
```
```{r eval=FALSE}
dbClearResult(res)
dbDisconnect(res)
```

# Using dbplyr

We can create a table based on a `DatabaseConnector` connection. The `inDatabaseSchema()` function allows us to use standard `databaseSchema` notation promoted by SqlRender:

```{r eval=FALSE}
library(dpylr)
person <- tbl(connection, inDatabaseSchema("cdmv5", "person"))
person
```
```{r echo=FALSE}
data.frame(person_id = c(1,2,3), gender_concept_id = c(8507, 8507, 8507), year_of_birth = c(1975, 1976, 1977))
```

we can apply the usual `dplyr` syntax:

```{r eval=FALSE}
person %>%
  filter(gender_concept_id == 8507) %>%
  count() %>%
  pull()
```
```{r echo=FALSE}
1234
```

## Date functions

The `dbplyr` package does not support date functions, but `DatabaseConnector` provides the `dateDiff()`, `dateAdd()`, `eoMonth()`, `dateFromParts()`, `year()`, `month()`, and `day()` functions that will correctly translate to various data platforms:

```{r eval=FALSE}
observationPeriod <- tbl(connection, inDatabaseSchema("cdmv5", "observation_period"))
observationPeriod %>%
  filter(
    dateDiff("day", observation_period_start_date, observation_period_end_date) > 365
  ) %>%
  count() %>%
  pull()
```
```{r echo=FALSE}
987
```

## Allowed table and field names in dbplyr

Because of the many idiosyncrasies in how different dataplatforms store and transform table and field names, it is currently not possible to use any names that would require quotes. So for example the names `person`, `person_id`, and `observation_period` are fine, but `Person ID` and `Obs. Period` are not. In general, it is highly recommend to **use lower case snake_case for database table and field names**.


# Temp tables

The `DBI` interface uses temp table emulation on those platforms that do not support real temp tables. This does require that for those platforms the user specify a `tempEmulationSchema`, preferably using 
```{r eval=FALSE}
option(sqlRenderTempEmulationSchema = "a_schema")
```

Where `"a_schema"` refers to a schema where the user has write access. If we know we will need temp tables, we can use the `assertTempEmulationSchemaSet()` to verify this option has been set. This function will throw an error if it is not set, but only if the provided dbms is a platform that requires temp table emulation.

In `OHDSISql`, temp tables are referred to using a '#' prefix. For example:

```{r eval=FALSE}
dbWriteTable(connection, "#temp", cars)
```
```{r echo=FALSE}
message("Inserting data took 0.053 secs")
```

## Temp tables in dbplyr

The `copy_to` function creates a temp table:

```{r eval=FALSE}
carsTable <- copy_to(connection, cars)
```
```{r echo=FALSE}
writeLines("Created a temporary table named #cars")
```

The `compute()` function also creates a temp table, for example:

```{r eval=FALSE}
tempTable <- person %>%
  filter(gender_concept_id == 8507) %>%
  compute()
```
```{r echo=FALSE}
message("Created a temporary table named #dbplyr_001")
```

## Cleaning up emulated temp tables

Emulated temp tables are not really temporary, and therefore have to be removed when no longer needed. A convenient way to drop all emulated temp tables created so far in an R session is using the `dropEmulatedTempTables()` function:

```{r eval=FALSE}
dropEmulatedTempTables(connection)
```

In our example, this does not do anything because were using a PostgreSQL server, which does natively support temp tables.

# Closing the connection

We can use the `dbDisconnect()` function, which is equivalent to the `disconnect()` function:

```{r eval=FALSE}
dbDisconnect(connection)
```