---
title: "CDMConnector and dbplyr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{CDMConnector and dbplyr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
library(CDMConnector)
if (Sys.getenv("EUNOMIA_DATA_FOLDER") == "") Sys.setenv("EUNOMIA_DATA_FOLDER" = file.path(tempdir(), "eunomia"))
if (!dir.exists(Sys.getenv("EUNOMIA_DATA_FOLDER"))) dir.create(Sys.getenv("EUNOMIA_DATA_FOLDER"))
if (!eunomiaIsAvailable()) downloadEunomiaData()

knitr::opts_chunk$set(
  collapse = TRUE,
  eval = rlang::is_installed("duckdb"),
  comment = "#>"
)
```

## Set up

First let's load the required packages for the code in this vignette. If you haven't already installed them, all the other packages can be installed using ´install.packages()´ 

```{r}
library(CDMConnector)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
```

## Creating the cdm reference

Now let´s connect to a duckdb database with the Eunomia data (https://github.com/OHDSI/Eunomia).

```{r}
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomiaDir())
cdm <- cdmFromCon(con, cdmName = "eunomia", cdmSchema = "main", writeSchema = "main")
cdm
```


This cdm object is now what we´ll use going forward. It provides a reference to the OMOP CDM tables. We can see that these tables are still in the database, but now we have a reference to each of the ones we might want to use in our analysis. For example, the person table can be referenced like so


## Putting it all together

Say we want to make a histogram of year of birth in the person table. We can select that variable, bring it into memory, and then use ggplot to make the histogram.

```{r, message=FALSE}
cdm$person %>%
  select(year_of_birth) %>%
  collect() %>%
  ggplot(aes(x = year_of_birth)) +
  geom_histogram(bins = 30)
```

If we wanted to make a boxplot for length of observation periods we could do the computation on the database side, bring in the new variable into memory, and use ggplot to produce the boxplot

```{r}
cdm$observation_period %>%
  select(observation_period_start_date, observation_period_end_date) %>%
  mutate(observation_period = (observation_period_end_date - observation_period_start_date)/365, 25) %>%
  select(observation_period) %>%
  collect() %>%
  ggplot(aes(x = observation_period)) +
  geom_boxplot()
```


## Behind the scenes
We use show_query to check the sql that is being run against duckdb
```{r}
cdm$person %>%
  tally() %>%
  show_query()
```


```{r}
cdm$person %>%
  summarise(median(year_of_birth))%>%
  show_query()
```

```{r, warning=FALSE}
cdm$person %>%
  mutate(gender = case_when(
    gender_concept_id == "8507" ~ "Male",
    gender_concept_id == "8532" ~ "Female",
    TRUE ~ NA_character_))%>%
  show_query()
```


```{r}
DBI::dbDisconnect(con, shutdown = TRUE)
```

<div style="margin-bottom:3cm;"></div>