---
title: "Compare, subset or stratify codelists"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{a06_CreateSubsetsFromCodelist}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = NOT_CRAN)
```

```{r, include = FALSE}
CDMConnector::requireEunomia("synpuf-1k", "5.3")
```

## Introduction: Subsetting, stratifying and comparing codelists
This vignette introduces a set of functions designed to manipulate and explore codelists within an OMOP CDM. Specifically, we will learn how to:

-   **Subset a codelist** to keep only codes meeting certain criteria.
-   **Stratify a codelist** based on attributes like dose unit or route of administration.
-   **Compare two codelists** to identify shared and unique concepts.

First of all, we will load the required packages and connect to a mock database.
```{r, warning=FALSE, message=FALSE}
library(DBI)
library(duckdb)
library(dplyr)
library(CDMConnector)
library(CodelistGenerator)

# Connect to the database and create the cdm object
con <- dbConnect(duckdb(),
                 eunomiaDir("synpuf-1k", "5.3"))
cdm <- cdmFromCon(con = con,
                  cdmName = "Eunomia Synpuf",
                  cdmSchema = "main",
                  writeSchema = "main",
                  achillesSchema = "main")
```

We will start by generating a codelist for *acetaminophen* using `getDrugIngredientCodes()`:
```{r, warning=FALSE, message=FALSE}
acetaminophen <- getDrugIngredientCodes(cdm,
                                        name = "acetaminophen",
                                        nameStyle = "{concept_name}",
                                        type = "codelist")
```
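To make the object we just created less abstract, it helps to think of a codelist as a named list in which each element holds the concept IDs of one concept set. The chunk below builds a toy equivalent by hand. It is only a sketch: `toy_codelist` and its concept IDs are invented for illustration and do not come from the vocabulary tables.

```{r}
# Toy stand-in for a codelist: a named list of integer concept IDs.
# The IDs below are made up for illustration only.
toy_codelist <- list(
  acetaminophen = c(101L, 102L, 103L, 104L)
)

names(toy_codelist)    # one name per concept set
lengths(toy_codelist)  # number of concepts in each set
```

The same two calls can be applied to the real `acetaminophen` object to check which concept sets it contains and how large they are.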

### Subsetting a Codelist
Subsetting reduces a codelist to only those concepts that meet certain conditions.

#### Subset to Codes in Use
`subsetToCodesInUse()` keeps only those codes that are recorded in the database at least `minimumCount` times in the table given by `table`. Note that this function depends on ACHILLES tables being available in your CDM object.

```{r}
acetaminophen_in_use <- subsetToCodesInUse(x = acetaminophen, 
                                           cdm, 
                                           minimumCount = 0,
                                           table = "drug_exposure")
acetaminophen_in_use # Only the first 5 concepts will be shown
```
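Conceptually, this step is a lookup of each concept against record counts followed by a threshold filter. The sketch below mimics that idea on toy data. It is not the package's implementation: `keep_in_use_sketch()`, `toy_counts` and all IDs and counts are hypothetical, and the real function reads its counts from the ACHILLES tables instead. Note how concept 102, which has no recorded count at all, is dropped regardless of the threshold.

```{r}
# Toy codelist and toy record counts (all values invented for illustration)
toy_codelist <- list(acetaminophen = c(101L, 102L, 103L, 104L))
toy_counts <- data.frame(
  concept_id   = c(101L, 103L, 104L),  # 102 never appears in the data
  record_count = c(250L, 3L, 40L)
)

# Hypothetical helper: keep codes whose record count reaches minimum_count
keep_in_use_sketch <- function(codelist, counts, minimum_count = 0L) {
  used <- counts$concept_id[counts$record_count >= minimum_count]
  lapply(codelist, function(codes) codes[codes %in% used])
}

toy_in_use <- keep_in_use_sketch(toy_codelist, toy_counts, minimum_count = 10L)
toy_in_use
```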
#### Subset by Domain
We will now subset to those concepts that have `domain = "Drug"`. Remember that, to see the domains available in the cdm, you can use `getDomains(cdm)`.
```{r, warning=FALSE, message=FALSE}
acetaminophen_drug <- subsetOnDomain(acetaminophen_in_use, cdm, domain = "Drug")

acetaminophen_drug
```
We can use the `negate` argument to exclude concepts with a certain domain:

```{r, warning=FALSE, message=FALSE}
acetaminophen_no_drug <- subsetOnDomain(acetaminophen_in_use, cdm, domain = "Drug", negate = TRUE)

acetaminophen_no_drug
```
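The effect of `negate` is easiest to see on a small example: the same match is computed in both cases, and `negate = TRUE` simply keeps the complement. The chunk below is a minimal sketch under stated assumptions. `subset_on_domain_sketch()`, `toy_concepts` and the concept IDs are hypothetical and only imitate the behaviour described above; the two results always partition the input codes.

```{r}
# Toy codes and a toy concept table (IDs and domains invented for illustration)
toy_codes <- c(101L, 102L, 103L, 104L)
toy_concepts <- data.frame(
  concept_id = c(101L, 102L, 103L, 104L),
  domain_id  = c("Drug", "Drug", "Observation", "Drug"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep (or, with negate = TRUE, drop) codes in a domain
subset_on_domain_sketch <- function(codes, concepts, domain, negate = FALSE) {
  in_domain <- codes %in% concepts$concept_id[concepts$domain_id %in% domain]
  if (negate) codes[!in_domain] else codes[in_domain]
}

toy_drug     <- subset_on_domain_sketch(toy_codes, toy_concepts, "Drug")
toy_not_drug <- subset_on_domain_sketch(toy_codes, toy_concepts, "Drug", negate = TRUE)

toy_drug
toy_not_drug
```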
#### Subset on Dose Unit
We will now filter to only include concepts with specified dose units. Remember that you can use `getDoseUnit(cdm)` to explore the dose units available in your cdm.
```{r, warning=FALSE, message=FALSE}
acetaminophen_mg_unit <- subsetOnDoseUnit(acetaminophen_drug, cdm, c("milligram", "unit"))
acetaminophen_mg_unit
```
As before, we can set `negate = TRUE` to exclude concepts with these dose units instead.

#### Subset on Route Category
We will now exclude concepts whose route category is "transmucosal_rectal" or "unclassified_route":
```{r, warning=FALSE, message=FALSE}
acetaminophen_route <- subsetOnRouteCategory(acetaminophen_mg_unit, 
                                             cdm, c("transmucosal_rectal","unclassified_route"), 
                                             negate = TRUE)
acetaminophen_route
```
### Stratifying a Codelist
Instead of filtering, stratification allows us to split a codelist into subgroups based on defined vocabulary properties.

#### Stratify by Dose Unit
```{r, warning=FALSE, message=FALSE}
acetaminophen_doses <- stratifyByDoseUnit(acetaminophen, cdm, keepOriginal = TRUE)

acetaminophen_doses
```
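Whereas subsetting discards concepts, stratifying keeps all of them and only regroups them. The following sketch illustrates the idea with base R's `split()` on toy data. Everything here is an assumption made for illustration: `stratify_sketch()`, the concept IDs, the unit labels, and the `name_stratum` naming pattern are hypothetical and may differ from what the package produces. The `keep_original` argument mirrors the role of `keepOriginal` by also retaining the unstratified codelist.

```{r}
# Toy codes with one dose unit per code (all values invented for illustration)
toy_codes <- c(101L, 102L, 103L, 104L)
toy_units <- c("milligram", "milligram", "unit", "milligram")

# Hypothetical helper: split codes into one codelist per stratum
stratify_sketch <- function(name, codes, strata, keep_original = FALSE) {
  out <- split(codes, strata)
  names(out) <- paste(name, names(out), sep = "_")
  if (keep_original) {
    # Prepend the full, unstratified codelist under its original name
    out <- c(setNames(list(codes), name), out)
  }
  out
}

toy_strata <- stratify_sketch("acetaminophen", toy_codes, toy_units,
                              keep_original = TRUE)
toy_strata
```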
#### Stratify by Route Category
```{r, warning=FALSE, message=FALSE}
acetaminophen_routes <- stratifyByRouteCategory(acetaminophen, cdm)

acetaminophen_routes
```

### Comparing Codelists
Now we will compare two codelists to identify overlapping and unique codes.
```{r, warning=FALSE, message=FALSE}
acetaminophen <- getDrugIngredientCodes(cdm,
                                        name = "acetaminophen",
                                        nameStyle = "{concept_name}",
                                        type = "codelist_with_details")
hydrocodone <- getDrugIngredientCodes(cdm, 
                                      name = "hydrocodone", 
                                      doseUnit = "milligram", 
                                      nameStyle = "{concept_name}",
                                      type = "codelist_with_details")
```
Compare the two sets:
```{r}
comparison <- compareCodelists(acetaminophen$acetaminophen, hydrocodone$hydrocodone)

comparison |> glimpse()

comparison |> filter(codelist == "Both")
```
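
At its core, the comparison is a set operation on concept IDs: each concept is labelled by whether it appears in the first codelist, the second, or both. The chunk below sketches that logic on toy data. It is not the package's implementation. The concept IDs are invented, and the labels other than "Both" are assumptions chosen for illustration; the real output also carries additional columns.

```{r}
# Two toy sets of concept IDs (invented for illustration)
toy_1 <- c(101L, 102L, 103L)
toy_2 <- c(103L, 104L)

# Label every concept found in either set
all_ids <- sort(union(toy_1, toy_2))
toy_comparison <- data.frame(
  concept_id = all_ids,
  codelist = ifelse(all_ids %in% toy_1 & all_ids %in% toy_2, "Both",
             ifelse(all_ids %in% toy_1, "Only codelist 1", "Only codelist 2")),
  stringsAsFactors = FALSE
)

toy_comparison
toy_comparison[toy_comparison$codelist == "Both", ]
```

Filtering on the label, as in the last line, corresponds to the `filter(codelist == "Both")` call used above.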