---
title: "Basic Workflow"
output: 
  rmarkdown::html_vignette:
    keep_md: true
    toc: true
vignette: >
  %\VignetteIndexEntry{basic_workflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}

---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

## Objectives

-   Collect drug and adverse drug reaction **IDs**

-   Perform standard data management in VigiBase ECL

-   Conduct disproportionality analysis (both univariate and multivariate)

If you want to access script templates for these steps, see

-   `vignette("template_dictionary")` and 

-   `vignette("template_main")`

## Reminder of Database Structure

Each table has a unique identifying key and other keys to perform joins.

| Table  | Key           | Other keys |
|--------|---------------|------------------|
| `demo` | `UMCReportId` |                  |
| `drug` | `Drug_Id`     | `UMCReportId`    |
| `adr`  | `Adr_Id`      | `UMCReportId`    |

> Goal of the tutorial: perform a disproportionality analysis between colitis and nivolumab among checkpoint inhibitor cases.

## Step 0: Load Packages

```{r library, warning=FALSE, message=FALSE}
library(vigicaen)
library(rlang)
library(dplyr)
```

# Build tables from source files {#building_tables}

This process should be done **once per database version**.

You don't have to do it to follow this tutorial, since
we will use the package built-in example tables.

However, when working on real analyses, you will need
to process this step first.

See the vignette here `vignette("getting_started")`.

# Collecting IDs {#collecting_ids}

The whole package relies on defining a dictionary of drugs
and adrs of interest.

Those collection of terms should be stored in **named lists**.

```{r named_lists}
# drug selection
d_sel <-
  list(
    nivolumab   = "nivolumab",
    ipilimumab  = "ipilimumab",
    nivo_or_ipi = c("nivolumab", "ipilimumab")
  )

# adverse drug reaction selection

a_sel <- 
  list(
    colitis     = "Colitis",
    pneumonitis = "Pneumonitis"
  )
```

As we see, the `d_sel` list contains three named vectors: nivolumab, ipilimumab, and nivo_or_ipi. Each of these vectors can contain one or
more names of drugs.

> The names of the vectors don't have to be the same as the drug names.
> They will be used to created columns later in the process.

We will pass these named list selections to *ID collector* functions:
The **`get_*`** family.

-   `get_drecno()` for drugs, `get_atc_code()` for ATC classes

-   `get_llt_soc()` or `get_llt_smq()` for adverse drug reactions

These functions will allow to collect IDs (e.g. codes) matching our drugs
and adrs in a specific dictionary. 

-    For drugs, we will use the WHODrug dictionary, and collect Drug Record Numbers
(DrecNos) most of the time, or Medicinal Product Ids (MedicinalProd_Ids) in some
specific scenarii.

-   For adrs, we will use the Medical Dictionary for Regulatory Activities (MedDRA),
and collect term codes (low-level term codes). Here, it is 
important to note that we can work with other terms (like Preferred Terms, High
Level terms, etc.). The ID collector will just collect low-level term codes of all
higher level terms, resulting in a pretty long list of codes, because VigiBase 
data is structured on low-level term codes.

# Data management

## Drugs {#drug_workflow}

### Principle

1.  Load the `demo`, `drug`, and `mp` tables.
2.  Select one or more medications of interest.
3.  Identify the drug codes (e.g., the `DrecNo(s)`)
associated with these drug using the `mp` table.
4.  Search for cases exposed to these medications using the codes in the `drug` table.
5.  Update the `demo` table: code 1 if the case reports the medication of interest, 0 otherwise.
6.  Check your data management

Step 1 is performed with `dt_parquet()` if you have [your own tables](#building_tables), or using the built-in example tables for this tutorial.

Step 2 and 3 can be referred as "dictionary" steps.

Step 3 is performed with `get_drecno()` or `get_atc_code()`.

Steps 4 and 5 are performed with `add_drug()`.

step 6 is performed with `check_dm()`.

### Step 1: Load the tables

```{r table_loading, warning=FALSE, message=FALSE}
demo <- demo_
drug <- drug_

mp <- mp_
```

> Note: if you are working with your own tables, 
you will need to load them here, with `dt_parquet()`.

### Step 2: Choose drugs of Interest

This is probably the most interesting part of the process,
from a scientific point of view. But for now, it's pretty trivial.

Once you've decided which drugs you would like to study
(e.g. nivolumab in this tutorial), you need to create
a **named list** of character vectors.

```{r drug_sel}
d_sel <- # drug selection
  list2(
    nivolumab = "nivolumab"
  )

d_sel
```
> Remember to use lower case names in d_sel (no capital letters:
> Good: "nivolumab", Wrong: "Nivolumab" or "NIVOLUMAB")

The ideal way of picking drugs is by using their **WHO name**.

WHO name is an international non-proprietary name (INN) for the drug. A few drugs
have more than one INN (e.g. paracetamol and 
acetaminophen), but they still have a unique WHO name (most of the time, one
of the INN).

The ID collector `get_drecno()` will let you know if you missed the WHO name.

Alternatively, you can work with Anatomical and Therapeutical Classification (ATC)
codes to investigate a set of drug. 
This is explained in the [ATC section](#atc_classes).

### Step 3: Identify drug codes

The function `get_drecno()` allows you to query the `mp`
table with a selection of drug names.

It takes several arguments, two of which must be filled in: 
the selection of drugs and the table containing the
correspondence between the name and the code (here, `mp`).

You should **always** look carefully at the printed message.

1.  Use `get_drecno()` with argument `verbose = TRUE` (default).

```{r verbose_with_get_drecno}
get_drecno(
  d_sel = d_sel,
  mp = mp_,
  verbose = TRUE
  )
```

We see that `get_drecno()` finds two entries containing the drug "nivolumab" in the `mp` table:

-   The entry for nivolumab alone

-   The entry for the ipilimumab;nivolumab combination

This second entry was identified because the `allow_combination`
argument is set to `TRUE` by default. 
This allows for a broader identification of all specialties
containing nivolumab. In this situation, this behavior is
desirable because we want to be sure to identify all cases
reporting the drug.

2.  Since these are actually the codes we were looking for, 
we can (optionally) set the `verbose` argument to FALSE and keep the result 
in an R object called `d_drecno`.

```{r d_drecno}
d_drecno <-
  get_drecno(
    d_sel = d_sel,
    mp = mp_,
    verbose = FALSE
    )
```

### Steps 4 and 5: add_drug() function

To identify cases reporting the drug of interest and add the
corresponding column to `demo`, we use the `add_drug()` function.

The `add_drug()` function takes 3 mandatory arguments:

-   The dataset on which to add the drug variable(s) (here, `demo`)

-   A named list containing the codes of the drug(s)

-   The `drug` table linking drug intake to each case.

```{r add_drug}
demo <- 
  add_drug(
    .data = demo,
    d_code = d_drecno,
    drug_data = drug)
demo
```

Or, in tidyverse syntax

```{r, results='hide'}
demo <- 
  demo |> 
  add_drug(
    d_code = d_drecno,
    drug_data = drug
  )
```

### Step 6: Check your data management

This may seem trivial, but it is an **essential** step 
in the construction of a dataset.

There are many ways to check that the code has worked.
Here, the `check_dm()` function will count the number of rows
in the dataset where the desired column is equal to 1.

```{r check_dm}
check_dm(demo, "nivolumab")
```

It shows how many rows in `demo` have the value 1 in the
`nivolumab` column (e.g. how many cases where identified
as reporting on nivolumab reactions). 

Here, we see that 225 cases report nivolumab.

### Step 2 and 3 variant: ATC classes {#atc_classes}

The correspondence between ATC (Anatomical and Therapeutical
Classification) classes and drug codes is found in the `thg`
table. In this table, drug codes are stored as
`MedicinalProd_Id`. It is therefore necessary to make a second
correspondence with `mp` to find `DrecNo`.

This can be done with the `get_atc_code()` function.

As with drugs, we first need to identify the ATC class of interest (here, "L03").

```{r atc_drecno}
atc_sel <-
  list2(l03 = "L03")

atc_drecno <- 
  get_atc_code(
    atc_sel = atc_sel,
    mp = mp,
    thg_data = thg_
    )
```

The `get_atc_code()` function requires the `mp` 
and `thg` tables, as well as the selection of ATC classes.

```{r str_atc_drecno}
str(atc_drecno)
```

By default, this function retrieves DrecNos associated with 
an ATC class. It is possible to retrieve MedicinalProd_Ids
instead by setting the `vigilyze` argument to `FALSE`.

The interest of using MedicinalProd_Ids instead of DrecNos is to
restrict the drug panel only to packages corresponding to a 
specific ATC class (e.g., you might not want to find all packages
of corticosteroids if you work with the ATC class "S01BA", which 
corresponds to ophtalmic steroids).

Once DrecNos are identified, we can add them to the `demo` table,
with the `add_drug()` function.

```{r add_atc}
demo |> 
  add_drug(
    d_code = atc_drecno,
    drug_data = drug
  )
```

### Step 4 and 5 variant: Suspect, concomitant, interacting

We can choose to work with drugs according to their
"reputation basis".

This information is stored in the `Basis` column of the
`drug` table.

-   1 suspect

-   2 concomitant

-   3 interacting

By using the `add_drug()` function, we can specify which
type of status we are interested in, in the `repbasis` argument.
By default, the value `"sci"` indicates that we consider the
drug whether it is suspect, concomitant, or interacting. 
We can change the selection.

```{r add_drug_repbasis}
demo |> 
  add_drug(
    d_code = d_drecno,
    drug_data = drug,
    repbasis = "sci"
  ) |> 
  check_dm("nivolumab")

# suspected only

demo |> 
  add_drug(
    d_code = d_drecno,
    drug_data = drug,
    repbasis = "s"
  ) |> 
  check_dm("nivolumab")
```



### Create multiple drug columns

To work with multiple drugs, 
you need to update the initial `d_sel` list.

```{r many_drugs}
d_sel <- 
  list2(
    nivolumab = "nivolumab",
    pembrolizumab = "pembrolizumab"
  )

d_drecno <-
  d_sel |> 
  get_drecno(mp = mp)

demo <- 
  demo |> 
  add_drug(
    d_drecno,
    drug_data = drug
  )

demo |> 
  check_dm(c("nivolumab", "pembrolizumab"))
```

### Drug groups

If you want to work at the level of a group of drugs,
but the ATC classes do not match your needs perfectly,
you can group them in the `d_sel` list.

```{r d_groups}
d_sel <- 
  list2(
    analgesics = c("paracetamol", "tramadol"),
    ici = c("nivolumab", "pembrolizumab")
  )

d_drecno <-
  d_sel |> 
  get_drecno(mp = mp,
             allow_combination = FALSE)

demo <- 
  demo |> 
  add_drug(
    d_drecno,
    drug_data = drug
  )

demo |> 
  check_dm(names(d_sel))
```

## Adverse drug reactions

### Principles

1. Load the `demo`, `adr`, and `meddra` tables.

2. Choose the adverse event(s) of interest.

3. Identify the event codes (these are low-level terms according
to the MedDRA classification). They can be found in the `meddra`
table or in the `smq` tables.

4. Search for cases that have presented this event, using the
codes

5. Update the `demo` table: code 1 if the case reports the
event of interest, 0 otherwise.

6. Check your data management

Similarly to the [drug workflow](#drug_workflow), steps
2 and 3 can be referred to as "dictionary" steps.

Step 3 uses `get_llt_soc()` or `get_llt_smq()`.

### Step 1: Load the tables

```{r load_adr_table}
adr <- adr_
meddra <- meddra_
```

demo was loaded during the [drug workflow](#drug_workflow).

### Step 2: Choose events of interest

```{r a_sel_pt}
a_sel_pt <-
  list2(
    a_colitis = c(
      "Colitis",
      "Autoimmune colitis",
      "Colitis microscopic",
      "Diarrhoea",
      "Diarrhoea haemorrhagic",
      "Duodenitis",
      "Enteritis",
      "Enterocolitis",
      "Enterocolitis haemorrhagic",
      "Ulcerative gastritis"
    )
  )
```

We start with a list of adverse events of interest, grouped
altogether under the name "a_colitis".

> MedDRA terms always start with a capital letter, be sure to provide
> the exact case, e.g. Good : "**C**olitis", Wrong : "colitis" or "COLITIS".

Be sure all selected terms belong to the same hierarchical level
(preferred term, high level term...) in MedDRA.
Here, we use Preferred Terms.

### Step 3: Identify event codes

The `get_llt_soc()` function allows you to query the `meddra`.

```{r get_llt_soc}
a_llt <- 
  get_llt_soc(
    term_sel = a_sel_pt,
    term_level = "pt",
    meddra = meddra_
    )

a_llt
```

An alternative is to use the `get_llt_smq()` function, which
allows you to query the `smq` tables.

Notice that you collect low level term codes, even if you work with
higher level terms, like preferred terms, or high level terms.
This is intentional: this list collects all low level term codes composing
the higher level term. See [Collecting ID section](#collecting_ids).

### Steps 4 and 5: add_adr() function

The `add_adr()` function allows you to identify cases reporting
the adverse event of interest and add the corresponding column
to `demo`.

```{r add_adr}
demo <- 
  add_adr(
    .data = demo,
    a_code = a_llt,
    adr_data = adr)
```

### Step 6: Check your data management

`check_dm()` also works for adr.

```{r check_dm_adr}
demo |> 
  check_dm("a_colitis")
```

## Other variables

We may need to create other variables to perform our analysis,
for example age and sex in a multivariable analysis.

### Age

The `demo` table contains the `AgeGroup` column, which groups
ages into categories. You may want to recode it to match 
you research question

```{r age}
demo <-
  demo |>
  mutate(
    age = cut(as.integer(AgeGroup),
              breaks = c(0,4,5,6,7,8),
              include.lowest = TRUE, right = TRUE,
              labels = c("<18", "18-45","45-64", "65-74", "75+"))
  )
```

### Sex

The `demo` table contains the `Gender` column, from which
you can also create a new sex column (with values 1 for
men, 2 for women, and NA otherwise)

```{r sex}
demo <-
  demo |> 
  mutate(
    sex = ifelse(Gender == "1", 1,
                 ifelse(Gender == "2", 2, NA_real_)
                 )
    )
```

### Using case_when()

The `case_when()` function from the `dplyr` package allows
you to manage multiple options in a single function, with
a slightly different syntax.


```{r sex_casewhen}
demo <- 
  demo |> 
  mutate(
    sex = case_when(Gender == "1" ~ 1,
                    Gender == "2" ~ 2,
                    TRUE ~ NA_real_)
    )
```

More documentation on `case_when()` can be found in the
`dplyr` package documentation.

You should just remember here that options are evaluated
sequentially, from top to bottom.


### Seriousness, death

The `out` table contains the `Seriousness` column, which
indicates whether the case was serious or not, and whether
the patient experienced a fatal issue during his/her follow-up.

```{r serious_death}
# ---- Serious ---- ####

out <- out_

demo <- 
  demo |> 
  mutate(
    serious = 
      ifelse(
        UMCReportId %in% out$UMCReportId,
        UMCReportId %in% 
          (out |> 
          filter(Serious == "Y") |> 
          pull(UMCReportId)
          ),
        NA)
    )

# ---- Death + outcome availability ---- ####

demo <- 
  demo |> 
  mutate(death = 
           ifelse(UMCReportId %in% out$UMCReportId,
                  UMCReportId %in% 
                    (out |> 
                    filter(Seriousness == "1") |> 
                    pull(UMCReportId)
                    ),
                  NA)
         )

```

-  The `serious` and `death` columns are coded with TRUE/FALSE values in
this example. There is no particular reason to prefer it over 1/0
codes. It is just a matter of preference.

-  The `Seriousness` can have several levels, level 1 being death.
(see subsidiary files)

# Disproportionality

Our `demo` dataset now has a drug column for nivolumab, 
and an adr column for colitis.

We can perform a disproportionality analysis between these
two variables.

## Univariate analysis

### Disproportionality metrics

Reporting Odds-Ratio (ROR) and Information Component
essentially measure the same thing: the disproportionality.

`compute_dispro()` computes both of these.

-    `or` and `or_ci` are the reporting Odds-Ratio and its confidence interval
(default: 95%CI).

-    `ic` and `ic_tail` are the Information Component, 
and its lower end of credibility interval (default: IC025).

```{r compute_dispro}
demo |> 
  compute_dispro(
    y = "a_colitis",
    x = "nivolumab"
    )
```

## Advanced modelling, multivariate analysis

From this point, it is also possible to run any statistical model
including drug and adr parameters, but also potential other variables
such as age and sex. For example, one could wish to perform
a multivariate logistic regression on the reporting of colitis
and nivolumab, adjusted on age groups and sex.

The `glm()` function from the `stats` package can be used for this
purpose.


```{r mod}
mod <- glm(a_colitis ~ nivolumab, 
           data = demo, family = "binomial")

summary(mod)
```

In a logistic regression models, estimates lead to (reporting) OR by the exponential.

```{r}
summary(mod)$coefficients

exp(summary(mod)$coefficients[2, 1])
```

Adding covariates is straightforward

```{r mod_covar}
mod2 <- glm(a_colitis ~ nivolumab + sex + age,
            data = demo,
            family = "binomial")

summary(mod2)
```

### Extract Odds-Ratio with compute_or_mod()

There are several packages that can extract the OR from a model.
The `compute_or_mod()` function is just one of many ways to do it.

```{r compute_or_mod}
mod_or <- 
  compute_or_mod(
    summary(mod2)$coefficients,
    estimate = Estimate,
    std_er = Std..Error
    )

mod_or
```

# Conclusion

You're now all set to create drugs and adrs columns into a
`demo` dataset. This is the first step to many modelling possibilities!

Where do you want to go next?

-    Dive into descriptive features, such as time to onset, 
dechallenge, rechallenge, screening of drugs and adrs.
`vignette("descriptive")`

-   Learn on interactions in pharmacovigilance database
`vignette("interactions")`