---
title: "Creating a BDS Finding ADaM"
output: 
  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Creating a BDS Finding ADaM}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(admiraldev)
```

# Introduction

This article describes creating a BDS finding ADaM. Examples are currently 
presented and tested in the context of ADVS.  However, the examples could be
applied to other BDS Finding ADaMs such as ADEG, ADLB, etc. where a single 
result is captured in an SDTM Finding domain on a single date and/or time.

**Note**: *All examples assume CDISC SDTM and/or ADaM format as input unless 
otherwise specified.*

# Programming Workflow

* [Read in Data](#readdata)
* [Derive/Impute Numeric Date/Time and Analysis Day (`ADT`, `ADTM`, `ADY`, `ADTF`, `ATMF`)](#datetime)
* [Assign `PARAMCD`, `PARAM`, `PARAMN`, `PARCAT1`](#paramcd)
* [Derive Results (`AVAL`, `AVALC`)](#aval)
* [Derive Additional Parameters (e.g. `BSA`, `BMI`, or `MAP` for `ADVS`)](#derive_param)
* [Derive Timing Variables (e.g. `APHASE`, `AVISIT`, `APERIOD`)](#timing)
* [Timing Flag Variables (e.g. `ONTRTFL`)](#timingflag)
* [Assign Reference Range Indicator (`ANRIND`)](#referencerange)
* [Derive Baseline (`BASETYPE`, `ABLFL`, `BASE`, `BASEC`, `BNRIND`)](#baseline)
* [Derive Change from Baseline (`CHG`, `PCHG`)](#bchange)
* [Derive Shift (e.g.`SHIFT1`)](#shift)
* [Derive Analysis Ratio (e.g. `R2BASE`)](#analysisratio)
* [Derive Analysis Flags (e.g. `ANL01FL`)](#analysisrec)
* [Assign Treatment (`TRTA`, `TRTP`)](#treatment)
* [Assign `ASEQ`](#aseq)
* [Derive Categorization Variables (`AVALCATy`)](#cat)
* [Derive Criterion Variables (`CRITy`, `CRITyFL`, `CRITyFN`)](#crit_vars)
* [Add ADSL variables](#adsl_vars)
* [Derive New Rows](#additional)
* [Add Labels and Attributes](#attributes)

## Read in Data {#readdata}

To start, all data frames needed for the creation of `ADVS` should be read into
the environment.  This will be a company specific process.  Some of the 
data frames needed may be `VS` and `ADSL`.

For example purpose, the CDISC Pilot SDTM and ADaM datasets---which are included in `{pharmaversesdtm}`---are used.

```{r message=FALSE}
library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(pharmaversesdtm)
library(lubridate)
library(stringr)
library(tibble)

vs <- pharmaversesdtm::vs
adsl <- admiral::admiral_adsl

vs <- convert_blanks_to_na(vs)
```
```{r echo=FALSE}
vs <- filter(vs, USUBJID %in% c("01-701-1015", "01-701-1023", "01-703-1086", "01-703-1096", "01-707-1037", "01-716-1024"))
```

At this step, it may be useful to join `ADSL` to your `VS` domain. Only the 
`ADSL` variables used for derivations are selected at this step. The rest of the
relevant `ADSL` variables would be added later.

```{r eval=TRUE}
adsl_vars <- exprs(TRTSDT, TRTEDT, TRT01A, TRT01P)

advs <- derive_vars_merged(
  vs,
  dataset_add = adsl,
  new_vars = adsl_vars,
  by_vars = exprs(STUDYID, USUBJID)
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, VSTESTCD, VSDTC, VISIT, TRTSDT, TRTEDT, TRT01A, TRT01P),
  filter = VSTESTCD == "DIABP" & VISIT == "WEEK 2"
)
```


## Derive/Impute Numeric Date/Time and Analysis Day (`ADT`, `ADTM`, `ADY`, `ADTF`, `ATMF`) {#datetime}

The function `derive_vars_dt()` can be used to derive `ADT`. This function allows 
the user to impute the date as well.

Example calls:

```{r eval=TRUE}
advs <- derive_vars_dt(advs, new_vars_prefix = "A", dtc = VSDTC)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, VISIT, VSDTC, ADT),
  filter = VSTESTCD == "DIABP"
)
```

If imputation is needed and the date is to be imputed to the first of the month, 
the call would be:

```{r eval=TRUE, include=FALSE}
advs_old <- advs

advs <- advs %>%
  mutate(
    VSDTC = if_else(
      USUBJID == "01-716-1024" & VISIT == "SCREENING 1",
      "2012-07",
      VSDTC
    )
  ) %>%
  select(-ADT)
```

```{r eval=TRUE}
advs <- derive_vars_dt(
  advs,
  new_vars_prefix = "A",
  dtc = VSDTC,
  highest_imputation = "M"
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, VISIT, VSDTC, ADT, ADTF),
  filter = USUBJID == "01-716-1024"
)
```


```{r eval=TRUE, echo=FALSE}
advs <- advs_old
```

Similarly, `ADTM` may be created using the function `derive_vars_dtm()`. 
Imputation may be done on both the date and time components of `ADTM`.

```{r eval=FALSE}
# CDISC Pilot data does not contain times and the output of the derivation
# ADTM is not presented.
advs <- derive_vars_dtm(
  advs,
  new_vars_prefix = "A",
  dtc = VSDTC,
  highest_imputation = "M"
)
```

By default, the variable `ADTF` for `derive_vars_dt()` or `ADTF` and `ATMF` for 
`derive_vars_dtm()` will be created and populated with the controlled
terminology outlined in the ADaM IG for date imputations. 

See also [Date and Time Imputation](imputation.html).

Once `ADT` is derived, the function `derive_vars_dy()` can be used to derive `ADY`.
This example assumes both `ADT` and `TRTSDT` exist on the data frame.

```{r eval=TRUE}
advs <-
  derive_vars_dy(advs, reference_date = TRTSDT, source_vars = exprs(ADT))
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, VISIT, ADT, ADY, TRTSDT),
  filter = USUBJID == "01-716-1024"
)
```

## Assign `PARAMCD`, `PARAM`, `PARAMN`, `PARCAT1` {#paramcd}

To assign parameter level values such as `PARAMCD`, `PARAM`, `PARAMN`, `PARCAT1`,
etc., a lookup can be created to join to the source data.

For example, when creating `ADVS`, a lookup based on the SDTM `--TESTCD` value 
may be created:

`VSTESTCD` | `PARAMCD` | `PARAM` | `PARAMN` | `PARCAT1` | `PARCAT1N`
--------- | --------- | -------- | ------- | --------- | ----------
HEIGHT | HEIGHT | Height (cm) | 1 | Subject Characteristic | 1
WEIGHT | WEIGHT | Weight (kg) | 2 | Subject Characteristic | 1
DIABP | DIABP | Diastolic Blood Pressure (mmHg) | 3 | Vital Sign | 2
MAP | MAP | Mean Arterial Pressure | 4 | Vital Sign | 2
PULSE | PULSE | Pulse Rate (beats/min) | 5 | Vital Sign | 2
SYSBP | SYSBP | Systolic Blood Pressure (mmHg) | 6 | Vital Sign | 2
TEMP | TEMP | Temperature (C) | 7 | Vital Sign | 2

This lookup may now be joined to the source data:

```{r eval=TRUE, include=FALSE}
param_lookup <- tibble::tribble(
  ~VSTESTCD, ~PARAMCD,                            ~PARAM, ~PARAMN,                 ~PARCAT1, ~PARCAT1N,
  "HEIGHT",  "HEIGHT",                     "Height (cm)",       1, "Subject Characteristic",         1,
  "WEIGHT",  "WEIGHT",                     "Weight (kg)",       2, "Subject Characteristic",         1,
  "DIABP",    "DIABP", "Diastolic Blood Pressure (mmHg)",       3,             "Vital Sign",         2,
  "MAP",        "MAP",   "Mean Arterial Pressure (mmHg)",       4,             "Vital Sign",         2,
  "BSA",        "BSA",         "Body Surface Area (m^2)",       5,             "Vital Sign",         2,
  "PULSE",    "PULSE",          "Pulse Rate (beats/min)",       6,             "Vital Sign",         2,
  "SYSBP",    "SYSBP",  "Systolic Blood Pressure (mmHg)",       7,             "Vital Sign",         2,
  "TEMP",      "TEMP",                 "Temperature (C)",       8,             "Vital Sign",         2
)
attr(param_lookup$VSTESTCD, "label") <- "Vital Signs Test Short Name"
```

At this stage, only `PARAMCD` is required to perform the derivations. Additional
derived parameters may be added, so only `PARAMCD` is joined to the datasets at
this point. All other variables related to `PARAMCD` (e.g. `PARAM`, `PARCAT1`, ...)
will be added when all `PARAMCD` are derived.

```{r eval=TRUE}
advs <- derive_vars_merged_lookup(
  advs,
  dataset_add = param_lookup,
  new_vars = exprs(PARAMCD),
  by_vars = exprs(VSTESTCD)
)
```

```{r, eval=TRUE, echo=FALSE}
advs_param <- distinct(advs, USUBJID, PARAMCD, VSTESTCD)

dataset_vignette(advs_param, display_vars = exprs(USUBJID, VSTESTCD, PARAMCD))
```

Please note, it may be necessary to include other variables in the join. For
example, perhaps the `PARAMCD` is based on `VSTESTCD` and `VSPOS`, it may be 
necessary to expand this lookup or create a separate look up for `PARAMCD`.

If more than one lookup table, e.g., company parameter mappings and project
parameter mappings, are available, `consolidate_metadata()` can be used to
consolidate these into a single lookup table.

Additionally note that each parameter is mapped to only one `PARCAT1` variable. 
This is described in section 3.3.4.1 of the ADaM Implementation Guide, 
version 1.3: "Any given `PARAM` may be associated with at-most one level 
of `PARCATy` (e.g., one level of `PARCAT1` and one level of `PARCAT2`)". 

## Derive Results (`AVAL`, `AVALC`) {#aval}

The mapping of `AVAL` and `AVALC` is left to the ADaM programmer. An 
example mapping may be:

```{r eval=TRUE}
advs <- mutate(
  advs,
  AVAL = VSSTRESN
)
```

In this example, as is often the case for ADVS, all `AVAL` values are numeric without any corresponding non-redundant text value for `AVALC`.
Per recommendation in ADaMIG v1.3 we do not map `AVALC`.


```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(VSTESTCD, PARAMCD, VSSTRESN, VSSTRESC, AVAL),
  filter = USUBJID == "01-716-1024"
)
```

## Derive Additional Parameters (e.g. `BSA`, `BMI` or `MAP` for `ADVS`) {#derive_param}

Optionally derive new parameters creating `PARAMCD` and `AVAL`. Note that only
variables specified in the `by_vars` argument will be populated in the newly 
created records. This is relevant to the functions `derive_param_map`, 
`derive_param_bsa`, `derive_param_bmi`, and `derive_param_qtc`. 

Below is an example of creating `Mean Arterial Pressure` for `ADVS`, see also 
Example 3 in section below [Derive New Rows](#additional) for alternative way of creating new parameters.

```{r eval=TRUE}
advs <- derive_param_map(
  advs,
  by_vars = exprs(STUDYID, USUBJID, !!!adsl_vars, VISIT, VISITNUM, ADT, ADY, VSTPT, VSTPTNUM),
  set_values_to = exprs(PARAMCD = "MAP"),
  get_unit_expr = VSSTRESU,
  filter = VSSTAT != "NOT DONE" | is.na(VSSTAT)
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  arrange(advs, USUBJID, VISITNUM, VSTPTNUM, ADT, PARAMCD),
  display_vars = exprs(VSTESTCD, PARAMCD, VISIT, VSTPT, AVAL),
  filter = USUBJID == "01-701-1015" & PARAMCD %in% c("MAP", "DIABP", "SYSBP")
)
```

Likewise, function call below, to create parameter `Body Surface Area` (BSA) and 
`Body Mass Index` (BMI) for `ADVS` domain.  Note that if height is collected only once use `constant_by_vars` to specify the subject-level variable to merge on.  Otherwise BSA and BMI are only calculated for visits where both are collected.

```{r eval=TRUE}
advs <- derive_param_bsa(
  advs,
  by_vars = exprs(STUDYID, USUBJID, !!!adsl_vars, VISIT, VISITNUM, ADT, ADY, VSTPT, VSTPTNUM),
  method = "Mosteller",
  set_values_to = exprs(PARAMCD = "BSA"),
  get_unit_expr = VSSTRESU,
  filter = VSSTAT != "NOT DONE" | is.na(VSSTAT),
  constant_by_vars = exprs(USUBJID)
)

advs <- derive_param_bmi(
  advs,
  by_vars = exprs(STUDYID, USUBJID, !!!adsl_vars, VISIT, VISITNUM, ADT, ADY, VSTPT, VSTPTNUM),
  set_values_to = exprs(PARAMCD = "BMI"),
  get_unit_expr = VSSTRESU,
  filter = VSSTAT != "NOT DONE" | is.na(VSSTAT),
  constant_by_vars = exprs(USUBJID)
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  arrange(advs, USUBJID, VISITNUM, VSTPTNUM, ADT, PARAMCD),
  display_vars = exprs(USUBJID, VSTESTCD, PARAMCD, VISIT, VSTPT, AVAL),
  filter = PARAMCD %in% c("BSA", "BMI")
)
```

Similarly, for `ADEG`, the parameters `QTCBF` `QTCBS` and `QTCL` can be
created with a function call. See example below for `PARAMCD` = `QTCF`.

```{r eval=FALSE}
adeg <- tibble::tribble(
  ~USUBJID, ~EGSTRESU, ~PARAMCD, ~AVAL,          ~VISIT,
  "P01",       "msec",     "QT",   350, "CYCLE 1 DAY 1",
  "P01",       "msec",     "QT",   370, "CYCLE 2 DAY 1",
  "P01",       "msec",     "RR",   842, "CYCLE 1 DAY 1",
  "P01",       "msec",     "RR",   710, "CYCLE 2 DAY 1"
)

adeg <- derive_param_qtc(
  adeg,
  by_vars = exprs(USUBJID, VISIT),
  method = "Fridericia",
  set_values_to = exprs(PARAMCD = "QTCFR"),
  get_unit_expr = EGSTRESU
)
```

Similarly, for `ADLB`, the function `derive_param_wbc_abs()` can be used to create new parameter 
for lab differentials converted to absolute values. See example below:

```{r eval=FALSE}
adlb <- tibble::tribble(
  ~USUBJID, ~PARAMCD, ~AVAL,                        ~PARAM,          ~VISIT,
  "P01",       "WBC",    33,    "Leukocyte Count (10^9/L)", "CYCLE 1 DAY 1",
  "P01",       "WBC",    38,    "Leukocyte Count (10^9/L)", "CYCLE 2 DAY 1",
  "P01",     "LYMLE",  0.90, "Lymphocytes (fraction of 1)", "CYCLE 1 DAY 1",
  "P01",     "LYMLE",  0.70, "Lymphocytes (fraction of 1)", "CYCLE 2 DAY 1"
)

derive_param_wbc_abs(
  dataset = adlb,
  by_vars = exprs(USUBJID, VISIT),
  set_values_to = exprs(
    PARAMCD = "LYMPH",
    PARAM = "Lymphocytes Abs (10^9/L)",
    DTYPE = "CALCULATION"
  ),
  get_unit_expr = extract_unit(PARAM),
  wbc_code = "WBC",
  diff_code = "LYMLE",
  diff_type = "fraction"
)
```

When all `PARAMCD` have been derived and added to the dataset, the other information 
from the look-up table (`PARAM`, `PARAMCAT1`,...) should be added.

```{r eval=TRUE}
# Derive PARAM and PARAMN
advs <- derive_vars_merged(
  advs,
  dataset_add = select(param_lookup, -VSTESTCD),
  by_vars = exprs(PARAMCD)
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(VSTESTCD, PARAMCD, PARAM, PARAMN, PARCAT1, PARCAT1N),
  filter = USUBJID == "01-716-1024"
)
```

## Derive Timing Variables (e.g. `APHASE`, `AVISIT`, `APERIOD`) {#timing}

Categorical timing variables are protocol and analysis dependent.  Below is a 
simple example. 

```{r eval=TRUE}
advs <- advs %>%
  mutate(
    AVISIT = case_when(
      str_detect(VISIT, "SCREEN") ~ NA_character_,
      str_detect(VISIT, "UNSCHED") ~ NA_character_,
      str_detect(VISIT, "RETRIEVAL") ~ NA_character_,
      str_detect(VISIT, "AMBUL") ~ NA_character_,
      !is.na(VISIT) ~ str_to_title(VISIT)
    ),
    AVISITN = as.numeric(case_when(
      VISIT == "BASELINE" ~ "0",
      str_detect(VISIT, "WEEK") ~ str_trim(str_replace(VISIT, "WEEK", ""))
    )),
    ATPT = VSTPT,
    ATPTN = VSTPTNUM
  )

count(advs, VISITNUM, VISIT, AVISITN, AVISIT)

count(advs, VSTPTNUM, VSTPT, ATPTN, ATPT)
```

For assigning visits based on time windows and deriving periods, subperiods, and
phase variables see the ["Visit and Period Variables"
vignette](visits_periods.html).

## Timing Flag Variables (e.g. `ONTRTFL`) {#timingflag}

In some analyses, it may be necessary to flag an observation as on-treatment.
The admiral function `derive_var_ontrtfl()` can be used.

For example, if on-treatment is defined as any observation between treatment
start and treatment end, the flag may be derived as:

```{r eval=TRUE}
advs <- derive_var_ontrtfl(
  advs,
  start_date = ADT,
  ref_start_date = TRTSDT,
  ref_end_date = TRTEDT
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, PARAMCD, ADT, TRTSDT, TRTEDT, ONTRTFL),
  filter = PARAMCD == "DIABP" & VISIT == "WEEK 2"
)
```

This function returns the original data frame with the column `ONTRTFL` added.
Additionally, this function does have functionality to handle a window on the
`ref_end_date`.  For example, if on-treatment is defined as between treatment 
start and treatment end plus 60 days, the call would be:

```{r include=FALSE}
advs <- select(advs, -ONTRTFL)
```

```{r eval=TRUE}
advs <- derive_var_ontrtfl(
  advs,
  start_date = ADT,
  ref_start_date = TRTSDT,
  ref_end_date = TRTEDT,
  ref_end_window = 60
)
```

In addition, the function does allow you to filter out pre-treatment observations
that occurred on the start date.  For example, if observations with `VSTPT == PRE`
should not be considered on-treatment when the observation date falls between 
the treatment start and end date, the user may specify this using the 
`filter_pre_timepoint` parameter:

```{r include=FALSE}
advs <- select(advs, -ONTRTFL)
```

```{r eval=TRUE}
advs <- derive_var_ontrtfl(
  advs,
  start_date = ADT,
  ref_start_date = TRTSDT,
  ref_end_date = TRTEDT,
  filter_pre_timepoint = ATPT == "AFTER LYING DOWN FOR 5 MINUTES"
)
```

Lastly, the function does allow you to create any on-treatment 
flag based on the analysis needs.  For example, if variable 
`ONTR01FL` is needed, showing the on-treatment flag during Period 01, 
you need to set `new var = ONTR01FL`. In addition, for Period 01 
Start Date and Period 01 End Date, you need `ref_start_date = AP01SDT` 
and `ref_end_date = AP01EDT`.

```{r include=FALSE}
advs_pre <- select(advs, -ONTRTFL)

advs <- tibble::tribble(
  ~USUBJID,         ~ASTDT,          ~AP01SDT,          ~AP01EDT,            ~AENDT,
  "P01", ymd("2020-03-15"), ymd("2020-01-01"), ymd("2020-03-01"), ymd("2020-12-01"),
  "P02", ymd("2019-04-30"), ymd("2020-01-01"), ymd("2020-03-01"), ymd("2020-03-15"),
  "P03", ymd("2019-04-30"), ymd("2020-01-01"), ymd("2020-03-01"),                NA,
)
```

```{r eval=TRUE}
advs <- derive_var_ontrtfl(
  advs,
  new_var = ONTR01FL,
  start_date = ASTDT,
  end_date = AENDT,
  ref_start_date = AP01SDT,
  ref_end_date = AP01EDT,
  span_period = TRUE
)
```
```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, ASTDT, AENDT, AP01SDT, AP01EDT, ONTR01FL)
)
```


## Assign Reference Range Indicator (`ANRIND`) {#referencerange}

The admiral function `derive_var_anrind()` may be used to derive the reference
range indicator `ANRIND`.

This function requires the reference range boundaries to exist on the data frame 
(`ANRLO`, `ANRHI`) and also accommodates the additional boundaries `A1LO` and `A1HI`.

```{r include=FALSE}
range_lookup <- tibble::tribble(
  ~PARAMCD, ~ANRLO, ~ANRHI, ~A1LO, ~A1HI,
  "SYSBP",      90,    130,    70,   140,
  "DIABP",      60,     80,    40,    90,
  "PULSE",      60,    100,    40,   110,
  "TEMP",     36.5,   37.5,    35,    38
)

advs <- derive_vars_merged(
  advs_pre,
  dataset_add = range_lookup,
  by_vars = exprs(PARAMCD)
)
```

The function is called as:

```{r eval=TRUE}
advs <- derive_var_anrind(advs)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, PARAMCD, AVAL, ANRLO, ANRHI, A1LO, A1HI, ANRIND),
  filter = PARAMCD == "DIABP" & VISIT == "WEEK 2"
)
```

## Derive Baseline (`BASETYPE`, `ABLFL`, `BASE`, `BNRIND`) {#baseline}

The `BASETYPE` should be derived using the function `derive_basetype_records()`.
The parameter `basetypes` of this function requires a named list of expression 
detailing how the `BASETYPE` should be assigned.  Note, if a record falls into
multiple expressions within the basetypes expression, a row will be produced for 
each `BASETYPE`.


```{r eval=TRUE}
advs <- derive_basetype_records(
  dataset = advs,
  basetypes = exprs(
    "LAST: AFTER LYING DOWN FOR 5 MINUTES" = ATPTN == 815,
    "LAST: AFTER STANDING FOR 1 MINUTE" = ATPTN == 816,
    "LAST: AFTER STANDING FOR 3 MINUTES" = ATPTN == 817,
    "LAST" = is.na(ATPTN)
  )
)

count(advs, ATPT, ATPTN, BASETYPE)
```

It is important to derive `BASETYPE` first so that it can be utilized in 
subsequent derivations. This will be important if the data frame contains
multiple values for `BASETYPE`.

Next, the analysis baseline flag `ABLFL` can be derived using the `{admiral}`
function `derive_var_extreme_flag()`.  For example, if baseline is defined as the last 
non-missing `AVAL` prior or on `TRTSDT`, the function call for `ABLFL` would be:

```{r eval=TRUE}
advs <- restrict_derivation(
  advs,
  derivation = derive_var_extreme_flag,
  args = params(
    by_vars = exprs(STUDYID, USUBJID, BASETYPE, PARAMCD),
    order = exprs(ADT, ATPTN, VISITNUM),
    new_var = ABLFL,
    mode = "last"
  ),
  filter = (!is.na(AVAL) & ADT <= TRTSDT & !is.na(BASETYPE))
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, BASETYPE, PARAMCD, ADT, TRTSDT, ATPTN, TRTSDT, ABLFL),
  filter = PARAMCD == "DIABP" & VISIT %in% c("WEEK 2", "BASELINE")
)
```

Note: Additional examples of the `derive_var_extreme_flag()` function can be 
found [above.](#analysisrec)

Lastly, the `BASE`, and  `BNRIND` columns can be derived using the `{admiral}` function
`derive_var_base()`. Example calls are:

```{r eval=TRUE}
advs <- derive_var_base(
  advs,
  by_vars = exprs(STUDYID, USUBJID, PARAMCD, BASETYPE),
  source_var = AVAL,
  new_var = BASE
)

advs <- derive_var_base(
  advs,
  by_vars = exprs(STUDYID, USUBJID, PARAMCD, BASETYPE),
  source_var = ANRIND,
  new_var = BNRIND
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, BASETYPE, PARAMCD, ABLFL, BASE, ANRIND, BNRIND),
  filter = PARAMCD == "DIABP" & VISIT %in% c("WEEK 2", "BASELINE")
)
```

## Derive Change from Baseline (`CHG`, `PCHG`) {#bchange}

Change and percent change from baseline can be derived using the `{admiral}` 
functions `derive_var_chg()` and `derive_var_pchg()`.  These functions expect `AVAL` 
and `BASE` to exist in the data frame.  The `CHG` is simply `AVAL - BASE` and the
`PCHG` is `(AVAL - BASE) / absolute value (BASE) * 100`.  Examples calls are:

```{r eval=TRUE}
advs <- derive_var_chg(advs)

advs <- derive_var_pchg(advs)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, VISIT, BASE, AVAL, CHG, PCHG),
  filter = PARAMCD == "DIABP" & VISIT %in% c("WEEK 2", "WEEK 8")
)
```

If the variables should not be derived for all records, e.g., for post-baseline
records only, `restrict_derivation()` can be used.

## Derive Shift (e.g. `SHIFT1`) {#shift}

Shift variables can be derived using the `{admiral}` function `derive_var_shift()`. 
This function derives a character shift variable concatenating shift in values based on a
user-defined pairing, e.g., shift from baseline reference range `BNRIND`
to analysis reference range `ANRIND`. Examples calls are:

```{r eval=TRUE}
advs <- derive_var_shift(advs,
  new_var = SHIFT1,
  from_var = BNRIND,
  to_var = ANRIND
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, SHIFT1, BNRIND, ANRIND, VISIT),
  filter = PARAMCD == "DIABP" & VISIT %in% c("WEEK 2", "WEEK 8")
)
```

If the variables should not be derived for all records, e.g., for post-baseline
records only, `restrict_derivation()` can be used.

## Derive Analysis Ratio (`R2BASE`) {#analysisratio}

Analysis ratio variables can be derived using the `{admiral}` function 
`derive_var_analysis_ratio()`. This function derives a ratio variable based on user-specified pair. 
For example, Ratio to Baseline is calculated by `AVAL / BASE` and the function appends a new 
variable `R2BASE` to the dataset. This function can also derive `R2AyHI` and `R2AyLO` values. Examples calls are:

```{r eval=TRUE}
advs <- derive_var_analysis_ratio(advs,
  numer_var = AVAL,
  denom_var = BASE
)

advs <- derive_var_analysis_ratio(advs,
  numer_var = AVAL,
  denom_var = ANRLO,
  new_var = R01ANRLO
)
```


```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, VISIT, BASE, AVAL, ANRLO, R2BASE, R01ANRLO),
  filter = PARAMCD == "DIABP" & VISIT %in% c("WEEK 2", "WEEK 8")
)
```

If the variables should not be derived for all records, e.g., for post-baseline
records only, `restrict_derivation()` can be used.

## Derive Analysis Flags (e.g. `ANL01FL`) {#analysisrec}

In most finding ADaMs, an analysis flag is derived to identify the appropriate 
observation(s) to use for a particular analysis when a subject has multiple
observations within a particular timing period.

In this situation, an analysis flag (e.g. `ANLxxFL`) may be used to choose the
appropriate record for analysis.

This flag may be derived using the `{admiral}` function `derive_var_extreme_flag()`. 
For this example, we will assume we would like to choose the latest and 
highest value by `USUBJID`, `PARAMCD`, `AVISIT`, and `ATPT`.

```{r eval=TRUE}
advs <- restrict_derivation(
  advs,
  derivation = derive_var_extreme_flag,
  args = params(
    by_vars = exprs(STUDYID, USUBJID, BASETYPE, PARAMCD, AVISIT),
    order = exprs(ADT, ATPTN, AVAL),
    new_var = ANL01FL,
    mode = "last"
  ),
  filter = !is.na(AVISITN)
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, PARAMCD, AVISIT, ATPTN, ADT, AVAL, ANL01FL),
  filter = PARAMCD == "DIABP" & VISIT %in% c("WEEK 2", "WEEK 8")
)
```

Another common example would be flagging the worst value for a subject,
parameter, and visit. For this example, we will assume we have 3 `PARAMCD` 
values (`SYSBP`, `DIABP`, and `RESP`).  We will also assume high is worst for `SYSBP` 
and `DIABP` and low is worst for `RESP`.

```{r eval=TRUE}
advs <- slice_derivation(
  advs,
  derivation = derive_var_extreme_flag,
  args = params(
    by_vars = exprs(STUDYID, USUBJID, BASETYPE, PARAMCD, AVISIT),
    order = exprs(ADT, ATPTN),
    new_var = WORSTFL,
    mode = "first"
  ),
  derivation_slice(
    filter = PARAMCD %in% c("SYSBP", "DIABP") & (!is.na(AVISIT) & !is.na(AVAL))
  ),
  derivation_slice(
    filter = PARAMCD %in% "PULSE" & (!is.na(AVISIT) & !is.na(AVAL)),
    args = params(mode = "last")
  )
) %>%
  arrange(STUDYID, USUBJID, BASETYPE, PARAMCD, AVISIT)
```


```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, PARAMCD, AVISIT, AVAL, ADT, ATPTN, WORSTFL),
  filter = USUBJID == "01-701-1015" & WORSTFL == "Y"
)
```

## Assign Treatment (`TRTA`, `TRTP`) {#treatment}

`TRTA` and `TRTP` must match at least one value of the character treatment
variables in ADSL (e.g., `TRTxxA`/`TRTxxP`, `TRTSEQA`/`TRTSEQP`,
`TRxxAGy`/`TRxxPGy`).

An example of a simple implementation for a study without periods could be:

```{r eval=TRUE}
advs <- mutate(advs, TRTP = TRT01P, TRTA = TRT01A)

count(advs, TRTP, TRTA, TRT01P, TRT01A)
```

For studies with periods see the ["Visit and Period Variables"
vignette](visits_periods.html#treatment_bds).

## Assign `ASEQ` {#aseq}

The `{admiral}` function `derive_var_obs_number()` can be used to derive `ASEQ`. An 
example call is:

```{r eval=TRUE}
advs <- derive_var_obs_number(
  advs,
  new_var = ASEQ,
  by_vars = exprs(STUDYID, USUBJID),
  order = exprs(PARAMCD, ADT, AVISITN, VISITNUM, ATPTN),
  check_type = "error"
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, PARAMCD, ADT, AVISITN, ATPTN, VISIT, ADT, ASEQ),
  filter = USUBJID == "01-701-1015"
)
```

## Derive Categorization Variables (`AVALCATy`) {#cat}

We can use the `derive_vars_cat()` function to derive the categorization variables.

```{r eval=TRUE}
avalcat_lookup <- exprs(
  ~PARAMCD,  ~condition,   ~AVALCAT1, ~AVALCA1N,
  "HEIGHT",  AVAL > 140,   ">140 cm",         1,
  "HEIGHT", AVAL <= 140, "<= 140 cm",         2
)
advs <- advs %>%
  derive_vars_cat(
    definition = avalcat_lookup,
    by_vars = exprs(PARAMCD)
  )
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, PARAMCD, AVAL, AVALCA1N, AVALCAT1),
  filter = PARAMCD == "HEIGHT"
)
```

## Derive Criterion Variables (`CRITy`, `CRITyFL`, `CRITyFN`) {#crit_vars}

For deriving criterion variables (`CRITy`, `CRITyFL`, `CRITyFN`) `{admiral}`
provides `derive_vars_crit_flag()`. It ensures that they are derived in an
ADaM-compliant way (see documentation of the function for details).

In most cases the criterion depends on the parameter. The higher order functions
`restrict_derivation()` and `slice_derivation()` are useful in this case. In the
following example  the criterion flags for systolic and diastolic blood pressure
from the ADaM IG are derived.

The first criterion is based on `AVAL` and is derived for systolic and diastolic
blood pressure. `slice_derivation()` us used to specify the condition and
description of the criterion depending on the parameter.
```{r eval=TRUE}
advs <- advs %>%
  slice_derivation(
    derivation = derive_vars_crit_flag,
    args = params(
      values_yn = TRUE,
      create_numeric_flag = TRUE
    ),
    derivation_slice(
      filter = PARAMCD == "SYSBP",
      args = params(
        condition = AVAL > 160,
        description = "Systolic Pressure > 160"
      )
    ),
    derivation_slice(
      filter = PARAMCD == "DIABP",
      args = params(
        condition = AVAL > 95,
        description = "Diastolic Pressure > 95"
      )
    )
  )
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  arrange(advs, USUBJID, AVISITN, ATPTN, PARAMCD),
  display_vars = exprs(USUBJID, PARAMCD, AVAL, CHG, CRIT1, CRIT1FL, CRIT1FN),
  filter = PARAMCD %in% c("DIABP", "SYSBP")
)
```

The second criterion is based on `AVAL` and `CHG` and is derived for systolic
blood pressure only. Thus `restrict_derivation()` is used.
```{r eval=TRUE} 
advs <- advs %>%
  restrict_derivation(
    derivation = derive_vars_crit_flag,
    args = params(
      condition = AVAL > 160 & CHG > 10,
      description = "Systolic Pressure > 160 and Change from Baseline in Systolic Pressure > 10",
      crit_nr = 2,
      values_yn = TRUE,
      create_numeric_flag = TRUE
    ),
    filter = PARAMCD == "SYSBP"
  )
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  arrange(advs, USUBJID, AVISITN, ATPTN),
  display_vars = exprs(USUBJID, PARAMCD, AVAL, CHG, CRIT2, CRIT2FL, CRIT2FN),
  filter = PARAMCD == "SYSBP"
)
```

## Add ADSL variables {#adsl_vars}
If needed, the other `ADSL` variables can now be added.
List of ADSL variables already merged held in vector `adsl_vars`

```{r eval=TRUE}
advs <- advs %>%
  derive_vars_merged(
    dataset_add = select(adsl, !!!negate_vars(adsl_vars)),
    by_vars = exprs(STUDYID, USUBJID)
  )
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  advs,
  display_vars = exprs(USUBJID, RFSTDTC, RFENDTC, DTHDTC, DTHFL, AGE, AGEU),
  filter = USUBJID == "01-701-1015"
)
```

## Derive New Rows {#additional}

When deriving new rows for a data frame, it is essential the programmer takes 
time to insert this derivation in the correct location of the code. The location
will vary depending on what previous computations should be retained on the new 
record and what computations must be done with the new records.

### Example 1 (Creating a New Record):

To add a new record based on the selection of a certain criterion (e.g. minimum,
maximum) `derive_extreme_records()` can be used. The new records include all
variables of the selected records.

#### Adding a New Record for the Last Value
For each subject and Vital Signs parameter, add a record holding last valid
observation before end of treatment. Set `AVISIT` to `"End of Treatment"` and
assign a unique `AVISITN` value.

```{r eval=TRUE}
advs_ex1 <- advs %>%
  derive_extreme_records(
    dataset_add = advs,
    by_vars = exprs(STUDYID, USUBJID, PARAMCD),
    order = exprs(ADT, AVISITN, ATPTN, AVAL),
    mode = "last",
    filter_add = (4 < AVISITN & AVISITN <= 12 & ANL01FL == "Y"),
    set_values_to = exprs(
      AVISIT = "End of Treatment",
      AVISITN = 99,
      DTYPE = "LOV"
    )
  )
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  arrange(advs_ex1, USUBJID, PARAMCD, desc(AVISITN), ATPTN),
  display_vars = exprs(USUBJID, PARAMCD, ADT, AVISITN, AVISIT, ATPTN, AVAL, DTYPE, ANL01FL),
  filter = USUBJID == "01-701-1015" & ANL01FL == "Y"
)
```

#### Adding a New Record for the Minimum Value
For each subject and Vital Signs parameter, add a record holding the minimum
value before end of treatment. If the minimum is attained by multiple
observations the first one is selected. Set `AVISIT` to `"Minimum on Treatment"`
and assign a unique `AVISITN` value.

```{r eval=TRUE}
advs_ex1 <- advs %>%
  derive_extreme_records(
    dataset_add = advs,
    by_vars = exprs(STUDYID, USUBJID, PARAMCD),
    order = exprs(AVAL, ADT, AVISITN, ATPTN),
    mode = "first",
    filter_add = (4 < AVISITN & AVISITN <= 12 & ANL01FL == "Y" & !is.na(AVAL)),
    set_values_to = exprs(
      AVISIT = "Minimum on Treatment",
      AVISITN = 98,
      DTYPE = "MINIMUM"
    )
  )
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  arrange(advs_ex1, USUBJID, PARAMCD, desc(AVISITN), ATPTN),
  display_vars = exprs(USUBJID, PARAMCD, ADT, AVISITN, AVISIT, ATPTN, AVAL, DTYPE, ANL01FL),
  filter = USUBJID == "01-701-1015" & ANL01FL == "Y"
)
```

### Example 2 (Deriving a Summary Record)

For adding new records based on aggregating records `derive_summary_records()`
can be used. For the new records only the variables specified by `by_vars` and
`set_values_to` are populated.

For each subject, Vital Signs parameter, visit, and date add a record holding
the average value for observations on that date.
Set `DTYPE` to `AVERAGE`.

```{r eval=TRUE}
advs_ex2 <- derive_summary_records(
  advs,
  dataset_add = advs,
  by_vars = exprs(STUDYID, USUBJID, PARAMCD, VISITNUM, ADT),
  set_values_to = exprs(
    AVAL = mean(AVAL, na.rm = TRUE),
    DTYPE = "AVERAGE"
  )
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  arrange(advs_ex2, USUBJID, PARAMCD, VISITNUM, ADT, DTYPE),
  display_vars = exprs(USUBJID, PARAMCD, VISITNUM, ADT, AVAL, DTYPE),
  filter = USUBJID == "01-701-1015"
)
```

### Example 3 (Deriving a New `PARAMCD`)

Use function `derive_param_computed()` to create a new `PARAMCD`. Note that only
variables specified in the `by_vars` argument will be populated in the newly 
created records. 

Below is an example of creating `Mean Arterial Pressure` 
(`PARAMCD = MAP2`) with an alternative formula.

```{r eval=TRUE}
advs_ex3 <- derive_param_computed(
  advs,
  by_vars = exprs(USUBJID, VISIT, ATPT),
  parameters = c("SYSBP", "DIABP"),
  set_values_to = exprs(
    AVAL = (AVAL.SYSBP - AVAL.DIABP) / 3 + AVAL.DIABP,
    PARAMCD = "MAP2",
    PARAM = "Mean Arterial Pressure 2 (mmHg)"
  )
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  arrange(advs_ex3, USUBJID, VISIT, ATPT, PARAMCD),
  display_vars = exprs(USUBJID, PARAMCD, VISIT, ATPT, AVAL),
  filter = USUBJID == "01-701-1015" & PARAMCD %in% c("MAP2", "SYSBP", "DIABP")
)
```

## Add Labels and Attributes {#attributes}

Adding labels and attributes for SAS transport files is supported by the
following packages:

- [metacore](https://atorus-research.github.io/metacore/): establish a common
foundation for the use of metadata within an R session.

- [metatools](https://pharmaverse.github.io/metatools/): enable the use of
metacore objects. Metatools can be used to build datasets or enhance columns in
existing datasets as well as checking datasets against the metadata.

- [xportr](https://atorus-research.github.io/xportr/): functionality to
associate all metadata information to a local R data frame, perform data set
level validation checks and convert into a [transport v5
file(xpt)](https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/movefile/n1xbwdre0giahfn11c99yjkpi2yb.htm).

NOTE: All these packages are in the experimental phase, but the vision is to
have them associated with an End to End pipeline under the umbrella of the
[pharmaverse](https://github.com/pharmaverse). An example of applying metadata 
and perform associated checks can be found at the [pharmaverse E2E example](https://pharmaverse.github.io/examples/adam/adsl).

# Example Scripts {#example}

ADaM | Sourcing Command
---- | --------------
ADEG | `use_ad_template("ADEG")`
ADVS | `use_ad_template("ADVS")`
ADLB | `use_ad_template("ADLB")`