---
title: "Frequently Asked Questions"
output: 
  rmarkdown::html_vignette:
    df_print: kable
vignette: >
  %\VignetteIndexEntry{Frequently Asked Questions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Basic example

Most subsequent examples build on this simple rifttable:

```{r basic, message = FALSE}
library(rifttable)
library(dplyr) # for data management, e.g., mutate()
library(tibble) # for constructing a tibble, e.g. via tribble()
data(breastcancer, package = "risks")

design <- tribble(
  ~label,                       ~type,                   ~stratum,
  "**Overall**",                "",                      "",
  "  Deaths/N",                 "outcomes/total",        c("Low", "High"),
  "  Risk",                     "risk",                  c("Low", "High"),
  "  Risk ratio (95% CI)",      "rr",                    c("Low", "High"),
  "  Risk difference (95% CI)", "rd",                    c("Low", "High"),
  "",                           "",                      "",
  "**Low hormone receptor**",   "",                      "",
  "  Deaths/N (Risk)",          "outcomes/total (risk)", "Low",
  "  Risk difference (95% CI)", "rd",                    "Low",
  "**High hormone receptor**",  "",                      "",
  "  Deaths/N (Risk)",          "outcomes/total (risk)", "High",
  "  Risk difference (95% CI)", "rd",                    "High"
) |>
  mutate(
    exposure = "stage",
    outcome = "death",
    effect_modifier = "receptor"
  )

rifttable(
  design = design,
  data = breastcancer
) |>
  rt_gt() # obtain formatted output
```


# Why do I get an error?

R's error messages can be frustrating. When using rifttable, these are the typical sources of errors:

* Clerical errors in variable names and arguments. There is no magic here except double-checking the code.
* Missing data. See below: [How do I handle missing data?](#how-do-i-handle-missing-data).
* Discrepancy between estimator (`type`) and the data. For example, `type = "mean"` will not work on a categorical (factor) variable.
* Models that fail to converge. For example, one may be trying to estimate a risk ratio with 0 outcomes in the reference category, or adjusting for 20 covariates in a Cox model with 5 events overall. (Sometimes such attempts will "just" return warning messages -- it is still worth rethinking the modeling strategy.)

To identify where an error is coming from, start simple. Comment out all but one line of the `design`, putting `#` at the beginning of the line. Start with a line that gives basic descriptive data, such as `type = "total"`, `type = "outcomes"` or `type = "events/time"`, and re-run `rifttable()`. Then add more lines with descriptive estimators, one by one. At the end, add lines that fit models, such as `type = "hr"`.


# What is the `design`?

The `design` that the `rifttable()` function takes as input is simply a dataset that defines how the table should look like when `rifttable()` has processed the `data`.

The `design` can be constructed in many different ways. All lead to the same table:

1. **A dataset (tibble) defined using `tribble()`**

    ```{r}
design1 <- tribble(
  ~label,   ~exposure, ~outcome, ~type,
  "N",      "stage",   "death",  "total",
  "Deaths", "stage",   "death",  "outcomes"
)
design1
rifttable(
  design = design1,
  data = breastcancer
) |>
  rt_gt()
    ```

2. **A dataset (tibble) defined using `tibble()`**

    ```{r}
design2 <- tibble(
  label = c("N", "Deaths"),
  exposure = "stage",
  outcome = "death",
  type = c("total", "outcomes")
)
design2
rifttable(
  design = design2,
  data = breastcancer
) |>
  rt_gt()
    ```

3. **Concatenating tibbles, then editing with `mutate()`**

    ```{r}
design3 <- bind_rows(
  tibble( # row 1
    label = "N",
    type = "total"
  ),
  tibble( # row 2
    label = "Deaths",
    type = "outcomes"
  )
) |>
  mutate( # elements that are the same for all rows
    exposure = "stage",
    outcome = "death"
  )
design3
rifttable(
  design = design3,
  data = breastcancer
) |>
  rt_gt()
    ```


4. **For descriptive tables: Use `table1_design()`**

    ```{r}
design4 <- breastcancer |>
  table1_design(
    death, # the total count will automatically be included
    by = stage
  )
design4
rifttable(
  design = design4,
  data = breastcancer
) |>
  rt_gt()
    ```
    
    See a separate overview about [a descriptive Table 1](table1.html).

5. **External datasets**

    The `design` could even be written out in an external dataset that can be
    loaded with `readr::read_csv()` (for CSV files) or `readxl::read_excel()`
    (for Excel sheets).


# How do I handle missing data?

rifttable tries to make as few assumptions as possible about how the user wants to treat missing data.

* Missing values in **exposure**: By default, missing vales (`NA`) in the exposure are displayed as a separate exposure category for descriptive statistics (e.g., `type = "total"` or `type = "mean"`). They are omitted by comparative estimators (e.g., regression models). To change this behavior, call `rifttable()` with the argument `exposure_levels = "nona"`.
* Missing values in **outcome**: By default, descriptive statistics will be missing (e.g., results will be `--` or `NA`), and regression models will use the non-missing observations. To exclude observations with missing outcome values altogether, add `na_rm = TRUE` to (specific rows of) the `design`.
* Missing values in **confounders**: Applies only to regression models, which typically exclude observations with missing values.
* Missing values in the **effect modifier**: Stratified and joint models are only shown for the specified `stratum` of the `effect_modifier`. To include observations with the missing effect modifier, add `NA` to the requested stratum in the `design`, e.g., `effect_modifier = "bmi", stratum = c("<25", NA)`.


# How do I add overall statistics?

Use the `overall` argument to show descriptive data for the entire `data` set.
Inferential estimators showing comparisons between exposure categories will be blank there.

```{r overall}
rifttable(
  design = design,
  data = breastcancer,
  overall = TRUE
) |>
  rt_gt() # obtain formatted output
```


# How do I test for trend?

Instead of *testing* a null hypothesis about a trend, rifttable proposes 
*estimating* the difference in the outcome for a one-unit higher exposure. 
This is also called a linear slope. Here, we estimate the risk associated for 
stage that is one category higher.

```{r trend}
rifttable(
  design = design |>
    mutate(trend = "stage_numeric"),
  data = breastcancer |>
    mutate(stage_numeric = as.numeric(stage))
) |>
  rt_gt() # obtain formatted output
```


# How do I show multiple exposures in the same table?

Our simple toy dataset just has one exposure variable. For demonstration,
we just create a second variable, with two categories, "Level 1" and "Level 2,"
which is a simplified combination of the `stage` and `receptor` variables.

We will flip the table `layout` from `"rows"` (the default) to `"cols"` and 
concatenate two rifttables. We also need to give our new `exposure2` variable 
the same label as `stage` to make sure results appear in 
the same column.


```{r multiple}
breastcancer_2exposures <- breastcancer |>
  mutate(
    exposure2 = case_when(
      stage == "Stage I" |
        (stage == "Stage II" & receptor == "High") ~
        "Level 1",
      stage == "Stage III" |
        (stage == "Stage II" & receptor == "Low") ~
        "Level 2"
    )
  )

attr(breastcancer_2exposures$exposure2, which = "label") <- "Exposure"
attr(breastcancer_2exposures$stage, which = "label") <- "Exposure"

bind_rows(
  design |>
    mutate(exposure = "exposure2") |>
    slice(2:5) |>
    rifttable(
      data = breastcancer_2exposures,
      layout = "cols"
    ),
  design |>
    slice(2:5) |>
    rifttable(
      data = breastcancer_2exposures,
      layout = "cols"
    )
) |>
  rt_gt() # obtain formatted output
```


# How do I change how results are rounded?

By default, difference measures are being rounded to 2 decimal digits (`0.01`), such as `type = "diff"`, the mean difference, or `type = "quantreg"`, the median difference. The same goes for risk measures, such as `type = "risk"`, unless shown as percentage points. Ratio measures are also shown with 2 decimal digits, such as `type = "hr"`, the hazard ratio, or `type = "fold"`, a ratio of arithmetic means. 

Rounding can be changed by setting the `rifttable()` arguments `diff_digits`, `risk_digits`, and `ratio_digits` globally for the entire table. 

```{r rounding}
design <- tribble(
  ~label,                     ~type,
  "Deaths/N",                 "outcomes/total",
  "Risk",                     "risk",
  "Risk ratio (95% CI)",      "rr",
  "Odds ratio (95% CI)",      "or",
  "Risk difference (95% CI)", "rd"
) |>
  mutate(
    exposure = "stage",
    outcome = "death"
  )

rifttable(
  design = design,
  data = breastcancer,
  ratio_digits = 3, # Many digits for ratios
  risk_digits = 1
) |> # Fewer digits for risks
  rt_gt() # obtain formatted output
```

As can be seen, ratios > 3 are still shown with 1 fewer decimal, and ratios > 10 are shown with 2 fewer decimals ([Wilcox, *Epidemiology* 2004](https://doi.org/10.1097/01.ede.0000101026.08873.14) motivates why). To disable such additional rounding of extremely high ratios:

```{r rounding_decrease}
rifttable(
  design = design,
  data = breastcancer,
  ratio_digits = 3,
  ratio_digits_decrease = NULL, # Do not round high ratios more
  risk_digits = 1
) |>
  rt_gt() # obtain formatted output
```

Additionally, rounding can be changed for each row, adding a column `digits` to the rifttable `design`:

```{r rounding_digits}
tribble(
  ~label,                     ~type,            ~digits,
  "Deaths/N",                 "outcomes/total", NA, # Uses rifttable default
  "Risk",                     "risk",           NA, # Uses risk_digits below
  "Risk ratio (95% CI)",      "",               NA,
  "  Rounded to 1 digit",     "rr",             1,
  "  Rounded to 2 digits",    "rr",             2,
  "Risk difference (95% CI)", "rd",             3
) |> # Overrides risk_digits
  mutate(
    exposure = "stage",
    outcome = "death"
  ) |>
  rifttable(
    data = breastcancer,
    risk_digits = 1
  ) |> # Fewer digits for risks, unless specified by "digits"
  rt_gt() # obtain formatted output
```


# How can I create joint models?

By default, regression models will be fit separately for each stratum of the `effect_modifier`. 

Append `"_joint"` to `"hr"`, `"rr"`, `"rd"`, `"irr"`, `"irrrob"`, `"diff"`, `"fold"`, `"foldlog"`, `"quantreg"`, or `"or"` to obtain "joint" models for exposure and effect modifier that have a single reference category.
 
Note that the joint model will be fit across all non-missing (`NA`) strata of the effect modifier, even if the `design` table does not request all strata be shown.

Compare stratified models to joint models for risk differences (for simplicity of presentation, count data are omitted):

```{r joint}
tribble(
  ~label,                       ~type,      ~stratum,
  "**Overall**",                "rd",       c("Low", "High"),
  "",                           "",         "",
  "**Stratified models**",      "",         "",
  "  Low hormone receptor",     "rd",       "Low",
  "  High hormone receptor",    "rd",       "High",
  "",                           "",         "",
  "**Joint models**",           "",         "",
  "  Low hormone receptor",     "rd_joint", "Low",
  "  High hormone receptor",    "rd_joint", "High"
) |>
  mutate(
    exposure = "stage",
    outcome = "death",
    effect_modifier = "receptor"
  ) |>
  rifttable(data = breastcancer) |>
  rt_gt()
```


# How can I change the reference category?

The reference categories for exposure and effect modifier are always their first factor levels. Compare the preceding example: `"High"` is, alphabetically, before `"Low"`. To change the reference category, use `forcats::fct_relevel()` or the base R alternative `relevel()` on variables in the `data` provided to `rifttable()`: 

```{r joint_reorder}
tribble(
  ~label,                       ~type,      ~stratum,
  "**Joint models**",           "",         "",
  "  Low hormone receptor",     "rd_joint", "Low",
  "  High hormone receptor",    "rd_joint", "High"
) |>
  mutate(
    exposure = "stage",
    outcome = "death",
    effect_modifier = "receptor"
  ) |>
  rifttable(
    data = breastcancer |>
      mutate(
        receptor = relevel(
          factor(receptor), # Make "receptor" a factor in the first place
          ref = "Low"
        )
      )
  ) |> # Set new reference category
  rt_gt()
```

If a middle category of the exposure `stage` is desired as the reference:

```{r reorder_middle}
result_reordered <- tibble(
  label = "**RD (95% CI)**",
  type = "rd",
  exposure = "stage",
  outcome = "death"
) |>
  rifttable(
    data = breastcancer |>
      mutate(
        stage = relevel(
          stage,
          ref = "Stage II"
        )
      )
  )

result_reordered |>
  rt_gt()
```

Using `forcats::fct_relevel()` may be preferable over `relevel()`, as it preserves the variable label: here, the variable `stage` lost its label, `"Stage"` starting with upper case S.

That results for "Stage II" are now listed first is probably undesirable. Reorder the columns of the table that `rifttable()` produced to print results for "Stage I" first:

```{r reorder_middle2}
result_reordered |>
  select(stage, "Stage I", everything()) |>
  rt_gt()
```


# How I do I change the level for confidence intervals?

Add a `ci` column to the `design`:

```{r change_ci}
tribble(
  ~label,            ~type,                   ~ci,
  "Deaths/N (Risk)", "outcomes/total (risk)", NA,
  "Risk ratio",      "",                      NA,
  "  80% CI",        "rr",                    0.8,
  "  95% CI",        "rr",                    NA, # Defaults to 0.95
  "  99% CI",        "rr",                    0.99
) |>
  mutate(
    exposure = "stage",
    outcome = "death"
  ) |>
  rifttable(
    data = breastcancer,
    risk_percent = TRUE
  ) |>
  rt_gt() # obtain formatted output
```


# How do I make rifttable calculate an estimand that is not built-in?

While the package provides a number of estimators commonly used in epidemiology, it will never be able to include all possible estimators. However, any custom estimate can be integrated into a rifttable by a defining custom estimation function.

The subsequent example will reproduce the following basic rifttable, which shows the mean age by sex, stratified by ECOG performance status, in the `cancer` data set:

```{r custom, message = FALSE}
data(cancer, package = "survival")
cancer <- cancer |>
  tibble::as_tibble() |>
  mutate(
    sex = factor(
      sex,
      levels = 1:2,
      labels = c("Male", "Female")
    )
  )

design <- tibble::tibble(
  type = "mean",
  exposure = "sex",
  outcome = "age",
  effect_modifier = "ph.ecog",
  stratum = 1:2,
  label = paste0("ECOG PS ", stratum, ": mean age")
)

design |>
  rifttable(
    data = cancer,
    overall = TRUE
  ) |>
  rt_gt()
```

Instead of relying on rifttable's built-in estimator `type = "mean"`, we will define a custom function that calculates the mean:

```{r custom_fn}
estimate_my_mean <- function(data, ...) {
  data |>
    group_by(.exposure) |>
    summarize(
      res = paste(
        round(
          mean(.outcome),
          digits = 3
        ),
        "yrs"
      )
    )
}
```

Use the custom function `my_mean` instead of the built-in `mean`:

```{r custom_use}
design |> # Edit the previous design
  mutate(
    type = "my_mean", # Replace built-in "mean" by custom "my_mean"
    label = paste0(label, " (custom)")
  ) |>
  rifttable(
    data = cancer,
    overall = TRUE
  ) |>
  rt_gt()
```

Specifications for custom functions:

* The function name must start with `estimate_`; this prefix is to be omitted when later calling the custom function within `rifttable()`.
* The `data` provided to `rifttable()` will be available to the custom function as an argument named `data`.
* The `data` are already subsetted to the `stratum` of the `effect_modifier`, if applicable.
* Copies of key variables are accessible under the names, `.exposure`, `.outcome`, `.event`, `.time`, and `.time2`, as applicable.
* The function must accept all elements of a rifttable `design` (e.g., `confounders`,  `digits`, `na_rm`, etc.) and of the `rifttable()` function (e.g., `reference`, `risk_percent`) as arguments. Many may not be relevant and can be captured in the argument list, `...` (see example).
* The function must return a tibble/data frame with one column for `.exposure`, with one row per exposure category, and one string column `res` for the estimate.