---
title: "Summarise clinical tables records"
output: 
  html_document:
    pandoc_args: [
      "--number-offset=1,0"
      ]
    number_sections: yes
    toc: yes
vignette: >
  %\VignetteIndexEntry{summarise_clinical_tables_records}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

In this vignette, we will explore the *OmopSketch* functions designed to provide an overview of the clinical tables within a CDM object (*observation_period*, *visit_occurrence*, *condition_occurrence*, *drug_exposure*, *procedure_occurrence*, *device_exposure*, *measurement*, *observation*, and *death*). Specifically, there are four key functions that facilitate this:

-   `summariseClinicalRecords()` and `tableClinicalRecords()`: Use them to create a summary statistics with key basic information of the clinical table (e.g., number of records, number of concepts mapped, etc.)

-   `summariseRecordCount()` and `plotRecordCount()`: Use them to summarise the number of records within a specific time interval.

## Create a mock cdm

Let's see an example of its functionalities. To start with, we will load essential packages and create a mock cdm using the mockOmopSketch() database.

```{r, warning=FALSE}
library(dplyr)
library(OmopSketch)

# Connect to mock database
cdm <- mockOmopSketch()
```

# Summarise clinical tables

Let's now use `summariseClinicalTables()`from the OmopSketch package to help us have an overview of one of the clinical tables of the cdm (i.e., **condition_occurrence**).

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, "condition_occurrence")

summarisedResult |> print()
```

Notice that the output is in the summarised result format.

We can use the arguments to specify which statistics we want to perform. For example, use the argument `recordsPerPerson` to indicate which estimates you are interested regarding the number of records per person.

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95")
)

summarisedResult |>
  filter(variable_name == "records_per_person") |>
  select(variable_name, estimate_name, estimate_value)
```

You can further specify if you want to include the number of records in observation (`inObservation = TRUE`), the number of concepts mapped (`standardConcept = TRUE`), which types of source vocabulary does the table contain (`sourceVocabulary = TRUE`), which types of domain does the vocabulary have (`domainId = TRUE`) or the concept's type (`typeConcept = TRUE`).

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95"),
  inObservation = TRUE,
  standardConcept = TRUE,
  sourceVocabulary = TRUE,
  domainId = TRUE,
  typeConcept = TRUE
)

summarisedResult |>
  select(variable_name, estimate_name, estimate_value) |>
  glimpse()
```

Additionally, you can also stratify the previous results by sex and age groups:

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95"),
  inObservation = TRUE,
  standardConcept = TRUE,
  sourceVocabulary = TRUE,
  domainId = TRUE,
  typeConcept = TRUE,
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)

summarisedResult |>
  select(variable_name, strata_level, estimate_name, estimate_value) |>
  glimpse()
```

Notice that, by default, the "overall" group will be also included, as well as crossed strata (that means, sex == "Female" and ageGroup == "\>35").

Also, see that the analysis can be conducted for multiple OMOP tables at the same time:

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm,
  c("observation_period", "drug_exposure"),
  recordsPerPerson = c("mean", "sd"),
  inObservation = FALSE,
  standardConcept = FALSE,
  sourceVocabulary = FALSE,
  domainId = FALSE,
  typeConcept = FALSE
)

summarisedResult |>
  select(group_level, variable_name, estimate_name, estimate_value) |>
  glimpse()
```

## Tidy the summarised object

`tableClinicalRecords()` will help you to tidy the previous results and create a gt table.

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95"),
  inObservation = TRUE,
  standardConcept = TRUE,
  sourceVocabulary = TRUE,
  domainId = TRUE,
  typeConcept = TRUE,
  sex = TRUE
)

summarisedResult |>
  tableClinicalRecords()
```

# Summarise record counts

OmopSketch can also help you to summarise the trend of the records of an OMOP table. See the example below, where we use `summariseRecordCount()` to count the number of records within each year, and then, we use `plotRecordCount()` to create a ggplot with the trend.

```{r, warning=FALSE}
summarisedResult <- summariseRecordCount(cdm, "drug_exposure", interval = "years")

summarisedResult |> print()

summarisedResult |> plotRecordCount()
```

Note that you can adjust the time interval period using the `interval` argument, which can be set to either "years" or "months". See the example below, where it shows the number of records every 18 months:

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure", interval = "months") |>
  plotRecordCount()
```

We can further stratify our counts by sex (setting argument `sex = TRUE`) or by age (providing an age group). Notice that in both cases, the function will automatically create a group called *overall* with all the sex groups and all the age groups.

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure",
  interval = "months",
  sex = TRUE,
  ageGroup = list(
    "<30" = c(0, 29),
    ">=30" = c(30, Inf)
  )
) |>
  plotRecordCount()
```

By default, `plotRecordCount()` does not apply faceting or colour to any variables. This can result confusing when stratifying by different variables, as seen in the previous picture. We can use [VisOmopResults](https://darwin-eu.github.io/visOmopResults/) package to help us know by which columns we can colour or face by:

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure",
  interval = "months",
  sex = TRUE,
  ageGroup = list(
    "0-29" = c(0, 29),
    "30-Inf" = c(30, Inf)
  )
) |>
  visOmopResults::tidyColumns()
```

Then, we can simply specify this by using the `facet` and `colour` arguments from `plotRecordCount()`

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure",
  interval = "months",
  sex = TRUE,
  ageGroup = list(
    "0-29" = c(0, 29),
    "30-Inf" = c(30, Inf)
  )
) |>
  plotRecordCount(facet = omop_table ~ age_group, colour = "sex")
```

Finally, disconnect from the cdm

```{r, warning=FALSE}
PatientProfiles::mockDisconnect(cdm = cdm)
```