---
title: "Cohort diagnostics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{a03_CohortDiagnostics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", message = FALSE, warning = FALSE,
  fig.width = 7
)

library(CDMConnector)
if (Sys.getenv("EUNOMIA_DATA_FOLDER") == "") Sys.setenv("EUNOMIA_DATA_FOLDER" = tempdir())
if (!dir.exists(Sys.getenv("EUNOMIA_DATA_FOLDER"))) dir.create(Sys.getenv("EUNOMIA_DATA_FOLDER"))
if (!eunomiaIsAvailable()) downloadEunomiaData(datasetName = "synpuf-1k")
```

## Introduction
In this example we're going to summarise cohort diagnostics results for cohorts of individuals with an ankle sprain, ankle fracture, forearm fracture, or hip fracture using the Eunomia synthetic data.

Again, we'll begin by creating our study cohorts.

```{r}
library(CDMConnector)
library(CohortConstructor)
library(CodelistGenerator)
library(PatientProfiles)
library(CohortCharacteristics)
library(PhenotypeR)
library(dplyr)
library(ggplot2)

con <- DBI::dbConnect(duckdb::duckdb(), 
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con, 
                                cdmName = "Eunomia Synpuf",
                                cdmSchema   = "main",
                                writeSchema = "main", 
                                achillesSchema = "main")

cdm$injuries <- conceptCohort(cdm = cdm,
  conceptSet = list(
    "ankle_sprain" = 81151,
    "ankle_fracture" = 4059173,
    "forearm_fracture" = 4278672,
    "hip_fracture" = 4230399
  ),
  name = "injuries")
```
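
Before running any diagnostics, it can help to confirm that the cohorts were created as expected. The chunk below is a minimal sketch that assumes the `cdm$injuries` table created above is a standard cohort table, so that `omopgenerics::cohortCount()` and `omopgenerics::settings()` can be applied to it. The counts themselves depend on the synthetic data downloaded.

```{r}
# Number of records and subjects in each cohort
# (assumes cdm$injuries was created by the chunk above)
omopgenerics::cohortCount(cdm$injuries)

# Cohort names alongside their cohort_definition_id
omopgenerics::settings(cdm$injuries) |>
  select(cohort_definition_id, cohort_name)
```

A cohort with zero subjects would contribute nothing to the summaries that follow, so this is a cheap check to make first.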

## Cohort diagnostics

We can run cohort diagnostics analyses for each of our overall cohorts like so:
```{r}
cohort_diag <- cohortDiagnostics(cdm$injuries)
```
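
To see which analyses were run, we can inspect the settings of the returned object. This is a sketch that assumes `cohort_diag` is a `summarised_result` object (as defined by omopgenerics) whose settings include a `result_type` column. The exact result types listed will depend on the version of PhenotypeR installed.

```{r}
# List the distinct result types bundled in the diagnostics output
# (assumption: cohort_diag is a summarised_result with a result_type setting)
omopgenerics::settings(cohort_diag) |>
  distinct(result_type)
```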

Our results will include a summary of the overlap between our cohorts. We can visualise this as follows:
```{r}
plotCohortOverlap(cohort_diag, uniqueCombinations = TRUE)
```

Our results also include a summary of the characteristics of each cohort, stratified by age group and sex.
```{r}
tableCharacteristics(cohort_diag, groupColumn = c("age_group", "sex"))
```

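You can also visualise the age distribution of each cohort:
```{r}
# Keep only the age summaries, then draw boxplots coloured by cohort.
# Assumes a CohortCharacteristics version in which plotCharacteristics()
# accepts the `plotType` and `colour` arguments.
cohort_diag |>
  filter(variable_name == "Age") |>
  plotCharacteristics(plotType = "boxplot", colour = "cohort_name")
```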