---
title: "Evaluating Output"
always_allow_html: yes
output:
  html_document:
    toc: yes
    toc_depth: '3'
    df_print: paged
  html_vignette:
    toc: yes
    toc_depth: 3
    vignette: >
      %\VignetteIndexEntry{Evaluating Output}
      %\VignetteEngine{knitr::rmarkdown}
      %\VignetteEncoding{UTF-8}
  pdf_document:
    toc: yes
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

withr::local_envvar(
  R_USER_CACHE_DIR = tempfile(),
  EUNOMIA_DATA_FOLDER = Sys.getenv("EUNOMIA_DATA_FOLDER", unset = tempfile())
)
```

For this example we take the case of *Viral Sinusitis* and several treatments as events. We set our `minEraDuration = 7`, `minCombinationDuration = 7` and `combinationWindow = 7`. We treat multiple events of *Viral Sinusitis* as separate cases by setting `concatTargets = FALSE`. When set to `TRUE` it would append multiple cases, which might be useful for time invariant target cohorts like chronic conditions.

```{r setup_treatment_patterns, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE), warning=FALSE, error=FALSE}
library(CDMConnector)
library(dplyr)
library(TreatmentPatterns)

cohortSet <- readCohortSet(
  path = system.file(package = "TreatmentPatterns", "exampleCohorts")
)

con <- DBI::dbConnect(
  drv = duckdb::duckdb(),
  dbdir = eunomiaDir()
)

cdm <- cdmFromCon(
  con = con,
  cdmSchema = "main",
  writeSchema = "main"
)

cdm <- generateCohortSet(
  cdm = cdm,
  cohortSet = cohortSet,
  name = "cohort_table",
  overwrite = TRUE
)

cohorts <- cohortSet %>%
  # Remove 'cohort' and 'json' columns
  select(-"cohort", -"json", -"cohort_name_snakecase") %>%
  mutate(type = c("event", "event", "event", "event", "exit", "event", "event", "target")) %>%
  rename(
    cohortId = "cohort_definition_id",
    cohortName = "cohort_name",
  )

outputEnv <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  minEraDuration = 7,
  combinationWindow = 7,
  minPostCombinationDuration = 7,
  concatTargets = FALSE
)

results <- export(
  andromeda = outputEnv,
  minCellCount = 1,
  nonePaths = TRUE,
  outputPath = tempdir()
)
```

## Saving results
Now that we ran our TreatmentPatterns analysis and have exported our results, we can evaluate the output. The `export()` function in TreatmentPatterns returns an R6 class of `TreatmentPatternsResults`. All results are query-able from this object. Additionally the files are written to the specified `outputPath`. If no `outputPath` is set, only the result object is returned, and no files are written.

If you would like to save the results to csv-, or zip-file after the fact you can still do this. Or upload it to a database:
```{r save, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
# Save to csv-, zip-file
results$saveAsCsv(path = tempdir())
results$saveAsZip(path = tempdir(), name = "tp-results.zip")

# Upload to database
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "sqlite",
  server = file.path(tempdir(), "db.sqlite")
)

results$uploadResultsToDb(
  connectionDetails = connectionDetails,
  schema = "main",
  prefix = "tp_",
  overwrite = TRUE,
  purgeSiteDataBeforeUploading = FALSE
)
```

## Evaluating Results
### treatmentPathways
The treatmentPathways file contains all the pathways found, with a frequency, pairwise stratified by age group, sex and index year.
```{r readTreatmentPathways, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
head(results$treatment_pathways)
```
We can see the pathways contain the treatment names we provided in our event cohorts. Besides that we also see the paths are annoted with a `+` or `-`. The `+` indicates two treatments are a combination therapy, i.e. `amoxicillin+clavulanate` is a combination of _amoxicillin_ and _clavulanate_. The `-` indicates a switch between treatments, i.e. `acetaminophen-penicillinv` is a switch from _acetaminophen_ to _penicillin v_. Note that these combinations and switches can occur in the same pathway, i.e. `acetaminophen-amoxicillin+clavulanate`. The first treatment is _acetaminophen_ that *switches* to a combination of _amoxicillin_ and _clavulanate_.

### countsAge, countsSex, and countsYear
The countsAge, countsSex, and countsYear contain counts per age, sex, and index year.
```{r counts, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
head(results$counts_age)
head(results$counts_sex)
head(results$counts_year)
```

### summaryStatsTherapyDuration
The summaryEventDuration contains summary statistics from different events, across all found "lines". A "line" is equal to the level in the Sunburst or Sankey diagrams. The summary statistics allow for plotting of boxplots with the `plotEventDuration()` function.
```{r summaryStatsTherapyDuration, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
results$plotEventDuration()
```

Not that besides our events there are two extra rows: *mono-event*, and *combination-event*. These are both types of events on average.

We see that most events last between 0 and 100 days. We can see that for *combination-events* and *amoxicillin+clavulanate* there is a tendency for events to last longer than that. *amoxicillin+clavulanate* most likely skews the duration in the *combination-events* group.

We can alter the x-axis to get a clearer view of the durations of the events:
```{r, warning=FALSE, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
results$plotEventDuration() +
  ggplot2::xlim(0, 100)
```
Now we can more clearly investigate particular treatments. We can see that *penicilin v* tends to last quite short across all treatment lines, while *aspirin* and *acetaminophen* seem to skew to a longer duration.

Additionally we can also set a `minCellCount` for the individual events. 
```{r, warning=FALSE, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
results$plotEventDuration(minCellCount = 10) +
  ggplot2::xlim(0, 100)
```

### metadata
The metadata file is a file that contains information about the circumstances the analysis was performed in, and information about R, and the CDM.
```{r metadata, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
results$metadata
```

### Sunburst Plot & Sankey Diagram
From the filtered treatmentPathways file we are able to create a sunburst plot.

The inner most layer is the first event that occurs, going outwards. This aligns with the event duration plot we looked at earlier.
```{r sunburstPlot, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
results$plotSunburst()
```

We can also create a Sankey Diagram, which in theory displays the same data. Additionally you see the *Stopped* node in the Sankey diagram. This indicates the end of the pathway. It is mostly a practical addition so that single layer Sankey diagrams can still be plotted.
```{r sankeyDiagram, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
results$plotSankey()
```

```{r cleanup, include=FALSE, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
# Close Andromeda objects
Andromeda::close(outputEnv)

# Close connection to CDM Reference
DBI::dbDisconnect(conn = con)
rm(defaultSettings, minEra60, splitAcuteTherapy, includeEndDate, con, cdm)
```