---
title: "Getting started with cancerR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Get started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE,
                      comment  = "#>")
```

# Getting started

This vignette shows how to use the `cancerR` package to classify cancer 
subtypes from the codes recorded in pathology reports. These reports are usually
coded with the International Classification of Diseases for Oncology (ICD-O) system,
and the resulting codes are routinely stored in cancer registries.

```{r setup}
library(cancerR)

# Make example data

data <- data.frame(
  icd_o3_histology = c("8522", "9490", "9070"),
  # Different formats of site codes commonly found in cancer registries
  icd_o3_site = c("C50.1", "C701", "620"),
  icd_o3_behaviour = c("3", "3", "3")
)

head(data)

```

# Convert cancer site

The `site_convert()` function extracts site (*a.k.a.* topography) codes and 
converts them to a standardized numeric format. It accepts both character and 
numeric input and automatically detects whether the codes are in decimal ("C34.1") 
or integer ("C341") format before converting them.


```{r}

# Convert site codes
data$site_conv <- site_convert(data$icd_o3_site, validate = FALSE)

head(data)

```
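
To make the conversion concrete, the sketch below mimics the idea in base R. 
`normalise_site()` is a hypothetical helper written only for this illustration 
and is not the code that `site_convert()` runs. It assumes every code is a 
three-digit number with an optional leading "C" and an optional decimal point.

```{r}
# Hypothetical helper for illustration only; not part of cancerR
normalise_site <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- sub("^C", "", x)               # drop the leading "C", if present
  x <- sub(".", "", x, fixed = TRUE)  # drop the decimal point, if present
  as.integer(x)
}

# The three formats from the example data map to the same numeric scale
normalise_site(c("C50.1", "C701", "620"))
```

In practice `site_convert()` should be preferred, since it may handle edge cases 
that this sketch ignores.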

`site_convert()` also has built-in validation that checks whether each site code 
falls within the range "C00.0" to "C97.9".
Enable it by setting the `validate` argument to `TRUE`.

```{r}

# Valid site codes
site_convert("C34.1", validate = TRUE)

# Invalid site codes
site_convert("C99.9", validate = TRUE) # Should return NA and a warning message
site_convert("C99.9", validate = FALSE) # Should return 999

```
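
The range check can be pictured with a small base R sketch. `validate_site()` 
is a hypothetical helper, not a `cancerR` function, and it assumes the codes 
have already been converted to integers, so that "C00.0" to "C97.9" corresponds 
to 0 to 979.

```{r}
# Hypothetical helper for illustration only; not part of cancerR
validate_site <- function(code) {
  # Assumption: valid converted codes run from 0 ("C00.0") to 979 ("C97.9")
  out_of_range <- !is.na(code) & (code < 0 | code > 979)
  if (any(out_of_range)) {
    warning(sum(out_of_range), " site code(s) outside C00.0-C97.9 set to NA",
            call. = FALSE)
  }
  code[out_of_range] <- NA
  code
}

# 341 ("C34.1") is kept; 999 ("C99.9") becomes NA with a warning
validate_site(c(341L, 999L))
```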


# Classify adolescent and young adult cancers

The `aya_class()` function classifies adolescent and young adult (AYA) cancers 
based on the histology, site, and behaviour codes of the cancer.

The classification scheme is selected by setting the `method` argument to one of 
the values below:

- `"Barr 2020"` (**default**) - Classification based on the AYA classification by [Barr et al.](https://doi.org/10.1002/cncr.33041)
- `"SEER v2020"` - [S.E.E.R. 2020 Recode Revision](https://seer.cancer.gov/ayarecode/aya-2020.html)
- `"SEER-WHO v2008"` - [S.E.E.R. WHO 2008](https://seer.cancer.gov/ayarecode/aya-who2008.html)
- `"SEER v2006"` - [S.E.E.R. 2006](https://seer.cancer.gov/ayarecode/ayarecode-orig.html)


The `depth` argument sets how far down the classification tree the result is 
resolved: 1 is the top level and the most general grouping, and larger values 
return progressively more granular subgroups.
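
As a rough picture of what depth means, the toy hierarchy below uses placeholder 
labels rather than real AYA categories. Both `toy_tree` and its `lvl_*` columns 
are invented for this sketch. Counting the distinct labels per level shows how 
the grouping becomes finer as the depth increases.

```{r}
# Toy hierarchy with placeholder labels; these are not real AYA categories
toy_tree <- data.frame(
  lvl_1 = c("Group 1",     "Group 1",     "Group 1",     "Group 2"),
  lvl_2 = c("Group 1.1",   "Group 1.1",   "Group 1.2",   "Group 2.1"),
  lvl_3 = c("Group 1.1.1", "Group 1.1.2", "Group 1.2.1", "Group 2.1.1")
)

# Number of distinct groups at each level of the toy tree
n_groups <- sapply(toy_tree, function(x) length(unique(x)))
n_groups
```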

```{r}

# Classify AYA cancers using Barr 2020 classification

# Classify at level 1 (most general)
data$dx_lvl_1 <- aya_class(
  data$icd_o3_histology,
  data$icd_o3_site,
  data$icd_o3_behaviour,
  depth = 1
)

# Add more granular classifications
data$dx_lvl_2 <- aya_class(
  histology = data$icd_o3_histology, 
  site = data$site_conv, 
  behaviour = data$icd_o3_behaviour, 
  depth = 2
)

# Add even more granular classifications (level 3) using SEER 2020 revision classification
data$dx_lvl_3 <- aya_class(
  histology = data$icd_o3_histology, 
  site = site_convert(data$icd_o3_site), # Convert site codes using site_convert()
  behaviour = data$icd_o3_behaviour,
  method = "SEER v2020",
  depth = 3
)

# View created columns
print(data[, c("dx_lvl_1", "dx_lvl_2", "dx_lvl_3")])

```

# Classify childhood cancers

Similarly, the `kid_class()` function classifies childhood cancers.

The classification scheme is selected by setting the `method` argument to one of 
the values below:

- `"iccc3"` (**default**) - Classification based on the [International Classification of Childhood Cancer, 3rd ed. (ICCC-3)](https://doi.org/10.1002/cncr.20910)
- `"who-iccc3"` - [ICCC-3 Recode ICD-O-3/WHO 2008](https://seer.cancer.gov/iccc/iccc-who2008.html)
- `"iarc2017"` - [ICCC-3 / IARC2017](https://seer.cancer.gov/iccc/iccc-iarc-2017.html)

```{r}

# Make example data

data_kid <- data.frame(
  histology = c("8522", "9490", "9070"),
  site = c("C50.1", "C701", "620"),
  behaviour = c("3", "3", "3")
)

# Classify childhood cancers using the default ICCC-3 classification
data_kid$dx_lvl_1 <- kid_class(data_kid$histology, data_kid$site, depth = 1)

# Classify childhood cancers using the WHO-SEER recode
data_kid$dx_lvl_1.seer <- kid_class(
  data_kid$histology,
  data_kid$site,
  method = "who-iccc3",
  depth = 1
)

# Add SEER grouping column
data_kid$seer_grp <- kid_class(data_kid$histology, data_kid$site, depth = 99)

# View results
head(data_kid)

```