---
title: "Generate a candidate codelist"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{a03_GenerateCandidateCodelist}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
```

In this example we will create a candidate codelist for osteoarthritis, exploring how different search strategies may impact our final codelist. 
First, let's load the necessary packages and create a cdm reference using mock data.

```{r,  message=FALSE, warning=FALSE}
library(dplyr)
library(CodelistGenerator)

cdm <- mockVocabRef()
```

The mock data has the following hypothetical concepts and relationships:

```{r, echo=FALSE}
knitr::include_graphics("Figures/1.png")
```

## Search for keyword match 
We will start by building a codelist from a keyword match. Let's say that we want to find the codes that contain "Musculoskeletal disorder" in their `concept_name`:
```{r, echo=FALSE}
knitr::include_graphics("Figures/2.png")
```

```{r, message=FALSE}
getCandidateCodes(
  cdm = cdm,
  keywords = "Musculoskeletal disorder",
  domains = "Condition", 
  standardConcept = "Standard",
  includeDescendants = FALSE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```

Note that we could also identify the same concept with a partial match (only part of the name) or with the same words given in a different order:
```{r, message=FALSE}
getCandidateCodes(
  cdm = cdm,
  keywords = "Musculoskeletal",
  domains = "Condition",
  standardConcept = "Standard",
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeDescendants = FALSE,
  includeAncestor = FALSE
)

getCandidateCodes(
  cdm = cdm,
  keywords = "Disorder musculoskeletal",
  domains = "Condition",
  standardConcept = "Standard",
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeDescendants = FALSE,
  includeAncestor = FALSE
)
```
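The idea behind partial and any-order matching can be pictured with a few lines of base R. This is only a simplified sketch, not the package's implementation: the concept names are made up for the illustration, and `matchesAllWords()` is a hypothetical helper that keeps a name when every word of the keyword occurs somewhere in it, ignoring case and word order.

```{r}
# Toy concept names, made up for this sketch (not read from the mock cdm)
toyNames <- c("Musculoskeletal disorder", "Osteoarthritis of knee", "Arthritis")

# Hypothetical helper: TRUE when every word of `keyword` occurs somewhere in
# the concept name, ignoring case and word order
matchesAllWords <- function(keyword, conceptNames) {
  words <- strsplit(tolower(keyword), "\\s+")[[1]]
  # One logical vector per word, flagging the names that contain that word
  hits <- lapply(
    words,
    function(word) grepl(word, tolower(conceptNames), fixed = TRUE)
  )
  # A name is kept only if all words were found in it
  Reduce(`&`, hits)
}

# Words in a different order still match the same name
toyNames[matchesAllWords("Disorder musculoskeletal", toyNames)]

# A single word matches any name that contains it
toyNames[matchesAllWords("Musculoskeletal", toyNames)]
```

Both calls return "Musculoskeletal disorder" for this toy vector, mirroring the two searches above.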

Notice that so far we have only looked for concepts in the Condition domain, by setting `domains = "Condition"`. We can expand the search to all domains using `domains = NULL`.
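As a sketch of a search across all domains, the chunk below repeats the partial-match query with `domains = NULL`. The mock cdm is recreated under a new name (`cdmAllDomains`, chosen only for this example) so that the chunk runs on its own. It also assumes that the returned table has a `domain_id` column, which may vary between package versions.

```{r, message=FALSE}
library(CodelistGenerator)

# Recreated here so the chunk does not depend on earlier chunks
cdmAllDomains <- mockVocabRef()

allDomainCodes <- getCandidateCodes(
  cdm = cdmAllDomains,
  keywords = "Musculoskeletal",
  domains = NULL,
  standardConcept = "Standard",
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeDescendants = FALSE,
  includeAncestor = FALSE
)

allDomainCodes

# Domains of the returned concepts (assumes a `domain_id` column in the output)
unique(allDomainCodes$domain_id)
```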

## Include non-standard concepts
Now we will include standard and non-standard concepts in our initial search. By setting `standardConcept = c("Non-standard", "Standard")`, we allow the function to return, in the final candidate codelist, both the non-standard and standard codes that have been found.

```{r,echo=FALSE}
knitr::include_graphics("Figures/3.png")
```

```{r, message=FALSE}
getCandidateCodes(
  cdm = cdm,
  keywords = "Musculoskeletal disorder",
  domains = "Condition",
  standardConcept = c("Non-standard", "Standard"),
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeDescendants = FALSE,
  includeAncestor = FALSE
)
```

## Multiple search terms 
We can also search for multiple keywords simultaneously, capturing all of them with the following search:

```{r,echo=FALSE}
knitr::include_graphics("Figures/4.png")
```

```{r, message=FALSE}
getCandidateCodes(
  cdm = cdm,
  keywords = c(
    "Musculoskeletal disorder",
    "arthritis"
  ),
  domains = "Condition",
  standardConcept = c("Standard"),
  includeDescendants = FALSE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```

## Add descendants
Now we will include the descendants of an identified code using the `includeDescendants` argument.
```{r,echo=FALSE}
knitr::include_graphics("Figures/5.png")
```

```{r, message=FALSE}
getCandidateCodes(
  cdm = cdm,
  keywords = "Musculoskeletal disorder",
  domains = "Condition",
  standardConcept = "Standard",
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```

Notice that the column `found_from` now shows that we obtained `concept_id = 1` from the initial search, and `concept_id = c(2, 3, 4, 5)` when searching for descendants of concept_id 1.
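One way to summarise where the codes came from is to tally the `found_from` column. The chunk below is a minimal sketch that reruns the search above on a fresh copy of the mock data and counts the concepts contributed by each step. The object names `cdmDescendants` and `descendantCodes` are chosen only for this example.

```{r, message=FALSE}
library(dplyr)
library(CodelistGenerator)

# Recreated here so the chunk does not depend on earlier chunks
cdmDescendants <- mockVocabRef()

descendantCodes <- getCandidateCodes(
  cdm = cdmDescendants,
  keywords = "Musculoskeletal disorder",
  domains = "Condition",
  standardConcept = "Standard",
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)

# Number of concepts contributed by each search step
descendantCodes %>%
  count(found_from)
```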

## With exclusions
We can also exclude specific keywords using the argument `exclude`.

```{r, echo=FALSE}
knitr::include_graphics("Figures/6.png")
```


```{r, message=FALSE}
getCandidateCodes(
  cdm = cdm,
  keywords = "Musculoskeletal disorder",
  domains = "Condition",
  exclude = c("Osteoarthrosis", "knee"),
  standardConcept = "Standard",
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```

## Add ancestor
To include the ancestors one level above the identified concepts, we can use the argument `includeAncestor`.
```{r, echo=FALSE}
knitr::include_graphics("Figures/7.png")
```

```{r, message=FALSE}
codes <- getCandidateCodes(
  cdm = cdm,
  keywords = "Osteoarthritis of knee",
  includeAncestor = TRUE,
  domains = "Condition",
  standardConcept = "Standard",
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE
)

codes
```

## Search using synonyms
We can also pick up codes based on their synonyms. For example, **Osteoarthrosis** has a synonym of **Arthritis**.
```{r, echo=FALSE}
knitr::include_graphics("Figures/8.png")
```

```{r, message=FALSE}
getCandidateCodes(
  cdm = cdm,
  keywords = "osteoarthrosis",
  domains = "Condition",
  searchInSynonyms = TRUE,
  standardConcept = "Standard",
  includeDescendants = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```

Notice that if `includeDescendants = TRUE`, **Arthritis** descendants will also be included:
```{r,echo=FALSE}
knitr::include_graphics("Figures/9.png")
```

```{r, message=FALSE}
getCandidateCodes(
  cdm = cdm,
  keywords = "osteoarthrosis",
  domains = "Condition",
  searchInSynonyms = TRUE,
  standardConcept = "Standard",
  includeDescendants = TRUE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```


## Search via non-standard
We can also pick up concepts associated with our keyword via non-standard search.
```{r,echo=FALSE}
knitr::include_graphics("Figures/10.png")
```

```{r, message=FALSE}
codes1 <- getCandidateCodes(
  cdm = cdm,
  keywords = "Degenerative",
  domains = "Condition",
  standardConcept = "Standard",
  searchNonStandard = TRUE,
  includeDescendants = FALSE,
  searchInSynonyms = FALSE,
  includeAncestor = FALSE
)
codes1
```

Let's take a moment to focus on the `standardConcept` and `searchNonStandard` arguments to clarify the difference between them. `standardConcept` specifies whether we want only standard concepts or also include non-standard concepts in the final candidate codelist. `searchNonStandard` determines whether we want to search for keywords among non-standard concepts.

In the previous example, since we set `standardConcept = "Standard"`, we retrieved the standard code for **Osteoarthrosis** via the non-standard search, but we did not obtain the non-standard code **degenerative arthropathy** itself. If we allow non-standard concepts in the final candidate codelist, we would retrieve both codes:

```{r,echo=FALSE}
knitr::include_graphics("Figures/11.png")
```

```{r, message=FALSE}
codes2 <- getCandidateCodes(
  cdm = cdm,
  keywords = "Degenerative",
  domains = "Condition",
  standardConcept = c("Non-standard", "Standard"),
  searchNonStandard = TRUE,
  includeDescendants = FALSE,
  searchInSynonyms = FALSE,
  includeAncestor = FALSE
)
codes2
```
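To see the effect of `standardConcept` in isolation, the sketch below runs the same "Degenerative" search twice, keeping `searchNonStandard = TRUE` and changing only which concepts are allowed in the final codelist. `searchDegenerative()` is a hypothetical helper written for this example, not part of CodelistGenerator, and the mock cdm is recreated so that the chunk runs on its own.

```{r, message=FALSE}
library(CodelistGenerator)

# Recreated here so the chunk does not depend on earlier chunks
cdmCompare <- mockVocabRef()

# Hypothetical helper: same search, varying only `standardConcept`
searchDegenerative <- function(standardConcept) {
  getCandidateCodes(
    cdm = cdmCompare,
    keywords = "Degenerative",
    domains = "Condition",
    standardConcept = standardConcept,
    searchNonStandard = TRUE,
    includeDescendants = FALSE,
    searchInSynonyms = FALSE,
    includeAncestor = FALSE
  )
}

standardOnly <- searchDegenerative("Standard")
bothTypes <- searchDegenerative(c("Non-standard", "Standard"))

# Concept ids that appear only when non-standard concepts are allowed
# in the final candidate codelist
setdiff(bothTypes$concept_id, standardOnly$concept_id)
```

Any id returned by `setdiff()` belongs to a concept that was matched by the keyword but dropped from the standard-only codelist.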