---
title: "Introduction to gtexr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to gtexr}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = (Sys.getenv("RUN_VIGNETTES") != "")
)
```

The [GTEx Portal API V2](https://gtexportal.org/home/apiPage) enables programmatic access to data available from the Genotype-Tissue Expression Portal. The gtexr package wraps this API, providing R functions that correspond to each [API endpoint](https://gtexportal.org/api/v2/redoc):

- R function names mirror those of their corresponding endpoint, converted to lower case with spaces replaced with underscores[^1] e.g. the R function for ["Get Service Info"](https://gtexportal.org/api/v2/redoc#tag/GTEx-Portal-API-Info/operation/get_service_info_api_v2__get) is `get_service_info()`.
- Query parameters are similarly mirrored by function arguments e.g. the arguments for `get_maintenance_message()` (corresponding to the endpoint ["Get Maintenance Message"](https://gtexportal.org/api/v2/redoc#tag/Admin-Endpoints/operation/get_maintenance_message_api_v2_admin_maintenanceMessage_get)) are `page` and `itemsPerPage`. For query parameters that accept an array of values however, the corresponding function argument is pluralised to indicate this e.g. for endpoint ["Get Eqtl Genes"](https://gtexportal.org/api/v2/redoc#tag/Static-Association-Endpoints/operation/get_eqtl_genes_api_v2_association_egene_get) the query parameter 'tissueSiteDetailId' is pluralised to argument name `tissueSiteDetailIds` in `get_eqtl_genes()`.
- Default values for arguments mirror those for the API.
- The documentation for each function includes at least one working example e.g. `?get_eqtl_genes` provides example valid values for the required argument `tissueSiteDetailIds`.
- All functions return a `tibble::tibble` by default. Alternatively, the raw JSON from an API call may be retrieved by setting argument `.return_raw` to `TRUE` e.g. `get_service_info(.return_raw = TRUE)`.

[^1]: With the exception of `get_sample_biobank_data()` and `get_sample_datasets()`, for which 'get_sample' is additionally appended with their respective category titles 'biobank_data' and 'datasets'.

## Shiny app

Users can try out all functions interatively with the ⭐[gtexr shiny app](https://7hocgq-rmgpanw.shinyapps.io/gtexr/)⭐, which pre-populates query parameters with those for the first working example from each function's documentation. To run the app locally:

``` r
shiny::runApp(system.file("app", package = "gtexr"))
```

## Paginated responses

```{r setup, message = FALSE}
library(gtexr)
library(dplyr)
library(purrr)
```

Many API endpoints return only the first 250 available items by default. A warning is raised if the number of available items exceeds the selected maximum page size e.g.

```{r}
get_eqtl_genes("Whole_Blood")
```

For most cases, the simplest solution is to increase the value of `itemsPerPage` e.g. `get_eqtl_genes("Whole_Blood", itemsPerPage = 100000)`. This limit can be set globally by setting the "gtexr.itemsPerPage" option with `options(list(gtexr.itemsPerPage = 100000))`.

Alternatively, multiple pages can be retrieved sequentially e.g.

```{r}
# to retrieve the first 3 pages, with default setting of 250 items per page
1:3 |>
  map(\(page) get_eqtl_genes("Whole_Blood", page = page, .verbose = FALSE) |>
        suppressWarnings()) |>
  bind_rows()
```

Note that paging information is printed to the R console by default. Set argument `.verbose` to `FALSE` to silence these messages, or disable globally with `options(list(gtexr.verbose = FALSE))`.

## Examples

The rest of this vignette outlines some example applications of gtexr.

### Get build 37 coordinates for a variant

```{r}
get_variant(snpId = "rs1410858") |>
  tidyr::separate(
    col = b37VariantId,
    into = c(
      "chromosome",
      "position",
      "reference_allele",
      "alternative_allele",
      "genome_build"
    ),
    sep = "_",
    remove = FALSE
  ) |>
  select(snpId:genome_build)
```

### Convert gene symbol to versioned GENCODE ID

Use `get_gene()` or `get_genes()`

```{r get-genes}
get_genes("CRP") |>
  select(geneSymbol, gencodeId)
```

### Convert rsID to GTEx variant ID

Use `get_variant()`

```{r get-variant}
get_variant(snpId = "rs1410858") |>
  select(snpId, variantId)
```

### For a gene of interest, which tissues have significant *cis*-eQTLs?

Use `get_significant_single_tissue_eqtls()` (note this requires versioned GENCODE IDs)

```{r get-significant-single-tissue-eqtls}
gene_symbol_of_interest <- "CRP"

gene_gencodeId_of_interest <- get_genes(gene_symbol_of_interest) |>
  pull(gencodeId) |>
  suppressMessages()

gene_gencodeId_of_interest |>
  get_significant_single_tissue_eqtls() |>
  distinct(geneSymbol, gencodeId, tissueSiteDetailId)
```

### Get data for non-eQTL variants

Some analyses (e.g. Mendelian randomisation) require data for variants which may or may not be significant eQTLs. Use `calculate_expression_quantitative_trait_loci()` with `purrr::map()` to retrieve data for multiple variants

```{r calculate-eqtls}
variants_of_interest <- c("rs12119111", "rs6605071", "rs1053870")

variants_of_interest |>
  set_names() |>
  map(
    \(x) calculate_expression_quantitative_trait_loci(
      tissueSiteDetailId = "Liver",
      gencodeId = "ENSG00000237973.1",
      variantId = x
    )
  ) |>
  bind_rows(.id = "rsid") |>
  # optionally, reformat output - first extract genomic coordinates and alleles
  tidyr::separate(
    col = "variantId",
    into = c(
      "chromosome",
      "position",
      "reference_allele",
      "alternative_allele",
      "genome_build"
    ),
    sep = "_"
  ) |>
  # ...then ascertain alternative_allele frequency
  mutate(
    alt_allele_count = (2 * homoAltCount) + hetCount,
    total_allele_count = 2 * (homoAltCount + hetCount + homoRefCount),
    alternative_allele_frequency = alt_allele_count / total_allele_count
  ) |>
  select(
    rsid,
    beta = nes,
    se = error,
    pValue,
    minor_allele_frequency = maf,
    alternative_allele_frequency,
    chromosome:genome_build,
    tissueSiteDetailId
  )
```