---
title: "Getting started with CEAS"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with CEAS}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: refs.bib
link-citations: yes
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Setup

```{r setup}
library(ceas)
```

## Importing Seahorse rates data

The read_data function imports a list of Wave excel export file(s), meaning it
will import all `.xlsx` files from a common directory. With that, the user
should organize the file(s) they wish to analyze in a single directory (folder).
This directory can also contain the normalization CSV file (see
[Normalization](#normalization)). An easy way to get such a list is to put all
your data in a directory and list its contents. For example, if your data is in
a directory called "seahorse_data":

```{r data_custom_dir, eval=FALSE}
rep_list <- list.files("seahorse_data", pattern = "*.xlsx", full.names = TRUE)
```

In this vignette we use the package's internal datasets:

```{r data}
rep_list <- system.file("extdata", package = "ceas") |>
  list.files(pattern = "*.xlsx", full.names = TRUE)
raw_data <- readxl::read_excel(rep_list[1], sheet = 2)
knitr::kable(head(raw_data))
```

The data requires the following columns: `r colnames(raw_data)`. The `Group`
column needs to be in the format `biological_group<delimiter>Assay_type` (with
`<space>` as the default delimiter) as shown above. Upon reading with
`read_data`, the `Group` column is split into two at the delimiter character and
used to populate the `group` and `assay` columns. This output format can be set
in the Seahorse machine before starting the experiment. The user can choose to
either

- name their wells according to the *ceas* format on the Wave software or

- manually rename each Group in the Wave excel output before importing via
`read_data.` If you already have the data, this column will have to be converted
to this format to work with *ceas*.

```{r read_dataformat}
seahorse_rates <- read_data(rep_list)
knitr::kable(head(seahorse_rates))
```

### Normalization

There are two types of normalization involved in Seahorse data analysis. The
first type of normalization is background normalization, which is typically
performed within in the Seahorse Wave software. If the Seahorse data has been
background normalized, the “Background” wells should have OCR and ECAR values of
0. *ceas* will flag the user with a warning if the “Background” OCR and ECAR
   values are not 0 (see first row of the table above).

The second type of normalization is sample normalization, which can be performed
using *ceas*. Example sample normalization measures include cell count per well
and $\mu$g of protein per well. Sample normalization using *ceas* requires an
additional CSV file containing two columns:

1. `"exp_group"` or `"well"`

2. experimental measure values (e.g. cell counts) in this format:

```{r, norm_csv}
norm_csv <- system.file("extdata", package = "ceas") |>
  list.files(pattern = "norm.csv", full.names = TRUE)
norm_csv
exp_group_norm <- norm_csv[1]
well_norm <- norm_csv[2]

read.csv(exp_group_norm) |>
    knitr::kable(caption = "For normalizing by experimental group")
read.csv(well_norm) |> head() |>
    knitr::kable(caption = "For normalizing by well")
```

For sample normalization *ceas* can use one of two normalizing methods according
to the provided `norm_method` argument:

- `"self"`: for each experimental group or well, the rows of the Seahorse data
are divided by the corresponding `measure` value. Each OCR, ECAR, and PER value
is divided by the measure it"self". OCR and ECAR values are divided by the
corresponding raw value in the "measure" column. This can be thought of as an
intra-well/experimental group normalization. Each normalized value is then
interpreted as pmol/min per cell or pmol/min per \eqn{\mu}g of protein.

- `"minimum"`: When set to `"minimum"`, each OCR, ECAR, and PER value is
normalized by the minimum value in the `norm_csv` "measure" column. In this
method, every "measure" column's value in the provided CSV file is divided by
the lowest of the "measure" values to get a normalization factor for each well
or experimental group. The OCR, ECAR, and PER values in each well or
experimental group are divided by their corresponding normalization factors.
Compared to `"self"`, this can be thought of as an inter-well/experimental group
normalization based on the lowest `"measure"`. The results may be interpreted as
pmol/min per minimum of the group cell count or \eqn{\mu}g of protein.

Your normalization CSV file path may be passed into `read_data()` using the
`norm` argument along with `norm_column` with either `"exp_group"` or `"well"`
and `norm_method` as either `"self"` or `"minimum"`.

**Note**: it is important to minimize sample variability during your Seahorse
experiment.

```{r normalized_read}
read_data(
    rep_list,
    norm = exp_group_norm,
    norm_column = "exp_group",
    norm_method = "self"
) |> head() |> knitr::kable()
```

## Calculating energetics

### Partitioning data

**Note:**  
When we use the term 'max' in the package documentation we mean the maximal
experimental OCR and ECAR values rather than absolute biological maximums.

The energetics calculation workflow involves partitioning the data into its time
point and assay intervals.

```{r partition_data}
partitioned_data <- partition_data(seahorse_rates)
```

#### Alternative data formats {.tabset}

The default `partition_data()` parameters are set to analyze (1) Mito Stress
Test and (2) Glycolysis Stress Test assays run in parallel in the same
experiment. The `assay_types` list parameter can be modified to account for
alternative experiments (e.g. just a Mito Stress Test assay).

##### Mito + Glyco (our default)

```{r, eval = FALSE}
partitioned_data <- partition_data(
  seahorse_rates,
  assay_types = list(
    basal = "MITO",
    uncoupled = "MITO",
    maxresp = "MITO",
    nonmito = "MITO",
    no_glucose_glyc = "GLYCO",
    glucose_glyc = "GLYCO",
    max_glyc = "GLYCO"
  ),
  basal_tp = 3,
  uncoupled_tp = 6,
  maxresp_tp = 8,
  nonmito_tp = 12,
  no_glucose_glyc_tp = 3,
  glucose_glyc_tp = 6,
  max_glyc_tp = 8
)
```

##### Data in the form of @mookerjee2017

```{r, eval = FALSE}
partitioned_data <- partition_data(
  seahorse_rates,
  assay_types = list(
    basal = "RefAssay",
    uncoupled = "RefAssay",
    maxresp = NA,
    nonmito = "RefAssay",
    no_glucose_glyc = "RefAssay",
    glucose_glyc = "RefAssay",
    max_glyc = NA
  ),
  basal_tp = 5,
  uncoupled_tp = 10,
  nonmito_tp = 12,
  maxresp = NA,
  no_glucose_glyc_tp = 1,
  glucose_glyc_tp = 5,
  max_glyc = NA
)

```

##### Just Mito

```{r, eval = FALSE}
partitioned_data <- partition_data(
  seahorse_rates,
  assay_types = list(
    basal = "MITO",
    uncoupled = "MITO",
    maxresp = "MITO",
    nonmito = "MITO",
    no_glucose_glyc = NA,
    glucose_glyc = "MITO",
    max_glyc = NA
  ),
  basal_tp = 3,
  uncoupled_tp = 6,
  maxresp_tp = 8,
  nonmito_tp = 12,
  no_glucose_glyc_tp = NA,
  glucose_glyc_tp = 3,
  max_glyc_tp = NA
)
```

##### Respiratory control ratio (RCR) and glycolytic capacity (GC) assay

```{r, eval = FALSE}
partitioned_data <- partition_data(
  seahorse_rates,
  assay_types = list(
    basal = "RCR",
    uncoupled = "RCR",
    maxresp = "RCR,"
    nonmito = "RCR",
    no_glucose_glyc = NA,
    glucose_glyc = "GC",
    max_glyc = "GC"
  ),
  basal_tp = 3,
  uncoupled_tp = 6,
  maxresp_tp = 8,
  nonmito_tp = 12,
  no_glucose_glyc = NA,
  glucose_glyc_tp = 3,
  max_glyc_tp = 9
)
```

####

Note that the time point parameters (`maxresp_tp` and `no_glucose_glyc_tp`) also
need to be changed accordingly.

The `get_energetics` function requires pH, pK$_a$ and buffer values.

```{r get_energetics}
energetics <- get_energetics(partitioned_data, ph = 7.4, pka = 6.093, buffer = 0.10)
```

For more information on the calculations see the article on [ATP
calculations](atp_calculation.html).

## Plotting

### Bioenergetic scope plot {.tabset}

The `bioscope_plot` function plots a 2D representation of group “bioenergetic
scope.” Bioenergetic scope describes the theoretical energetic space in which a
matrix operates. The bioenergetic scope coordinates are JATP from OXPHOS on the
y-axis and JATP from glycolysis on the x-axis. The points represent mean basal
and/or max JATP from OXPHOS and glycolysis and the vertical and horizontal lines
represent the standard deviation or confidence interval of JATP from OXPHOS or
glycolysis, respectively.

#### Replicates combined

```{r bioscope_plot, fig.cap="Bioenergetic scope with replicates combined", out.width = "100%", fig.dim = c(5, 3), dpi = 120}
bioscope <- bioscope_plot(
  energetics,
  model = "ols",
  sep_reps = FALSE
)
bioscope
```

#### Replicates as random effects

```{r bioscope_plot_lme, fig.cap="Bioenergetic scope based on a mixed-effects model with replicates as random effect", message = FALSE, out.width = "100%", fig.dim = c(5, 3), dpi = 120}
bioscope_plot(energetics, sep_reps = FALSE, model = "mixed")
```

#### Replicates separated

```{r bioscope_plot_sep_reps, fig.cap="Bioenergetic scope with replicates separated", out.width = "100%", fig.dim = c(5, 3), dpi = 120}
bioscope_plot(energetics, sep_reps = TRUE, model = "ols")
```

### Rate plots

The `rate_plot` function provides an overview of OCR or ECAR for each assay type
over time, which enables cross-group energetic comparisons before and after the
addition of energetic-modulating compounds. The `rate_plot` line represents mean
group OCR or ECAR over the sequential measurements (x-axis) and the shaded
variance region represents standard deviation or specified confidence interval.

#### Oxygen consumption rate (OCR) {.tabset}

##### Replicates combined

```{r ocr, fig.cap="OCR with replicates combined", out.width = "100%", fig.dim = c(5, 3), dpi = 120}
ocr <- rate_plot(
  seahorse_rates,
  measure = "OCR",
  assay = "MITO",
  model = "ols",
  sep_reps = FALSE
)
ocr
```

##### Replicates as random effects

```{r ocr_lme, fig.cap="OCR based on mixed-effects model", out.width = "100%", fig.dim = c(5, 3), dpi = 120}
rate_plot(
  seahorse_rates,
  measure = "OCR",
  assay = "MITO",
  model = "mixed",
  sep_reps = FALSE
)
```

##### Replicates separated

```{r ocr_sep_reps, fig.cap="OCR with replicates separated", out.width = "100%", fig.dim = c(5, 3), dpi = 120}
rate_plot(
  seahorse_rates,
  measure = "OCR",
  assay = "MITO",
  model = "ols",
  sep_reps = TRUE,
  linewidth = 1
)
```


#### Extracellular Acidification Rate (ECAR) {.tabset}

##### Replicates combined

```{r ecar, fig.cap="ECAR with replicates combined", out.width = "100%", fig.dim = c(5, 3), dpi = 120}
ecar <- rate_plot(
  seahorse_rates,
  measure = "ECAR",
  assay = "GLYCO",
  model = "ols",
  sep_reps = FALSE
)
ecar
```

##### Replicates as random effects

```{r ecar_lme, fig.cap="ECAR based on mixed-effects model", out.width = "100%", fig.dim = c(5, 3), dpi = 120}
rate_plot(
  seahorse_rates,
  measure = "ECAR",
  assay = "GLYCO",
  model = "mixed",
  sep_reps = FALSE
)
```

##### Replicates separated

```{r ecar_sep, fig.cap="ECAR with replicates separated", out.width = "100%", fig.dim = c(5, 3), dpi = 120}
rate_plot(
  seahorse_rates,
  measure = "ECAR",
  assay = "GLYCO",
  model = "ols",
  sep_reps = TRUE,
  linewidth = 1

)
```

### ATP plots {.tabset}

The `atp_plot` function plots group JATP values, which enables cross-group
OXPHOS and glycolytic JATP comparisons at basal and max conditions. The
`atp_plot` symbols represent the mean basal or max JATP from OXPHOS or
glycolysis, and the crossbar boundaries represent the standard deviation or
confidence interval JATP variance.

#### Basal glycolysis

```{r basal_glyc, fig.cap="JATP from basal glycolysis with replicates combined", out.width = "100%", fig.dim = c(5, 3), dpi = 120}
basal_glyc <- atp_plot(
  energetics,
  basal_vs_max = "basal",
  glyc_vs_resp = "glyc",
  sep_reps = FALSE
)
basal_glyc 
```

#### Basal respiration

```{r basal_resp, fig.cap="JATP from basal respiration with replicates separated", out.width = "100%", fig.dim = c(5, 3), dpi = 120}
atp_plot(
  energetics,
  basal_vs_max = "basal",
  glyc_vs_resp = "resp",
  model = "ols",
  sep_reps = TRUE
)
```


#### Maximal glycolysis

```{r max_glyc, fig.cap="JATP from maximal glycolysis with a mixed-effects model", out.width = "100%", fig.dim = c(5, 3), dpi = 120}
atp_plot(
  energetics,
  basal_vs_max = "max",
  glyc_vs_resp = "glyc",
  model = "mixed",
  sep_reps = FALSE
)
```


#### Maximal respiration

```{r max_resp, fig.cap="JATP from maximal respiration replicates combined", out.width = "100%", fig.dim = c(5, 3), dpi = 120}
atp_plot(
  energetics,
  basal_vs_max = "max",
  glyc_vs_resp = "resp",
  model = "ols",
  sep_reps = TRUE
)
```

### Customizing plots

CEAS is designed to work with existing `ggplot2` customization functionality and
doesn't include more than shape and size options for its plots.

For example, to change the colors used in the plot, simply make the plot and
add the custom colors you'd like:

#### Colors

```{r custom_colors, out.width = "100%", fig.dim = c(5, 3), dpi = 120}
custom_colors <- c("#e36500", "#b52356", "#3cb62d", "#328fe1")
```

```{r, , out.width = "100%", fig.dim = c(5, 3), dpi = 120}
bioscope +
ggplot2::scale_color_manual(
  values = custom_colors
)
```

```{r, out.width = "100%", fig.dim = c(5, 3), dpi = 120}
ocr +
ggplot2::scale_color_manual(
  values = custom_colors
)
```

#### Labels {.tabset}

##### Change axis labels

```{r, out.width = "100%", fig.dim = c(5, 3), dpi = 120}
ecar +
    ggplot2::labs(x = "Time points")
```

##### Change label size

```{r, out.width = "100%", fig.dim = c(5, 3), dpi = 120}
basal_glyc +
    ggplot2::theme(axis.text = ggplot2::element_text(size = 20))
```

#### Editing functions

We are working on making the plots as customizable as possible. However, if
there are options that cannot be set in the calls to the plotting functions or
with `ggplot2` functions, you can get the code used to make the plots by running
the function name without parenthesis and modify it. Further, since every step
in the ceas workflow provides a dataset, you can run the modified function or
your own custom plotting functions on those datasets.

```{r, eval = FALSE}
rate_plot
```


```{r, results = 'asis', echo = FALSE}
func_code <- capture.output(dput(rate_plot))
cat("```r\n")
cat(func_code, sep = "\n")
cat("\n```")
```

In RStudio, you can run `utils::edit` to modify a function.

```{r, eval = FALSE}
edit(rate_plot)
```



## References