---
title: "LDAcoop"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{LDAcoop}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# What this package is about

 - LDA: limiting dilution assay
 - coop: (cellular) cooperation

The LDA is the gold standard for quantification of the clonogenic capacity of 
non-adherent cells. 
The most common method for calculating this capacity from an LDA experiment 
assumes implicitly independence of all cells. Interestingly, this assumption 
does not hold for many cell lines (i.e. cooperatively and competitively growing 
cells). 

### The effect of cellular cooperation

```{r define caption, echo = FALSE}
mycaption <- "Figure: Fitting LDA data; 
 assuming independence (blue), taking cooperativity into account (grey)"
```

```{r setup,echo=FALSE,fig.width=5, fig.height=3, fig.cap=mycaption}
#install.packages("LDAcoop")
library(LDAcoop)
data(LDAdata)
BT20 <- subset.data.frame(x = LDAdata,
                          subset = (name == "BT.20") & (Group == 8))
BT20 <- BT20[,c("S-value","# Tested","# Clonal growth","Group","replicate")]
BT20 <- BT20[BT20$`S-value`<1000,]
out <- LDA_prepare_plot(LDA_tab = BT20)
  par(
    mar = c(2.5, 3.5, 0.5, 0.5),
    mgp = c(1.5, 0.5, 0)
  )
out[[1]][[1]]$color <- "#545454"
xl <- 750
LDA_plot_activity(LDA_obj = out[[1]],uncertainty.band = T,xlim = c(0,xl))
#statmod::elda(tested = BT20$`# Tested`,
#              dose = BT20$`S-value`,
#              response = BT20$`# Clonal growth`)
#          Lower Estimate    Upper
#Group 1 836.5526 645.4182 497.9539
lines(x = c(0,xl),y = c(0,-xl/645.4182),lwd=2,col="blue")
polygon(x = c(0,xl,xl,1),y = c(0,-xl/836.5526,-xl/497.9539,0),border = F,
        col = rgb(red = 0,green = 0,blue = 1,alpha = 0.5,maxColorValue = 1))
```

\vspace{0.5cm}

Cellular cooperation induces a non-linearity to the dose-response-curve 
(the (log) fraction of non-responding wells over the number of cells seeded)
and thereby biases the estimate of the gold standard analysis method, 
which assumes linearity. 
In other words: when surrounded by many other cells, the combined 
clonogenic activity of 
100 cells is higher than the activity of 100 separated single cells. 
In terms of single cell activity, 100 cells act as if they were more.

Thus, the gold standard analysis method is syntactically applicable 
but its result is biased (i.e. meaningless) for cooperatively growing 
cells.

### Robust analysis in presence (and absence) of cellular cooperation

Therefore, this package equips you with the tools you need to robustly quantify 
the clonogenic capacity of cell lines as well as to quantify the clonogenic 
survival of those cell lines after treatment.


# How to use this package

As known from tools for quantification of the clonogenic capacity from LDA data, 
the input is a data frame (table, matrix) with entries for 

 - cells: the number of cells seeded in a well
 - wells: the number of replicate wells
 - response: the number of wells with positive response (growth)
 - (optional) group: treatment group - e.g. irradiation dose
 
All commands to get tables with numbers or plots of the activity are explained 
below.

## A full experiment

Let's assume, you have conducted an experiment with a set of experimental 
conditions and biological replicates.

Your data will look something like this

```{r showData}
#install.packages("LDAcoop")
library(LDAcoop)
data(LDAdata)
head(LDAdata)
```

A table of clonogenic activities, cooperativity coefficients and survival 
fractions can be generated from such a data table as follows:

```{r data.table}
BT20 <- subset.data.frame(x = LDAdata,
                          subset = name == "BT.20")
BT20 <- BT20[,c("S-value","# Tested","# Clonal growth","Group","replicate")]
round(LDA_table(x = BT20),digits = 3)
```

In this table, we find 

 - `treatment`: the experimental 'Group' 
 - `act`: the calculated activity for the treatment
 - `act.CI.lb` and `act.CI.ub`: its uncertainty (lower and upper bound of the 
 95%-confidence interval)
 - `b`: the cooperativity coefficient (1: no cooperativity)
 - `b.pvalue`: the p-value for test hypothesis of b=1
 - `SF`: the calculated survival fraction (SF)
 - `SF.CI.lb` and `SF.CI.ub`: SF-uncertainty (lb: lower bound, ub: upper bound)
 
A plot of the experiment and the estimated survival fractions can be generated
as follows:

```{r data.plot,fig.width=6, fig.height=4}
LDA_plot(LDA_tab = BT20,uncertainty.band = T)
```

In case the figure looks overcrowded, `uncertainty.band = FALSE` can switch off
the confidence bands and `xlim` (e.g. `xlim = c(0,100)`) can be used, to adjust
the plotted range of seeded cells.

---

# Mathematical background 

## Introduction

Clonogenicity refers to the capacity of single cells to grow into 
new colonies. 
Since not every single cell will successfully form a new colony, 
in a given culture the fraction of cells that do so is often of interest. 
This fraction is then called clonogenic activity.
Mathematically it is a probability $p$.
Often the activity is communicated by the reciprocal, which corresponds to a 
number of cells (e.g. activity $p = 1/42$ written as $a = 1/p = 42$ for 
'one active cell among $a = 42$ non-active cells').

In the LDA, distinct numbers of cells are seeded in wells. The number of 
seeded cells per well is often called dose.
Since, we do not want to get confused with the irradiation dose, which is a 
frequent treatment in the field of radiation oncology, we denote the number of 
cells seeded with $S$. 
The readout of each well in the LDA is dichotomous (growth / no growth).
So, from the binomially distributed number of active cells in a well $X$ with

\[P(X=k) = \left( \begin{matrix} s \\ k \end{matrix}\right) 
p^{k} (1-p)^{S-k} \]

only $X=0$ and $X \neq 0$ can be observed experimentally.
Thus, it is about the frequency of positive wells as a function of the 
number of cells seeded ($k/n = f(S)$).
For dealing with high cell numbers, the binomial distribution is approximated 
by the 
asymptotically identical (for $S \to \infty$) Poisson distribution (with 
$\lambda = S \cdot p$):
\[P_{\lambda}(X=k) = \frac{\lambda^k}{k!} e^{-\lambda}.\]

The basic concept for estimating the clonogenic activity of a cell suspension 
through diluting the number of cells in different wells is the following: 

 - imagine a number of cells in a well, which is so high 
that we expect many active cells among them, then this well will be positive. 
 - imagine a number of cells so low  
that the chance of an active cell among them is very low, then this well
will be negative.
 - thus, between those two numbers there will be cell numbers, where no-one 
 can say, whether a colony will grow or not. It is rather a matter of chance.
 - from Poisson statistics we know, that a frequency of 37% negative wells will 
 be observed, when the average number of active cells per well is one
 ($e^{-1} \approx 0.368$).

So taking the graph 'observed fraction of positive wells over the number 
of cells per well', we find the clonogenic activity as the number of cells 
(x-axis) just where the curve passes the expectation of 37% negative wells.

## Mathematical model

Let $n$ be the number of wells, and let $\mu$ denote the expected fraction of 
negative wells (failures). Following the single-hit Poisson model 
(SHPM: $\lambda_{SHPM} = p \cdot S$), the expectation of active cells in a well 
is approximately Poisson 
distributed and depends linearly on the number of cells seeded $S$ as well as 
the probability $p$ of each cell being active (i.e., clonogenic). 
We therefore find $P_{\lambda}(k=0) = e^{-p S}$), and thus:
\[\mu = e^{-pS}.\]

The number of failures $r$ is binomially distributed

\[ P(Y=r) = \left( \begin{matrix} n \\ r \end{matrix}\right) 
\mu^{r} (1-\mu)^{n-r}. \]

We find (for $\alpha = ln(p)$, $\xi = ln(S)$)

\begin{align*}
\Leftrightarrow \mu &= e^{- p S}\\
\Leftrightarrow ln(\mu) &= - p S \\
\Leftrightarrow -ln(\mu) &= p \cdot S \\ 
\Leftrightarrow ln(-ln(\mu)) &= \alpha + \xi 
\end{align*}

Therefore, technically the identification of the required number of cells is 
done via a generalized linear regression model 
(binomial family, loglog link function).
The clonogenic activity is $p = e^{\alpha}$.

### Cooperativity

In the presented model, $p$ is independent from $S$.
So implicitly, it is assumed, that 100 single cells in single wells have the 
identical chance to result in at least one colony as a single well with those 
100 cells would have. 

But this is often not the case.
Cells export or release substances into the medium and thereby 
communicate and cooperate (or compete).
The combined chance of a group of cells to grow into a clone often exceeds the 
sum of the individual chances. 
(See CFAcoop, for a number of cells seeded $S$ and a number of resulting 
colonies $C$ we find $C = a \cdot S^b$.)

Thus, allowing for the same communication and cooperativity (or competition), 
we replace 
$\lambda = p \cdot S$ with $\lambda = p \cdot S^b$ analogue to CFA.
The modification reads in both cases as 'a number of $S$ cells shows the same 
activity as if they were $S^b$ single cells'.

With this generalization we find 

\begin{align*}
P_{\lambda}(k=0) &= e^{-\lambda}\\
\Leftrightarrow \mu &= e^{- p S^b}\\
\Leftrightarrow ln(\mu) &= - p S^b \\
\Leftrightarrow ln(-ln(\mu)) &= \alpha + \xi \cdot b. 
\end{align*}

Interestingly, this equation is the same as in the special case of 
non-cooperativity, just without the restriction of $b=1$, which corresponds to 
the non-cooperative case. 
The degree of cooperativity is quantified by the coefficient $b$. 

## Activity

When the probability of a cell to grow into a colony is not 
independent from the number of surrounding cells, the earlier definition of 
clonogenic activity (the chance of an isolated single cell) is pointless. 
Thus, the approach to deal with cooperatively growing cells, requires a 
generalized definition for clonogenic activity.

For a fixed outcome of e.g. $\lambda = 1$, which is equivalent with 
$\mu \approx 0.63$, we still may calculate the required number of cells to be 
seeded from the regression coefficients
\[s_{\lambda = 1} = e^{\frac{-\alpha}{b}} = \left(p^{-1}\right)^{\frac{1}{b}}.\] 
On average, one out of these cells will grow into a colony.
Therefore, we define this number $s_{\lambda=1}$ as the 
(generalized) clonogenic activity.

This is analogue to the CFA approach for cellular cooperation 
(which sets the reference at an expectation of $20$ colonies.). 
Please note, the non-cooperative case ($b=1$)
results in the identical activity values as from the previous definition.

### Example

Given the data of an LDA experiment, we find the clonogenic activity where 
expected. The only change in contrast to conventional LDA data analysis is 
the curve, which replaces the linear model.

```{r single.first,fig.width=6, fig.height=3}
cell.line <- unique(LDAdata$name)[2]
AD <- subset.data.frame(LDAdata, subset = (name==cell.line) & 
                         (replicate==1) & (Group == 2))[,4:6]
LDA_plot(LDA_tab = AD,uncertainty = "act", uncertainty.band = T)
LDA_table(x = AD)
full_model_fit <- LDA_activity_single(x = AD)
```

In case, there are biological replicates, the replicate may be indicated by 
the fifth column. Thereby the frequency of responding wells can be plotted 
separately.

```{r set1, fig.width=6, fig.height=3}
AD <- subset.data.frame(LDAdata, subset = (name==cell.line) &
                         (Group == 2))[,c(4:6,3,2)]
LDA_plot(LDA_tab = AD,uncertainty = "act", uncertainty.band = T)
LDA_table(x = AD[,1:3],ref_class = 0)
```

Please note that the starting motivation of a concept 'one active cell among 
how many?' is still the same. It is just restricted to the special outcome of 
37% negative wells. 
The choice of 'outcome of 37% negative wells' is quite artificial and the 
activity for higher or lower active well ratios would change. 
But when it comes to survival fractions, this shift is mostly cut out and 
thereby this approach filters (in parts) a cooperativity-bias in those 
survival fractions.

## Survival fraction

Even though the concept of clonogenic activity is somewhat fuzzy in the presence
of cellular cooperation, the effect of treatments on this clonogenic capacity
can be investigated quite accurately by the same way it is done for the CFA.

The only requirement is a statement on the outcome of interest. At CFA, a 
number of 20 colonies can be agreed upon - and a comparison of the number of
required cells seeded with and without treatment is reasonable.

Analogously, the ratio of cell numbers required to observe the same $\mu$ 
(e.g. $\mu = e^{-1}$) is of interest, when investigating the effect of 
certain treatments. 

\[SF_x = \frac{S_0(\lambda_0 = 1)}{S_x(\lambda_x = 1)}\]
can be calculated from the fitted GLM parameters as 
\[SF_x = exp\left( \frac{log(p_x)}{b_x} - \frac{log(p_0)}{b_0}\right)\]

Therefore, the interpretation of survival fractions is straight forward.
If the number of cells needed to be seeded with treatment is ten times the 
number without treatment, the survival fraction is calculated as $0.1$. 
Note that it is implicitly assumed, that those cells that are non-survivors 
of the treatment, do not contribute to the cooperative stimulation of the 
survivors. Otherwise the treatment effect will be underestimated.


## Uncertainty analysis

Uncertainties for the parameters of the generalized linear model are returned
by the fitting procedure and can be transferred to the clonogenic activity 
via the general `predict.glm` function.

Thus, uncertainties of clonogenic activities are calculated the same way
as the activities are (cutting $y=-1$).

Uncertainties of the survival fractions can be generally assessed by two ways

 - robust approximation by combination of the 84%-confidence-intervals of 
 the activity (method (a), see Austin et al. (2002), Payton et al. (2003), 
 Knol et al. (2011))
 - calculation following the law of error propagation through first order Taylor 
 series expansion (method (b))
 
In cases of extreme cooperativity, method (b) can be numerically unstable and 
therefore we recommend the use of method (a) instead.

Analysis of the calculated uncertainty bounds of the survival fractions 
of the 10 non-extreme cell lines under all treatments 
(see data delivered with this package),
we find a mean ratio (uncertainty-bound method(a) / uncertainty-bound method(b)) 
of 0.9969 and a standard deviation of 0.0049. 
Thus, the deviation of those two methods is very stable in the order of 1%.

### Example

```{r BT20ep}
LDA_table_act <- LDA_table(x = BT20,uncertainty = "act")
cbind(round(LDA_table_act[,1:5],digits = 1),
      round(LDA_table_act[,6:9],digits = 4))
LDA_table_ep <- LDA_table(x = BT20,uncertainty = "ep")
cbind(round(LDA_table_ep[,1:5],digits = 1),
      round(LDA_table_ep[,6:9],digits = 4))
```