%\VignetteIndexEntry{Using RestoreNet}
%\VignetteEngine{R.rsp::tex}
%\VignetteKeyword{R}
%\VignetteKeyword{package}
%\VignetteKeyword{vignette}
%\VignetteKeyword{LaTeX}
\documentclass[11pt,a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage[graphicx]{realboxes}
\usepackage{amssymb}
\usepackage[]{algorithm2e}
\usepackage[noend]{algpseudocode}
\usepackage{makeidx}
\usepackage[svgnames]{xcolor}
\usepackage[symbol]{footmisc}
\usepackage{float}
\usepackage{multirow}
\usepackage{subcaption}
\usepackage{graphics}
\usepackage{adjustbox}
\usepackage{authblk}
\usepackage{amsthm}
\usepackage{listings}
\usepackage{hyperref}
\lstset{language=R,
    basicstyle=\small\ttfamily,
    stringstyle=\color{DarkGreen},
    otherkeywords={0,1,2,3,4,5,6,7,8,9},
    morekeywords={TRUE,FALSE},
    deletekeywords={data,frame,length,as,character},
    keywordstyle=\color{black},
    commentstyle=\color{blue},
}
\usepackage[toc,page,titletoc]{appendix}
\usepackage[capitalise,nameinlink]{cleveref}
\usepackage{chngcntr}
\newtheorem{Assumption}{Assumption}
\newtheorem{Definition}{Definition}
\newtheorem{Observation}{Observation}
\newtheorem{Theorem}{Theorem}
\newtheorem{Properties}{Properties}
\newtheorem{Question}{Question}
\newtheorem{Lemma}{Lemma}
\newcommand{\bigzero}{\mbox{\Huge\bfseries 0}}
\newcommand{\rvline}{\hspace*{-\arraycolsep}\vline\hspace*{-\arraycolsep}}
\newcommand{\quotes}[1]{``#1''}
\SetKwInput{KwInput}{Input}
\SetKwInput{KwOutput}{Output}
\makeatletter
\newcommand*\bigcdot{\mathpalette\bigcdot@{.7}}
\newcommand*\bigcdot@[2]{\mathbin{\vcenter{\hbox{\scalebox{#2}{$\m@th#1\bullet$}}}}}
\makeatother
\title{A quick introduction to \textsf{RestoreNet}}
\author[]{L.\ Del Core\\ l.del.core@rug.nl}
\date{\today}
\begin{document}
\maketitle
\begin{abstract}
This document reviews the key functionalities of the \textsf{RestoreNet} package. Section 1 shows how to simulate a clonal tracking dataset from a stochastic quasi-reaction network. In particular, we show how to simulate clone-specific trajectories following a given set of biochemical reactions. Sections 2 and 3 show how to fit the null (base) model and the random-effects model to a simulated clonal tracking dataset. Finally, Section 4 shows how to visualize the results at the clonal level.
\end{abstract}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%% Simulating clonal tracking datasets %%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Simulating clonal tracking datasets}
\label{sec:sim}
A clonal tracking dataset compatible with \textsf{RestoreNet}'s functions must be formatted as a $3$-dimensional array $\pmb{Y}$ whose $ijk$-entry $y_{ijk}$ is the number of cells of clone $k$ for cell type $j$ collected at time $i$ (a toy example of this layout is shown below). The function \texttt{get.sim.tl()} can be used to simulate the trajectory of a single clone, given an initial condition $\pmb{Y}_0$ for the cell counts and obeying a particular cell differentiation network defined by a list \texttt{rct.lst} of biochemical reactions. Consistently with \cite{Core2022.05.31.494100}, our package considers only three cellular events, namely cell duplication, cell death and cell differentiation, for a time counting process
\begin{equation}
\pmb{x}_t = (x_{1t},\dots,x_{nt})
\label{countingProcess}
\end{equation}
of a single clone in $n$ distinct cell lineages, whose measurements are indicated with $\pmb{y}_t = (y_{1t},\dots,y_{nt})$.
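
To make this required input format concrete, the following toy sketch (plain \textsf{R}, not a package function, and with made-up numbers) assembles such a three-dimensional array for two time points, four cell types and two clones.
\begin{lstlisting}[language=R]
## Toy clonal tracking array: entry [i,j,k] is the number of cells of
## clone k, cell type j, collected at time i. All values are illustrative.
Y <- array(c(100, 80,  0, 25,  0, 30,  0,  5,  # clone 1: (t0, t1) counts for A, B, C, D
             100, 70,  0, 40,  0, 20,  0,  0), # clone 2: (t0, t1) counts for A, B, C, D
           dim = c(2, 4, 2),
           dimnames = list(time = c("0", "1"),
                           celltype = c("A", "B", "C", "D"),
                           clone = c("1", "2")))
Y["1", , "2"] ## cell counts of clone 2 across cell types at time 1
\end{lstlisting}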
Following \cite{Core2022.05.31.494100}, we assume that the time counting process $\pmb{x}_t$ for a single clone in a time interval $(t, t + \Delta t)$ evolves according to a set of reactions $\{\pmb{V}_{\cdot k}\}_k$ and hazard functions $\{h_k\}_k$ defined as
\begin{equation}
\pmb{V}_{\cdot k} =
\begin{cases}
(0\dots1_i\dots0)' \\
(0\dots-1_i\dots0)' \\
(0\dots-1_i\dots 2_j \dots 0)'
\end{cases}
\quad
h_k(\pmb{x}_t, \pmb{\theta}) =
\begin{cases}
x_{it}\alpha_i & \text{for duplication} \\
x^2_{it}\delta_i & \text{for death} \\
x_{it}\lambda_{ij} & \text{for differentiation}
\end{cases}
\label{eq:netEffectCellDiff}
\end{equation}
which contains a linear growth term with a duplication rate parameter $\alpha_i > 0$, and a linear term to describe cell differentiation from lineage $i$ to lineage $j$ with differentiation rate $\lambda_{ij} > 0$ for each $i \neq j = 1, \dots, n$. Finally, we employ a quadratic term for cell death with a death rate parameter $\delta_i > 0$. The vector parameter
\begin{align}
\pmb{\theta} = \left(\cdots \alpha_i \cdots \delta_i \cdots \lambda_{ij} \cdots \right)
\end{align}
is the concatenation of all the dynamics parameters. Thus, the net-effect matrix and the hazard vector are defined as
\begin{align}
\pmb{V} = \begin{bmatrix} \pmb{V}_{\cdot 1} \cdots \pmb{V}_{\cdot K} \end{bmatrix}; \qquad \pmb{h}(\pmb{x}_t, \pmb{\theta}) = \left( h_1(\pmb{x}_t, \pmb{\theta}), \dots, h_K(\pmb{x}_t, \pmb{\theta}) \right)'
\label{eq:V-h}
\end{align}
\begin{figure}[t]
\centering
\includegraphics[width=.2\linewidth]{./figures/simStructure.pdf}
\caption{Cell differentiation structure of four synthetic cell types \textsf{A}, \textsf{B}, \textsf{C} and \textsf{D}. Duplication, death and differentiation moves are indicated with green, red and grey arrows respectively.}
\label{fig:simSchema}
\end{figure}
\noindent The cellular events of duplication, death and differentiation are encoded by the package functions with the character labels \texttt{"A->1"}, \texttt{"A->0"}, and \texttt{"A->B"} respectively, where \texttt{A} and \texttt{B} are two distinct cell types. The following \textsf{R} code chunk shows how to simulate clone-specific trajectories of cells via a $\tau$-leaping simulation algorithm. In particular, as an illustrative example we focus on a simple cell differentiation network structure of four synthetic cell types \textsf{A}, \textsf{B}, \textsf{C} and \textsf{D}, as illustrated in Figure \ref{fig:simSchema}, and only one clone.
\begin{lstlisting}[language=R]
> rcts <- c("A->1", "B->1", "C->1", "D->1",
+           "A->0", "B->0", "C->0", "D->0",
+           "A->B", "A->C", "C->D") ## set of reactions
> S <- 100 ## trajectory length
> tau <- 1 ## for tau-leaping algorithm
> theta <- c(.2, .15, .17, .09*5,
+            .001, .007, .004, .002,
+            .13, .15, .08) ## parameters
> names(theta) <- rcts
> Y0 <- c(100,0,0,0) ## initial state
> names(Y0) <- head(LETTERS, 4)
> s20 <- 1 ## noise variance
> Y <- get.sim.tl(Yt = Y0, theta = theta, S = S,
+                 s2 = s20, tau = tau, rct.lst = rcts) ## simulation
> head(Y) ## look at the simulated data
          A           B          C          D
0 100.61983  0.06136727  0.7714631  0.3255576
1  82.64798 25.80389091 30.2276346  0.0000000
2  67.38059 44.75329724 52.8111779  4.9761676
3  59.22818 57.88492115 64.9075555 15.2798701
4  49.95502 57.19943051 73.4204752 32.5405621
5  43.79580 56.15629549 73.4675043 57.1191486
\end{lstlisting}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%% Fitting the base (null) model %%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Fitting the base model}
\label{sec:nullFit}
Following \cite{Core2022.05.31.494100}, the base model is defined as \\
\begin{adjustbox}{max width=\textwidth}
\parbox{1.1\linewidth}{
\begin{align}
\underbrace{\left[ \begin{smallmatrix} \Delta \pmb{y}_{t_0} \\ \vdots \\ \Delta \pmb{y}_{t_{T-1}} \end{smallmatrix} \right]}_{\Delta \pmb{y}} &= \underbrace{\left[ \begin{smallmatrix} \pmb{M}_{t_0} \\ \vdots \\ \pmb{M}_{t_{T-1}} \end{smallmatrix} \right]}_{\pmb{M}} \pmb{\theta} + \pmb{\varepsilon}; \quad \pmb{\varepsilon} \sim \mathcal{N} \left(\pmb{0}, \overbrace{\underbrace{\left[ \begin{smallmatrix} \pmb{W}_{t_0}(\pmb{\theta}) & & \\ & \ddots & \\ & & \pmb{W}_{t_{T-1}}(\pmb{\theta})\end{smallmatrix} \right]}_{\pmb{W}(\pmb{\theta})} + \sigma^2\pmb{I}_{nT}}^{\pmb{\Sigma}(\pmb{\theta},\sigma^2)} \right)
\label{eq:fullGLM}
\end{align}
}
\end{adjustbox}
where
\begin{adjustbox}{max width=\textwidth}
\parbox{1.4\linewidth}{
\begin{equation}
\begin{aligned}
\underbrace{\Delta \pmb{y}_t}_{\pmb{y}_{t + 1} - \pmb{y}_{t}} = \overbrace{\pmb{V}\left[ \begin{smallmatrix} \prod_{i=1}^{n} {{y_{it}}\choose{r_{1i}}} & & \\ & \ddots & \\ & & \prod_{i=1}^{n} {{y_{it}}\choose{r_{Ki}}} \end{smallmatrix} \right]\Delta t}^{\pmb{M}_t}& \underbrace{\left[ \begin{smallmatrix} \theta_1 \\ \vdots \\ \theta_K \end{smallmatrix} \right]}_{\pmb{\theta}} + \left( \underbrace{\pmb{V}\left[ \begin{smallmatrix} h_1(\pmb{y}_t,\pmb{\theta}) & & \\ & \ddots & \\ & & h_K(\pmb{y}_t,\pmb{\theta}) \end{smallmatrix} \right]\pmb{V}'\Delta t}_{\pmb{W}_t(\pmb{\theta})} + \sigma^2\pmb{I}_n \right)^{1/2}\Delta \pmb{W}(t) \\
\Delta \pmb{W}(t) &\sim \mathcal{N}(\pmb{0}, \pmb{I}_n)
\end{aligned}
\label{eq:ItoDiffLinear}
\end{equation}
}
\end{adjustbox}
\noindent The package \textsf{RestoreNet} allows the parameters $(\pmb{\theta}, \sigma^2)$ of (\ref{eq:fullGLM}) to be inferred with a maximum likelihood approach, that is, by solving the following constrained optimization problem
\begin{align}
\hat{\pmb{\theta}}_{ML} \leftarrow \underset{\pmb{\theta} \geq \pmb{0}; \sigma^2 \geq 0}{\texttt{argmin}} f(\pmb{\theta}, \sigma^2)
\label{eq:p_MLE}
\end{align}
where the objective function $f$ is the negative log-likelihood of the multivariate normal distribution $\mathcal{N}_{nT} \left( \pmb{M}\pmb{\theta}, \pmb{\Sigma}(\pmb{\theta},\sigma^2)\right)$. Details on the inference procedure can be found in \cite{Core2022.05.31.494100}.
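
For intuition on how each design block $\pmb{M}_t$ of (\ref{eq:ItoDiffLinear}) is built, the following sketch (plain \textsf{R}, not a \textsf{RestoreNet} function; the reactant-multiplicity matrix \texttt{Rmat} and the helper name are hypothetical) scales the $k$-th column of the net-effect matrix $\pmb{V}$ by $\prod_{i}\binom{y_{it}}{r_{ki}}$ and by the time increment $\Delta t$. Stacking such blocks over the observation times yields the full design matrix $\pmb{M}$ of (\ref{eq:fullGLM}) used in the fit below.
\begin{lstlisting}[language=R]
## Sketch of one design block M_t = V diag(prod_i choose(y_it, r_ki)) * dt.
## V:    n x K net-effect matrix (cell types x reactions)
## Rmat: K x n matrix of reactant multiplicities r_ki (hypothetical input)
## yt:   length-n vector of cell counts at time t
## dt:   time increment
get.Mt <- function(V, Rmat, yt, dt) {
  g <- apply(Rmat, 1, function(r) prod(choose(yt, r))) # one factor per reaction
  V %*% diag(g, nrow = length(g)) * dt                 # n x K design block
}
\end{lstlisting}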
The following \textsf{R} code chunk shows how to accomplish this on a clonal tracking dataset simulated from the same differentiation network structure of the previous section. In this case we simulate the trajectories of three independent clones following different dynamics of clonal dominance, that is, we use clone-specific values for the vector parameter $\pmb{\theta}$.
\begin{lstlisting}[language=R]
> rcts <- c("A->1", "B->1", "C->1", "D->1",
+           "A->0", "B->0", "C->0", "D->0",
+           "A->B", "A->C", "C->D") ## set of reactions
> ctps <- head(LETTERS, 4)
> nC <- 3 ## number of clones
> S <- 100 ## trajectory length
> tau <- 1 ## for tau-leaping algorithm
> u_1 <- c(.2, .15, .17, .09*5, .001, .007, .004, .002, .13, .15, .08)
> u_2 <- c(.2, .15, .17, .09, .001, .007, .004, .002, .13, .15, .08)
> u_3 <- c(.2, .15, .17*3, .09, .001, .007, .004, .002, .13, .15, .08)
> theta_allcls <- cbind(u_1, u_2, u_3) ## clone-specific parameters
> rownames(theta_allcls) <- rcts
> s20 <- 1 ## additional noise
> Y <- array(data = NA,
+            dim = c(S + 1, length(ctps), nC),
+            dimnames = list(seq(from = 0, to = S*tau, by = tau),
+                            ctps, 1:nC)) ## empty array to store simulations
> Y0 <- c(100,0,0,0) ## initial state
> names(Y0) <- ctps
> for (cl in 1:nC) { ## loop over clones
+   Y[,,cl] <- get.sim.tl(Yt = Y0, theta = theta_allcls[,cl], S = S,
+                         s2 = s20, tau = tau, rct.lst = rcts)
+ }
> null.res <- fit.null(Y = Y, rct.lst = rcts) ## null model fitting
> null.res$fit ## model fitting results
$par
 [1] 6.788801e-02 2.125983e-02 9.192739e-03 2.753155e-03 1.000000e-07 2.102263e-03 8.510596e-05 7.137124e-05
 [9] 7.727499e-02 1.147283e-01 3.631258e-02 1.297511e+00

$value
[1] 3419.932

$counts
function gradient
     673      673

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

> null.res$stats ## model statistics
      nPar        cll        mll       cAIC       mAIC       Chi2    p-value
    12.000  -2812.692  -2812.692   5649.384   5651.691 337324.840      0.000
> head(null.res$design$M) ## design matrix
6 x 11 sparse Matrix of class "dgCMatrix"
1 100.61983 .          .         .         -10124.350 .            .
1   .       0.06136727 .         .               .    -0.003765942
1   .       .          0.7714631 .               .     .
1   .       .          .         0.3255576       .     .
1  82.64798 .          .         .           -6830.688 .
> null.res$design$V ## net-effect matrix
  A->1 B->1 C->1 D->1 A->0 B->0 C->0 D->0 A->B A->C C->D
A    1    0    0    0   -1    0    0    0   -1   -1    0
B    0    1    0    0    0   -1    0    0    2    0    0
C    0    0    1    0    0    0   -1    0    0    2   -1
D    0    0    0    1    0    0    0   -1    0    0    2
\end{lstlisting}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%% Fitting the random-effects model %%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Fitting the random-effects model}
\label{sec:reFit}
Consistently with \cite{Core2022.05.31.494100}, the random-effects model is defined as
\begin{equation}
\begin{gathered}
\Delta \pmb{y} = \underbrace{\begin{bmatrix}
\pmb{M}_1 & & \pmb{0} \\
 & \ddots & \\
\pmb{0} & & \pmb{M}_J
\end{bmatrix}}_{\pmb{\mathbb{M}} \in \mathbb{R}^{n \times Jp}}\pmb{u} + \pmb{\varepsilon} \qquad
\pmb{u} \sim \mathcal{N}_{Jp} \left(\underbrace{\mathbf{1}_J \otimes \pmb{\theta}}_{\pmb{\theta}_u}, \pmb{I}_J\otimes \underbrace{\begin{bmatrix}
\tau_1^2 & & \pmb{0} \\
 & \ddots & \\
\pmb{0} & & \tau_p^2
\end{bmatrix}}_{\pmb{\Delta}_u} \right) \\
\pmb{\varepsilon} \sim \mathcal{N}(\pmb{0}, \pmb{\Sigma}(\pmb{\theta}, \sigma^2))
\end{gathered}
\label{linRegSimpl2}
\end{equation}
\noindent where $\pmb{\mathbb{M}}$ is the block-diagonal design matrix for the random effects $\pmb{u}$ centered at $\pmb{\theta}$, and each block $\pmb{M}_j$ is clone-specific.
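
As an aside, the block-diagonal structure of $\pmb{\mathbb{M}}$ in (\ref{linRegSimpl2}) can be emulated in a few lines of plain \textsf{R}; the sketch below (not a package function, with toy clone-specific blocks) simply stacks the blocks on the diagonal with \texttt{Matrix::bdiag()}.
\begin{lstlisting}[language=R]
## Toy illustration of a block-diagonal design matrix with J = 3
## clone-specific blocks, each of dimension 4 x 2 here.
library(Matrix)
M.lst <- lapply(1:3, function(j) matrix(rnorm(4 * 2), nrow = 4)) # toy blocks
M.blk <- bdiag(M.lst) ## sparse block-diagonal matrix
dim(M.blk)            ## 12 x 6
\end{lstlisting}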
As in the case of the null model (\ref{eq:fullGLM}), to explain additional noise in the data and to avoid singularity of the stochastic covariance matrix $\pmb{W}(\pmb{\theta})$, we add to its diagonal a small unknown quantity $\sigma^2$ which we infer from the data. Under this framework (see \cite{Core2022.05.31.494100} for details) the conditional distribution of the random effects $\pmb{u}$ given the data $\Delta \pmb{y}$ has the following explicit formulation
\begin{gather}
\pmb{u}\vert \Delta \pmb{y} \sim \mathcal{N}_{Jp}(E_{\pmb{u}\vert \Delta \pmb{y}; \pmb{\psi}}[\pmb{u}], V_{\pmb{u}\vert \Delta \pmb{y}; \pmb{\psi}}(\pmb{u}))
\end{gather}
where
\begin{align}
\begin{split}
E_{\pmb{u}\vert \Delta \pmb{y}; \pmb{\psi}}[\pmb{u}] &= V_{\pmb{u}\vert \Delta \pmb{y}; \pmb{\psi}}(\pmb{u}) \left(\pmb{\mathbb{M}}'\pmb{\Sigma}^{-1}(\pmb{\theta},\sigma^2)\Delta \pmb{y} + \pmb{\Delta}_u^{-1}\pmb{\theta}_u \right)\\
V_{\pmb{u}\vert \Delta \pmb{y}; \pmb{\psi}}(\pmb{u}) &= \left(\pmb{\mathbb{M}}' \pmb{\Sigma}^{-1}(\pmb{\theta},\sigma^2)\pmb{\mathbb{M}} + \pmb{\Delta}_u^{-1}\right)^{-1}
\end{split}
\label{eq:condMoments}
\end{align}
provide the clone-specific means and variances of the (random) reaction rates. The package \textsf{RestoreNet} infers the vector parameter $\pmb{\psi} = (\pmb{\theta}, \sigma^2,\tau_1^2, \dots, \tau_p^2)$, and in turn the corresponding first two conditional moments $E_{\pmb{u}\vert \Delta \pmb{y}; \pmb{\psi}}[\pmb{u}]$ and $V_{\pmb{u}\vert \Delta \pmb{y}; \pmb{\psi}}(\pmb{u})$, by means of an efficient tailor-made expectation-maximization algorithm in which $\Delta \pmb{y}$ and $\pmb{u}$ play the roles of the observed and latent states respectively. Further details on the inference procedure can be found in \cite{Core2022.05.31.494100}. The following \textsf{R} code chunk shows how to accomplish this on the simulated clonal tracking dataset of the previous section. In this example we use the optimal parameter vector $\hat{\pmb{\theta}}_{0}$ estimated for the null model in the previous section as the initial guess for the corresponding parameters in the random-effects model.
\begin{lstlisting}[language=R]
> re.res <- fit.re(theta_0 = null.res$fit$par,
+                  Y = Y,
+                  rct.lst = rcts,
+                  maxemit = 100) ## random-effects model fitting
> re.res$fit$par ## estimated parameters
 [1] 1.000000e-07 1.843245e-03 1.000000e-07 1.036969e-04 5.255077e-04 1.000000e-07 1.000000e-07 1.000000e-07
 [9] 1.026921e-03 5.080835e-03 1.000000e-07 3.837475e-02 2.862468e-02 7.111302e-02 6.109796e-02 1.000000e-07
[17] 4.675422e-05 1.550055e-05 4.952111e-06 1.416910e-02 2.576975e-02 1.106758e-02 1.720079e+00
> re.res$fit$VEuy$euy ## conditional expected values of u|y
33 x 1 Matrix of class "dgeMatrix"
             [,1]
[1,] 0.1693522400
[2,] 0.1478834088
[3,] 0.1643743275
[4,] 0.4553735855
[5,] 0.0006527738
. . .
> re.res$fit$VEuy$vuy ## conditional covariance matrix of u|y
33 x 33 sparse Matrix of class "dsCMatrix"
[1,]  3.552098e-04  2.910616e-05  2.925707e-05 . . .
[2,]  2.910616e-05  2.095979e-04 -3.311544e-07 . . .
[3,]  2.925707e-05 -3.311544e-07  1.478458e-04 . . .
 . . . . . . . . .
\end{lstlisting}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%% Graphical representation of results %%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Visualizing results}
\label{sec:plots}
The main graphical output of \textsf{RestoreNet} is a clonal piechart.
Consistently with \cite{Core2022.05.31.494100}, in this representation each clone $k$ is identified with a pie whose slices are lineage-specific and weighted with $w^{l}_k$, defined as the difference between the conditional expectations of the duplication and death parameters, that is
\begin{align}
w^{l}_k = E_{\pmb{u}\vert \Delta \pmb{y}; \hat{\pmb{\psi}}}[u^k_{\alpha_{l}}] - E_{\pmb{u}\vert \Delta \pmb{y}; \hat{\pmb{\psi}}}[u^k_{\delta_{l}}]
\label{eq:slice-weights}
\end{align}
where $u^k_{\alpha_{l}}$ and $u^k_{\delta_{l}}$ are the random effects for duplication and death respectively of cell lineage $l$ in clone $k$. The diameter of the $k$-th pie is proportional to the Euclidean 2-norm of
\begin{align}
\pmb{w}_k = (w^{l_1}_k,\dots, w^{l_n}_k)
\label{eq:pie-weights}
\end{align}
where $n$ is the number of cell types. Therefore, the larger the diameter, the more the corresponding clone is expanding into the lineage associated with the largest slice. The package \textsf{RestoreNet} includes the function \texttt{get.scatterpie()}, which returns a clonal piechart given a random-effects model previously fitted with the function \texttt{fit.re()}. The following \textsf{R} code chunk illustrates how to obtain a clonal piechart with a few lines of \textsf{R} code.
\begin{lstlisting}[language=R]
> re.res <- fit.re(theta_0 = null.res$fit$par,
+                  Y = Y,
+                  rct.lst = rcts,
+                  maxemit = 100) ## random-effects model fitting
> get.scatterpie(re.res, txt = TRUE) ## get the clonal piechart
\end{lstlisting}
\includegraphics[width=.5\linewidth]{./figures/packagess2_1_scatterpie.pdf}
\bibliography{restorenet}
\bibliographystyle{ieeetr}
\end{document}