---
title: "The Jacquard package"
author:
- Jan Graffelman - Dpt. of Statistics, Universitat Politecnica de Catalunya; Dpt. of Biostatistics, University of Washington
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: Jacquard.bib
vignette: >
  %\VignetteIndexEntry{The Jacquard package}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---


```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::opts_chunk$set(fig.align = "center")
knitr::opts_chunk$set(warnings = FALSE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(fig.width = 6, fig.height = 6) 
```

## Introduction

<div style="text-align: justify"> 

This document explains the basic functionality of the **Jacquard** package  which provides functions for the estimation of the nine condensed Jacquard coefficients (@Jacquard) for biallelic genetic marker data using a constrained least squares approach (@Graffelman2024).

Outline:

1. [Installation](#installation)

2. [Estimation of the Jacquard coefficients](#coefficients)

3. [Estimation of derived relatedness parameters](#derived)

4. [Alternative estimation procedures](#mom)

5. [References](#refs)

## 1. Installation<a name="introduction"></a>

The package can be installed from CRAN with the instructions

```{r preinstall}
#install.packages("Jacquard")
library(Jacquard)
```

## 2. Estimation of Jacquard the coefficients<a name="coefficients"></a>

We illustrate the estimation of the Jacquard coefficients using (0,1,2) coded genotype data available in the data set `SimulatedPedigree`, of which we show a small part


```{r data}
data(SimulatedPedigree)
SimulatedPedigree[1:5,1:10]
```

This matrix contains `r formatC(nrow(SimulatedPedigree), digits=0, format = "f")` individuals, spanning seven generations, in its rows and pedigree 
information plus genotype data of 20,000 SNPs in its columns. We first
separate the pedigree information from the genotype information.

```{r}
Xped <- SimulatedPedigree[,1:5]
Xgen <- as.matrix(SimulatedPedigree[,6:ncol(SimulatedPedigree)])
```

We next determine the nine joint genotype counts for all pairs of individuals, storing these counts in a list object with nine lower triangular matrices, using function `JointGenotypeCounts`. For efficiency,
we here load the precalculated joint counts.


```{r}
#GTC <- JointGenotypeCounts(Xgen)
data(GTC)
names(GTC)
```

E.g., the counts of the major homozygote pairs for the first five individuals are  

```{r}
GTC[[1]][1:5,1:5]
```

We create a vector with the minor allele frequency of each SNP.


```{r}
mafvec <- mafvector(Xgen)
mafvec[1:5]
```

We proceed to estimate the Jacquard coefficients by constrained least squares with function `Jacquard.cls`. For the sake of illustration, we take the first three founders of the pedigree, and subset their joint genotype counts 

```{r echo = TRUE}
ii <- 1:3
Xped[ii,]

GTCsubset <- list(length = 9)
for (k in 1:9) {
  GTCsubset[[k]] <- matrix(numeric(3^2), ncol = 3)
  GTCsubset[[k]] <- GTC[[k]][ii,ii]
}
```

We set random initial values for the Jacquard coefficients

```{r}
set.seed(123)
delta.init <- runif(9)
delta.init <- delta.init/sum(delta.init)
```

And estimate the pairwise Jacquard coefficients of the three pairs.

```{r echo = TRUE}
output <- Jacquard.cls(GTCsubset,mafvec=mafvec,
                       eps=1e-06,
                       delta.init=delta.init)
Delta.cls <- output$delta
```

Convergence of the solver can be checked by looking at the field `convergence`, where 0 indicates proper convergence.

```{r}
output$convergence
```

Particular estimates of Jacquard coefficients can be extracted from the `Delta.cls` list object, which is a list of nine matrices. E.g., $\Delta_9$ of the first pair of individuals (1,2) can be extracted by


```{r}
Delta.cls[[9]][1,2]
```

All nine estimates of the Jacquard coefficients can be obtained with function `DeltaPair`

```{r}
DeltaPair(Delta.cls,1,2)
```

A pairwise list of the nine Jacquard coefficients of each pair can be obtained with the function `PairwiseList`.


```{r echo = TRUE}
PairwiseList(Delta.cls)
```

The full set of all pairwise Jacquard coefficients can be calculated
with the instructions below. This result is also available in the precalculated
data object `DeltaSimulatedPedigree`.


```{r}
#DeltaSimulatedPedigree <- Jacquard.cls(GTC,mafvec=mafvec,
#                       eps=1e-06,
#                       delta.init=delta.init)$delta
data(DeltaSimulatedPedigree)
```

Boxplots of the estimated Jacquard coefficients can be obtained with function `BoxplotDelta`. Separate boxplots are shown for the diagonals of $\Delta_1$ and
$\Delta_7$.

```{r}
BoxplotDelta(DeltaSimulatedPedigree)
```

## 3. Estimation of derived relatedness parameters<a name="derived"></a>

The set of five identifiable relatedness parameters dicussed by @Csuros, among them coancestry and inbreeding coefficients, can be calculated with
the function `CalculateTheta`. 


```{r}
Theta <- CalculateTheta(DeltaSimulatedPedigree)
```

E.g., estimates of the kinship coefficients of the first five (founder) individuals are obtained by


```{r}
Theta[[1]][1:5,1:5]
```

Individual inbreeding coefficients can be obtained from ...

```{r}
diag(Theta[[2]])[1:5]
diag(DeltaSimulatedPedigree[[1]])[1:5]
```

Boxplots of identifiable relatedness parameters can be made with `BoxplotTheta`


```{r}
BoxplotTheta(Theta)
```

## 4. Alternative estimation procedures<a name="mom"></a>

There are many estimators for coancestry and inbreeding. Function `CalculateThetaMom` calculates moment estimators for the five identifiable 
relatedness parameters defined by @Csuros.

```{r mom, echo = FALSE}
KS.mom <- CalculateMom(Xgen[1:10,],mafvec,verbose=FALSE)
```

E.g., moment estimators of the kinship coefficients of the first five individuals
are given by

```{r}
KS.mom[[1]][1:5,1:5]
```

</div>

## 5. References<a name="refs"></a>