---
title: "Generating realistic data with known truth using the `jointseg` package"
author: "M. Pierre-Jean, G. Rigaill, P. Neuvial"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Generating realistic data}
---

This vignette describes how to use the `jointseg` package to partition bivariate DNA copy number signals from SNP array data into segments of constant parent-specific copy number.  We demonstrate the use of the `PSSeg` function of this package for applying two different strategies.  Both strategies consist in first identifying a list of candidate change points through a fast (greedy) segmentation method, and then pruning this list using dynamic programming [1].  The segmentation method presented here is Recursive Binary Segmentation (RBS, [2]). We refer to [3] for a more comprehensive performance assessment of this method and other segmentation methods.

**Keywords:** segmentation, change point model, binary segmentation, dynamic programming, DNA copy number, parent-specific copy number.

Please see the section "Citing `jointseg`" below for how to cite the package.

```{r, include=FALSE}
library("jointseg")
```

```{r, include=FALSE}
library("knitr")
opts_chunk$set(dev='png', fig.width=5, fig.height=5)
opts_knit$set(eval.after = "fig.cap")
```

This vignette illustrates how the `jointseg` package may be used to generate a variety of copy-number profiles from the same biological "truth".  Such profiles have been used to compare the performance of segmentation methods in [3].

## Citing `jointseg`

```{r}
citation("jointseg")
```

## Setup

The parameters are defined as follows:
```{r}
n <- 1e4                                 ## signal length
bkp <- c(2334, 6121)                     ## breakpoint positions
regions <- c("(1,1)", "(1,2)", "(0,2)")  ## copy number regions
```
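
As a quick sanity check on these parameters, the sketch below computes the segment lengths implied by `bkp` and parses the region labels. It assumes that each breakpoint marks the last position of a segment and that the labels follow a "(minor,major)" copy number convention, so that the total copy number of a region is the sum of the two numbers. The names `segLengths`, `cn` and `totalCN` are introduced here for illustration only.

```{r}
## Values repeated from above so that this chunk is self-contained
n <- 1e4
bkp <- c(2334, 6121)
regions <- c("(1,1)", "(1,2)", "(0,2)")

## Two breakpoints split positions 1..n into three segments, one per region
segLengths <- diff(c(0, bkp, n))
stopifnot(length(segLengths) == length(regions))

## Parse "(a,b)" into two integers (assumed: minor and major copy number)
cn <- do.call(rbind, lapply(strsplit(gsub("[()]", "", regions), ","), as.integer))
colnames(cn) <- c("minor", "major")
totalCN <- rowSums(cn)

data.frame(region=regions, length=segLengths, cn, total=totalCN)
```

Under these assumptions the three segments cover the whole signal, and the middle region carries a single-copy gain relative to the first one.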

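```{r}
ylims <- cbind(c(0, 5), c(-0.1, 1.1))  ## y-axis limits, one column per signal dimension
colG <- rep("#88888855", n)            ## default point color (semi-transparent gray)
hetCol <- "#00000088"                  ## darker color for heterozygous SNPs
```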

For convenience we define a custom plot function for this vignette:
```{r}
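plotFUN <- function(dataSet, tumorFraction) {
    ## load the annotated regions for this data set and tumor fraction
    regDat <- acnr::loadCnRegionData(dataSet=dataSet, tumorFraction=tumorFraction)
    ## resample a profile of length n with the breakpoints and regions defined above
    sim <- getCopyNumberDataByResampling(n, bkp=bkp,
                                         regions=regions, regData=regDat)
    dat <- sim$profile
    ## draw heterozygous SNPs (genotype 1/2) in a darker color
    wHet <- which(dat$genotype==1/2)
    colGG <- colG
    colGG[wHet] <- hetCol
    plotSeg(dat, sim$bkp, col=colGG)
}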
```
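
To make the idea of resampling from annotated regions concrete, here is a minimal base R sketch on made-up data. It is not the implementation used by `getCopyNumberDataByResampling`: the helper `resampleToy` and the data frame `toyRegData` are hypothetical, and the sketch only assumes that the annotated data carry a `region` column and that each segment is filled by drawing loci with replacement from the rows with the matching label.

```{r}
## Toy annotated data: one row per locus, labelled by region type.
## The values are simulated for illustration and are not real array data.
toyRegData <- data.frame(
    c = c(rnorm(50, mean=2, sd=0.2), rnorm(50, mean=3, sd=0.2)),
    region = rep(c("(1,1)", "(1,2)"), each=50)
)

## Hypothetical helper: for each segment, draw loci with replacement
## from the rows of 'regData' that carry the segment's region label.
resampleToy <- function(len, bkp, regions, regData) {
    segLen <- diff(c(0, bkp, len))
    stopifnot(length(segLen) == length(regions))
    parts <- lapply(seq_along(regions), function(ii) {
        pool <- which(regData$region == regions[ii])
        idx <- pool[sample.int(length(pool), segLen[ii], replace=TRUE)]
        regData[idx, , drop=FALSE]
    })
    profile <- do.call(rbind, parts)
    rownames(profile) <- NULL
    list(profile=profile, bkp=bkp)
}

toySim <- resampleToy(20, bkp=8, regions=c("(1,1)", "(1,2)"), regData=toyRegData)
table(toySim$profile$region)
```

Calling `resampleToy` again with the same arguments gives a different profile with the same breakpoint and region labels, which is the sense in which several profiles share one "truth".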


## Affymetrix data

```{r}
ds <- "GSE29172"
```

```{r, fig.cap=paste("Data set", ds, ":", 100*pct, "% tumor cells")}
pct <- 1
plotFUN(ds, pct)
```

```{r, fig.cap=paste("Data set", ds, ":", 100*pct, "% tumor cells (another resampling)")}
plotFUN(ds, pct)
```

```{r, fig.cap=paste("Data set", ds, ":", 100*pct, "% tumor cells")}
pct <- 0.7
plotFUN(ds, pct)
```

```{r, fig.cap=paste("Data set", ds, ":", 100*pct, "% tumor cells")}
pct <- 0.5
plotFUN(ds, pct)
```

## Illumina data

```{r}
ds <- "GSE11976"
```
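
The plot function defined above can be reused for this data set. Since the available tumor fractions may differ from one data set to another, the sketch below first lists them. It assumes that `listTumorFractions` is exported by the installed version of the `acnr` package and that a tumor fraction of 1 is available for this data set; the chunk is therefore not evaluated.

```{r, eval=FALSE}
## Assumption: acnr exports listTumorFractions()
acnr::listTumorFractions(ds)

## Assumption: 1 is among the tumor fractions listed above
pct <- 1
plotFUN(ds, pct)
```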


## Session information
```{r}
sessionInfo()
```


## References
[1] Bellman, Richard. 1961. "On the Approximation of Curves by Line Segments Using Dynamic Programming." Communications of the ACM 4 (6). ACM: 284.

[2] Gey, Servane, et al. 2008. "Using CART to Detect Multiple Change Points in the Mean for Large Sample." https://hal.science/hal-00327146.

[3] Pierre-Jean, Morgane, et al. 2015. "Performance Evaluation of DNA Copy Number Segmentation Methods." Briefings in Bioinformatics, no. 4: 600-615.