---
title: "Walkthrough"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Walkthrough}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.width = 6)
```

# GenomeAdmixR
The package GenomeAdmixR was designed to simulate the generation of iso-female
lines, and simulate genomic changes during, and after, formation of iso-female 
lines. Furthermore, it allows for simulation of effects after crossing 
individuals from iso-female lines.

```{r}
library(GenomeAdmixR)
library(ggplot2)
packageVersion("GenomeAdmixR")
```


## Simulating an iso-female line
In order to create an iso-female line, we first have to create a 'wild' 
population, from which we will draw individuals. To create such a population, we 
make use of the function 'simulate_admixture', using the ancestry module:
```{r create wildpop}
wildpop <-  simulate_admixture(
                    module = ancestry_module(number_of_founders = 4,
                                             morgan = 1),
                    pop_size = 100,
                    total_runtime = 1000)

```
This creates a population of 100 diploid individuals (e.g. 2N = 200), based upon 
10 founders (e.g. 10 individuals that can be genetically distinguished from each 
other). Then, for 1000 generations, this population is inbred and genomes of the 
20 founders are allowed to mix. All individuals have 2 chromosomes (for 
simplicity and computational speed, we only model one chromosome pair), which 
are 1 Morgan long. The result can be written to file (optional), which can be 
usefull when generating for instance an extremely large wild population, which 
can be used later (e.g. read from file).

The result of the function is a structure containing 1000 individuals:
```{r}
wildpop
```
Each individual has two chromosomes, with a given number of junctions (e.g. 
delineations between two contiguous genomic stretches from different ancestors):
```{r}
wildpop[[1]]
```

In order to create an iso-female line, two random individuals are drawn from the 
wild population, and selected for inbreeding. We can simulate this process with:
```{r create isofemale}
isofemale <- create_iso_female(
                  module = ancestry_module(input_population = wildpop,
                                           morgan = 1),
                  n = 1,
                  inbreeding_pop_size = 1000,
                   run_time = 200)
```

Here, n indicates the number of isofemales to be created from the same source 
population, the inbreeding population size is set to be small, in order to 
speed up computation. Maximum run time is best set hight, to assure that all 
individuals become genetically identical. 

# Visualizing individuals
Now, we can plot the isofemales, this plots the two chromosomes next to each 
other, where colors indicate different ancestors:
```{r}
plot(isofemale[[1]])
```

Because we have chosen 4 ancestors, these plots are not terribly informative, 
let's try a toy example:
```{r toyexample}
wildpop <-  simulate_admixture(
                  module = ancestry_module(number_of_founders = 4,
                                           morgan = 1),
                  pop_size = 100,
                  total_runtime = 100)

isofemale <- create_iso_female(
                  module = ancestry_module(input_population = wildpop,
                                           morgan = 1),
                  n = 1,
                  inbreeding_pop_size = 100,
                  run_time = 10000)
plot(wildpop$population[[1]])
plot(isofemale[[1]])
```

We can also plot a (section of) a single chromosome:
```{r plot selection of chromosome}
plot_chromosome(isofemale[[1]]$chromosome1, xmin = 0.0, xmax = 0.5)
```

# Multiple source populations

It gets more interesting if we create two populations, and draw iso-females from 
them, and cross these.
First, we have to create two populations. 
```{r create two populations}
both_populations <-
  simulate_admixture(
    module = ancestry_module(morgan = 1),
    migration = migration_settings(population_size = c(100, 100),
                        migration_rate = 0.0,
                        initial_frequencies =
                                 list(c(rep(1, 20), rep(0, 20)),
                                      c(rep(0, 20), rep(1, 20)))
                      ),
    total_runtime = 1000)

population_1 <- both_populations$population_1

population_2 <- both_populations$population_2
```
To make sure that the two populations dont share ancestor indices, we use the 
function 'increase_ancestor'. 

Now that we have two populations, we can generate two different iso-females 
from them:
```{r draw two isofemales}
isofemales <- create_iso_female(
                  module = ancestry_module(input_population = population_1,
                                           morgan = 1),
                  n = 2,
                  inbreeding_pop_size = 100,
                  run_time = 10000)

plot_chromosome(isofemales[[1]]$chromosome1, 0, 1)
plot_chromosome(isofemales[[2]]$chromosome1, 0, 1)
```

We can now use these two isofemales to seed a new inbreeding population:
```{r seed mixed population}
mixed_population <-
  simulate_admixture(
        module = ancestry_module(input_population = list(isofemales[[1]],
                                                         isofemales[[2]]),
                                 morgan = 1),
        pop_size = 100, 
        total_runtime = 100)
```
And plot some individuals from our new population:
```{r plot mixed_population}
plot(mixed_population$population[[1]])
```

# Statistics

## FST
We can show the effect of overlap by calculating the Fst value:
```{r calc FST}
fst <- calculate_fst(population_1,
                 population_2,
                 sampled_individuals = 10,
                 number_of_markers = 100,
                 random_markers = TRUE)
fst
```
The FST calculation function uses the library hierfstat to calculate the FST. To 
do so, it samples 10 random individuals (the parameter sampled individuals) from 
the population, and then assesses the genetic content at 100 randomly placed 
markers (number of markers, and random_markers = TRUE). To increase power, a 
higher number of individuals can be sampled, although this does increase 
computational load.

## LD
We can also calculate LD statistics. LD is only defined for markers, so again we 
have to simulate artificial markers along the chromosome. 

```{r}
  ld_results <- calculate_ld(wildpop,
                             markers = 10)

  plot(ld_results$ld_matrix~ld_results$dist_matrix,
       xlab = "Genetic Distance (Morgan)",
       ylab = "LD",
       pch = 16)
  plot(ld_results$rsq_matrix~ld_results$dist_matrix,
       xlab = "Genetic Distance (Morgan)",
       ylab = "r_sq",
       pch = 16)
  plot(ld_results$ld_matrix~ld_results$rsq_matrix,
       xlab = "r_sq",
       ylab = "LD",
       pch = 16)
```

To analyze LD patterns better, it makes more sense to create a population from 
scratch. We expect for a strongly inbred population, that there is no LD:

```{r no LD}
no_ld_pop <- simulate_admixture(
                  module = ancestry_module(number_of_founders = 4,
                                           morgan = 1),
                  pop_size = 100,
                  total_runtime = 1000)

ld_results <- calculate_ld(no_ld_pop,
                           sampled_individuals = 10,
                           markers = 10)

plot(ld_results$ld_matrix~ld_results$dist_matrix,
     pch = 16,
     xlab = "Distance",
     ylab = "LD",
     xlim = c(0, 1),
     ylim = c(0, 1))
```

Alternatively, for a barely inbred population, we expect a negative relationship 
between LD and distance:

```{r strong LD}
strong_ld_pop <- simulate_admixture(
                  module = ancestry_module(number_of_founders = 4,
                                           morgan = 1),
                  pop_size = 100,
                  total_runtime = 10)

ld_results <- calculate_ld(strong_ld_pop,
                           sampled_individuals = 10,
                           markers = 10)

plot(ld_results$ld_matrix~ld_results$dist_matrix,
     pch = 16,
     xlab = "Distance",
     ylab = "LD",
     xlim = c(0, 1),
     ylim = c(0, 1))
```

# Selection
Selection for specific markers (e.g. SNPs) can potentially cause local allelel 
frequency biases, and influence genetic patterns within the population as a 
whole.

The user can provide a selection 'matrix' that specifies how selection acts upon 
different genotypes. As genotypes we indicate here 'aa' (wildtype), 
'aA' (heterozygote) and 'AA' (homozygote with allele under selection), where 
wildtype is any allele not under selection. As alleles under selection we take 
the general genomic content of ancestors, and hence alleles refer to anestors. 

A perhaps interesting simulation would be one where the heterozygote has an 
advantage over the other genotypes
```{r heterozygote selection}
s <- 0.1
selection_matrix <- matrix(nrow = 1, ncol = 5)
selection_matrix[1, ] <- c(0.5,
                           1.0, 1.0 + s, 1.0,
                           0)

markers <- 0.5

selected_pop <- simulate_admixture(
                    module = ancestry_module(number_of_founders = 2,
                                             morgan = 1,
                                             markers = markers),
                    pop_size = 1000,  
                    total_runtime = 100,
                    select_matrix = selection_matrix)

plot_over_time(selected_pop$frequencies, markers)
```

Indeed we observe that the frequencies of both ancestors are in equilibrium around 0.5, as expected.

Another interesting simulation would be where we give only the homozygous mutant a selective benefit. Here we expect that it takes some time before the homozygote takes over.

```{r homozygote selection}
s <- 0.1
selection_matrix[1, ] <- c(0.5,
                           1.0, 1.0, 1.0 + s,
                           0)

selected_pop <- simulate_admixture(
                    module = ancestry_module(number_of_founders = 10,
                                             morgan = 1,
                                             markers = markers),
                    pop_size = 1000,  
                    total_runtime = 300,
                    select_matrix = selection_matrix)

plot_over_time(selected_pop$frequencies, markers)
```