---
title: "RScelestial Vignette"
author: "Mohammad-Hadi Foroughmand-Araabi"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{RScelestial Vignette}
  %\VignetteEngine{knitr::rmarkdown}
  \VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(RScelestial)
# We load igraph for drawing trees. If you do not want to draw,
# there is no need to import igraph.
library(igraph)
```

# Installing RScelestial
The RScelestial package could be installed easily as follows
```{r eval=FALSE}
install.packages("RScelestial")
```

# Simulation
Here we show a simulation. We build a data set with following command. 
```{r}
# Following command generates ten samples with 20 loci. 
# Rate of mutations on each edge of the evolutionary tree is 1.5. 
D = synthesis(10, 20, 5, seed = 7)
D
```


# Inferring the phylogenetic tree

```{r run-scelestial-0}
seq = as.ten.state.matrix(D$seqeunce)
SP = scelestial(seq, return.graph = TRUE)
SP
```

You can draw the graph with following command
```{r fig.width=5, fig.height=5}
tree.plot(SP, vertex.size = 30)
```

Also, we can make a rooted tree with cell "C8" as the root of the tree as follows:
```{r fig.width=5, fig.height=5}
SP = scelestial(seq, root.assign.method = "fix", root = "C8", return.graph = TRUE)
tree.plot(SP, vertex.size = 30)
```

Setting root.assign.method to "balance" lets the algorithm decide for a root
that produces minimum height tree.
```{r fig.width=5, fig.height=5}
SP = scelestial(seq, root.assign.method = "balance", return.graph = TRUE)
tree.plot(SP, vertex.size = 30)
```

# Evaluating results
Following command calculates the distance array between pairs of samples.
```{r}
D.distance.matrix <- distance.matrix.true.tree(D)
D.distance.matrix
SP.distance.matrix <- distance.matrix.scelestial(SP)
SP.distance.matrix
## Difference between normalized distance matrices
vertices <- rownames(SP.distance.matrix)
sum(abs(D.distance.matrix[vertices,vertices] - SP.distance.matrix))
```

# Running Scelestial on multiple sequence alignment
Given a multiple sequence alignment, Scelestial infers the phylogeny of them. Here we present a simple example.
First we load libraries to load a multiple alignment.
```{r load-libraries}
library(stringr)
if (!require("seqinr")) install.packages("seqinr")
library(seqinr)
```

In this example, we load a multiple alignment from seqinr package.

```{r load-data}
data(phylip, package = "seqinr")

```

Then we clean the data and build a zero-one matrix representing taxa and characters. Note that Scelestial accept matrices with taxa as its columns and characters as its rows.  
```{r data-cleaning}
# Removing non-informative columns and duplicate rows.
mcb <-  toupper(t(sapply(seq(phylip$seq), function(i) unlist(strsplit(phylip$seq[[i]], '')))))
ccb <- as.character(phylip$seq)
occb <- order(ccb)
cbColMask <- sapply(seq(ncol(mcb)), function(j) length(levels(as.factor(mcb[,j]))) == 1)
cbRowMask <- rep(TRUE, length(ccb))
for (i in seq(length(ccb))) {
    if (i == 1 || ccb[occb[i]] != ccb[occb[i-1]]) {
        cbRowMask[occb[i]] <- FALSE
    }
}
mcbRows <- apply(mcb[!cbRowMask, !cbColMask], MARGIN = 1, FUN = function(a) paste0(str_replace(a, "-", "X"), collapse = ""))

```

Executing Scelestial on the input matrix.
```{r run-scelestial}
n.seq <- data.frame(nodes = phylip$nam[!cbRowMask], seq = mcbRows)
seq2 <- data.frame(t(as.ten.state.matrix.from.node.seq(n.seq)), stringsAsFactors = TRUE)
# Running Scelestial
SP = scelestial(seq2, return.graph = TRUE)
```

```{r plot}
tree.plot(SP, vertex.size=20, vertex.label.dist=0, asp = 0, vertex.label.cex = 1)
```