---
title: "Tree search with Profile parsimony"
author: "Martin R. Smith"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: ../inst/REFERENCES.bib
csl: ../inst/apa-old-doi-prefix.csl
vignette: >
  %\VignetteIndexEntry{Tree search with Profile parsimony}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Profile Parsimony [@Faith2001] finds the tree that is most faithful 
to the information contained within a given dataset.
It is the 'exact solution' that implied weights parsimony approximates.
For more information on the philosophy and mathematics of profile parsimony,
see the [companion vignette](profile-scores.html).

Profile Parsimony is currently implemented in "TreeSearch" for characters with
up to two parsimony-informative states.
(Further states are treated as ambiguous, whilst retaining as much information
as possible.)

## Getting started
<!--Duplicated from inapplicable.Rmd-->
[A companion vignette](getting-started.html) gives details on installing 
the package and getting up and running.

<!--
# message=FALSE will suppress message when loading TreeSearch
# Temporary fix until R.oo is updated -- remove thereafter
# https://github.com/r-lib/rlang/issues/669
-->

Once installed, load the inapplicable package into R using
```{r load-library, message=FALSE}
library("TreeSearch")
```

In order to reproduce the random elements of this document, set a random seed:

```{R rng-version, warn = FALSE}
# Set a random seed so that random functions in this document are reproducible
RNGversion("3.5.0")
set.seed(888)
```


## Scoring a tree, and conducting a tree search

Here's an example of using the package to conduct tree search with profile 
parsimony.
You can [load your own dataset](https://ms609.github.io/TreeTools/articles/load-data.html),
but for this example,
we'll use a simulated dataset that comes bundled with the `TreeSearch` package.

```{r load-longrich-data}
data(congreveLamsdellMatrices)
myMatrix <- congreveLamsdellMatrices[[10]]
```

Unless a starting tree is provided, tree search will from a random addition
tree:

```{r addition-tree}
additionTree <- AdditionTree(myMatrix, concavity = "profile")
TreeLength(additionTree, myMatrix, "profile")
par(mar = rep(0.25, 4), cex = 0.75) # make plot easier to read
plot(additionTree)
```

We could alternatively use a random or neighbour-joining tree:

```{r random-tree}
randomTree <- TreeTools::RandomTree(myMatrix, root = TRUE)
TreeLength(randomTree, myMatrix, "profile")
njTree <- TreeTools::NJTree(myMatrix)
TreeLength(njTree, myMatrix, "profile")
```

We search for trees with a better score using TBR rearrangements and the 
parsimony ratchet [@Nixon1999]:

```{R starting-score, message = FALSE}
betterTrees <- MaximizeParsimony(myMatrix, additionTree, concavity = "profile",
                                 ratchIter = 3, tbrIter = 3, maxHits = 8)
```

We've used very low values of `ratchIter`, `tbrIter` and `maxHits` for a rapid
run, so this is not necessarily a thorough enough search to find a globally
optimal tree.
Nevertheless, let's see the resultant tree, and its score:

```{r ratchet-search-results}
TreeLength(betterTrees[[1]], myMatrix, "profile")
par(mar = rep(0.25, 4), cex = 0.75) # make plot easier to read
plot(ape::consensus(betterTrees))
```

The default parameters may not be enough to find the optimal tree; type 
`?MaximizeParsimony` to view all search parameters --
or keep repeating the search until tree score stops improving.

## View the results

In parsimony search, it is good practice to consider trees that are slightly suboptimal [@Smith2019].

Here, we'll take a consensus that includes all trees that are suboptimal by up
to 3 bits.
To sample this region of tree space well, the trick is to use large values of 
`ratchHits` and `ratchIter`, and small values of `searchHits` and
`searchiter`, so that many runs don't quite hit the optimal tree.
In a serious study, you would want to sample many more than the 3 Ratchet hits (`ratchHits`) we'll settle for here, probably using many more Ratchet iterations.

```{R suboptimal-sampling, message = FALSE}
suboptimals <- MaximizeParsimony(myMatrix, betterTrees, tolerance = 3,
                                 ratchIter = 2, tbrIter = 3,
                                 maxHits = 25,
                                 concavity = "profile")
```

The consensus of these slightly suboptimal trees provides a less resolved, but
typically more reliable, summary of the signal with the phylogenetic dataset
[@Smith2019]:

```{r plot-suboptimal-consensus}
par(mar = rep(0.25, 4), cex = 0.75)
table(signif(TreeLength(suboptimals, myMatrix, "profile")))
plot(ape::consensus(suboptimals))
```


## Where next?

- [Documentation home](https://ms609.github.io/TreeSearch/)

- [Guide to installation](getting-started.html)

- Search for trees using
  - [standard parsimony](tree-search.html) (corrected for inapplicable data)
  - [custom optimality criteria](custom.Rmd)
  
- Explore the distribution of optimal trees in [mappings](tree-space.html)

## References