---
title: "How we generated our prediction for subchallenge 3"
date: "`r Sys.Date()`"
author: "Il-Youp Kwak and Wuming Gong"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{prediction for subchallenge 3}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r knitr_options, echo=FALSE, results=FALSE}
library(knitr)
opts_chunk$set(fig.width = 12)
```

```{r loading, include=FALSE}
library(DCLEAR)
library(phangorn)
#library(parallel)

```

This vignettes illustrate how our team prepared a submission for subchallenge 3. 

## Data processing


The 'csv_file' is a file path of evaluation for subchallenge 3 given from the competition.
Change the like with the one you would like to predict.

```
csv_file <- 'Data/subC3/SubC3_10K_0001_mutation_table.csv'
\dontrun{x <- read.table(csv_file, header = T, sep = ',', colClasses = "character")}
```

Initical state is '0', interval dropout is '-', point dropout is '*' (point dropout was '' from the file, but we will replace it with '*'), and mutational outcome states are 'A' to 'Z'. 

```
\dontrun{states <- c('0', '-', '*', LETTERS)}
```

Read file and save it as 'phyDat' object.
```
\dontrun{tip_names <- x[, 1]}
\dontrun{x <- x[,-1]}
\dontrun{rownames(x) <- tip_names}
\dontrun{x[ x == '' ] = '*'  ## specified * as point dropout (point missing)} 
\dontrun{x = as.matrix(x)}
\dontrun{x <- x %>% phyDat(type = 'USER', levels = states)}

\dontrun{states2num = 1:length(states)}
\dontrun{names(states2num) = states}
```

## Weight parameters for the prediction

We tried weighted hamming I and II with large number of parameter combinations, and found weighted hamming I with weight below worked fairly well. 

```
InfoW = 1:5
InfoW[1] = 1  ## Score 
InfoW[2] = .9
InfoW[3] = .4
InfoW[4:26] = 3
```

## Generating final submission file for subchallenge 3

```
\dontrun{aTree2 <- x %>% dist_weighted_hamming(InfoW, FALSE) %>% fastme.ols()}
\dontrun{write.tree(aTree2, "Kwak_Gongsub3_submission.nw")}

```

Thanks!