---
title: "RAINBOWR: Reliable Association INference By Optimizing Weights with R"
author: 
- Kosuke Hamazaki (hamazaki@ut-biomet.org)
- Hiroyoshi Iwata
date: "2022-01-31"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{RAINBOWR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.dpi=96
)
```

## NOTE!!!!
##### The paper for `RAINBOWR` has been published in PLOS Computational Biology (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007663). If you use this `RAINBOWR` in your paper, please cite `RAINBOWR` as follows:
- Hamazaki, K. and Iwata, H. (2020) RAINBOW: Haplotype-based genome-wide association study using a novel SNP-set method. PLOS Computational Biology, 16(2): e1007663.

##### The stable version for `RAINBOWR` package is now available at the [CRAN (Comprehensive R Archive Network)](https://cran.r-project.org/package=RAINBOWR).

##### Please check the change in RAINBOWR with the version update from [NEWS.md](https://github.com/KosukeHamazaki/RAINBOWR/blob/master/NEWS.md).


----------

In this repository, the `R` package `RAINBOWR` is available.
Here, we describe how to install and how to use `RAINBOWR`.

----------
## What is `RAINBOWR`
`RAINBOWR`(Reliable Association INference By Optimizing Weights with R) is a package to perform several types of `GWAS` as follows.

- Single-SNP GWAS by `RGWAS.normal` function
- SNP-set (, haplotype-block, or gene-set) GWAS by `RGWAS.multisnp` function (which tests multiple SNPs at the same time)
- Check epistatic (SNP-set x SNP-set interaction) effects by `RGWAS.epistasis` (very slow and less reliable)
- Single-SNP GWAS including tests for interaction with genetic background by `RGWAS.normal` function
- SNP-set (, haplotype-block, or gene-set) GWAS including tests for interaction with genetic background (or epistatic effects with polygenes) by `RGWAS.multisnp` function (which tests multiple SNPs at the same time)

`RAINBOWR` also offers some functions to solve the linear mixed effects model.

- Solve multi-kernel linear mixed effects model with `EM3.general` function (using `gaston`, `MM4LMM`, or `RAINBOWR` packages; fast for `gaston` and `MM4LMM`)
- Solve one-kernel linear mixed effects model with `EMM.cpp` function
- Solve multi-kernel linear mixed effects model with `EM3.cpp` function (for the general kernel, not so fast)
- Solve multi-kernel linear mixed effects model with `EM3.linker.cpp` function (for the linear kernel, fast)

By utilizing these functions, you can estimate the genomic heritability and perform genomic prediction (`GP`).

Finally, `RAINBOWR` offers other useful functions.

- `qq` and `manhattan` function to draw Q-Q plot and Manhattan plot
- `modify.data` function to match phenotype and marker genotype data
- `CalcThreshold` function to calculate thresholds for GWAS results
- `See` function to see a brief view of data (like `head` function, but more useful)
- `genetrait` function to generate pseudo phenotypic values from marker genotype
- `SS_GWAS` function to summarize GWAS results (only for simulation study)
- `estPhylo` and `estNetwork` functions to estimate phylogenetic tree or haplotype network and haplotype effects with non-linear kernels for haplotype blocks of interest.
- `convertBlockList` function to convert haplotype block list estimated by PLINK to the format which can be inputted as a `gene.set` argument in `RGWAS.multisnp`, `RGWAS.multisnp.interaction`, and `RGWAS.epistasis` functions.


## Installation
The stable version of `RAINBOWR` is now available at the [CRAN (Comprehensive R Archive Network)](https://cran.r-project.org/package=RAINBOWR). The latest version of `RAINBOWR` is also available at the `KosukeHamazaki/RAINBOWR` repository in the [`GitHub`](https://github.com/KosukeHamazaki/RAINBOWR), so please run the following code in the R console.

``` r
#### Stable version of RAINBOWR ####
install.packages("RAINBOWR")  


#### Latest version of RAINBOWR ####
### If you have not installed yet, ...
install.packages("devtools")  

### Install RAINBOWR from GitHub
devtools::install_github("KosukeHamazaki/RAINBOWR")
```

If you get some errors via installation, please check if the following packages are correctly installed.
(We removed a dependency on `rgl` package!)



- Rcpp      # install `Rtools` for Windows user
- plotly    # Suggests
- Matrix
- cluster
- MASS
- pbmcapply
- optimx
- methods
- ape
- stringr
- pegas
- ggplot2     # Suggests
- ggtree      # Suggests, install from Bioconducter with `BiocManager::install("ggtree")`
- scatterpie   # Suggests
- phylobase    # Suggests
- haplotypes   # Suggests
- rrBLUP
- expm
- here         # Suggests
- htmlwidgets
- Rfast
- adegenet     # Suggests
- gaston
- MM4LMM
- furrr        # Suggests
- future       # Suggests
- progressr    # Suggests
- foreach      # Suggests, but stongly recommend the installation for Windows user to parallel
- doParallel   # Suggests
- data.table   # Suggests


In `RAINBOWR`,  since part of the code is written in `Rcpp` (`C++` in `R`),  please check if you can use `C++` in `R`.
For `Windows` users,  you should install [`Rtools`](https://cran.r-project.org/bin/windows/Rtools/).

If you have some questions about installation, please contact us by e-mail (hamazaki@ut-biomet.org).


##  Usage
First, import `RAINBOWR` package and load example datasets. These example datasets consist of marker genotype (scored with {-1, 0, 1}, 1,536 SNP chip (Zhao et al., 2010; PLoS One 5(5): e10780)), map with physical position, and phenotypic data (Zhao et al., 2011; Nature Communications 2:467). Both datasets can be downloaded from `Rice Diversity` homepage (http://www.ricediversity.org/data/). Also, the dataset includes a list of haplotype blocks from the version 0.1.30. This list was estimated by the PLINK 1.9 (Taliun et al., 2014; BMC Bioinformatics, 15).

``` {r, include=TRUE}
### Import RAINBOWR
require(RAINBOWR)

### Load example datasets
data("Rice_Zhao_etal")
Rice_geno_score <- Rice_Zhao_etal$genoScore
Rice_geno_map <- Rice_Zhao_etal$genoMap
Rice_pheno <- Rice_Zhao_etal$pheno
Rice_haplo_block <- Rice_Zhao_etal$haploBlock

### View each dataset
See(Rice_geno_score)
See(Rice_geno_map)
See(Rice_pheno)
See(Rice_haplo_block)
```
You can check the original data format by `See` function.
Then, select one trait (here, `Flowering.time.at.Arkansas`) for example.

``` {r, include=TRUE}
### Select one trait for example
trait.name <- "Flowering.time.at.Arkansas"
y <- Rice_pheno[, trait.name, drop = FALSE]
```

For  GWAS, first you can remove  SNPs whose MAF <= 0.05 by `MAF.cut` function.

``` {r, include=TRUE}
### Remove SNPs whose MAF <= 0.05
x.0 <- t(Rice_geno_score)
MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
x <- MAF.cut.res$x
map <- MAF.cut.res$map
```

Next, we estimate additive genomic relationship matrix (GRM) by using `calcGRM` function.

``` {r, include=TRUE}
### Estimate genomic relationship matrix (GRM) 
K.A <- calcGRM(genoMat = x)
```

Next, we modify these data into the GWAS format of `RAINBOWR` by `modify.data` function.

``` {r, include=TRUE}
### Modify data
modify.data.res <- modify.data(pheno.mat = y, geno.mat = x, map = map,
                               return.ZETA = TRUE, return.GWAS.format = TRUE)
pheno.GWAS <- modify.data.res$pheno.GWAS
geno.GWAS <- modify.data.res$geno.GWAS
ZETA <- modify.data.res$ZETA

### View each data for RAINBOWR
See(pheno.GWAS)
See(geno.GWAS)
str(ZETA)
```
`ZETA` is a list of genomic relationship matrix (GRM) and its design matrix.

Finally, we can perform `GWAS` using these data.
First, we perform single-SNP GWAS by `RGWAS.normal` function as follows.

``` {r, include=TRUE}
### Perform single-SNP GWAS
normal.res <- RGWAS.normal(pheno = pheno.GWAS, geno = geno.GWAS,
                           plot.qq = FALSE, plot.Manhattan = FALSE,
                           ZETA = ZETA, n.PC = 4, P3D = TRUE, 
                           skip.check = TRUE, count = FALSE)
See(normal.res$D)  ### Column 4 contains -log10(p) values for markers
```

``` {r, echo=FALSE}
qq(normal.res$D[, 4])
manhattan(normal.res$D)
### Automatically draw Q-Q plot and Manhattan if you set plot.qq = TRUE and plot.Manhattan = TRUE.
```

Next, we perform SNP-set GWAS by `RGWAS.multisnp` function.

``` {r, include=TRUE, message=FALSE}
### Perform SNP-set GWAS (by regarding 11 SNPs as one SNP-set, first 300 SNPs)
SNP_set.res <- RGWAS.multisnp(pheno = pheno.GWAS, geno = geno.GWAS[1:300, ], ZETA = ZETA,
                              plot.qq = FALSE, plot.Manhattan = FALSE, count = FALSE,
                              n.PC = 4, test.method = "LR", kernel.method = "linear",
                              gene.set = NULL, skip.check = TRUE, 
                              test.effect = "additive", window.size.half = 5, window.slide = 11)

See(SNP_set.res$D)  ### Column 4 contains -log10(p) values for markers
```


``` {r, echo=FALSE}
qq(SNP_set.res$D[, 4])
manhattan(SNP_set.res$D)
### Automatically draw Q-Q plot and Manhattan if you set plot.qq = TRUE and plot.Manhattan = TRUE.
```


You can perform SNP-set GWAS with sliding window by setting `window.slide = 1`.
You can perform SNP-set GWAS with sliding window by setting `window.slide = 1`.
And you can also perform gene-set (or haplotype-block based) GWAS by assigning the following data set to `gene.set` argument. (You can check the example also by `See(Rice_haplo_block)` in R.)

ex.)

|  gene (or haplotype block)   |  marker | 
| :-----: | :------:| 
| haploblock_1    | id1005261 | 
| haploblock_1    | id1005263 | 
| haploblock_2    | id1009557 | 
| haploblock_2    | id1009616 | 
| haploblock_3    | id1020154 | 
| ...    | ... | 



``` {r, include=TRUE, message=FALSE}
### Perform haplotype-block based GWAS (by using hapltype blocks estimated by PLINK, first 400 SNPs)
haplo_block.res <- RGWAS.multisnp(pheno = pheno.GWAS, geno = geno.GWAS[1:400, ], ZETA = ZETA,
                              plot.qq = FALSE, plot.Manhattan = FALSE, count = FALSE,
                              n.PC = 4, test.method = "LR", kernel.method = "linear", 
                              gene.set = Rice_haplo_block, skip.check = TRUE, 
                              test.effect = "additive")

See(haplo_block.res$D)  ### Column 4 contains -log10(p) values for markers
```


``` {r, echo=FALSE}
qq(haplo_block.res$D[, 4])
manhattan(haplo_block.res$D)
### Automatically draw Q-Q plot and Manhattan if you set plot.qq = TRUE and plot.Manhattan = TRUE.
```

There is no significant block for this dataset because the number of markers and blocks is too small for this dataset. However, when whole-genome sequencing data is available, the impact of using SNP-set/gene-set/haplotype-block methods becomes larger and we strongly recommend you use these methods. Please see Hamazaki and Iwata (2020, PLOS Comp Biol) for more details of the features of these methods.


### Help
If you have some help before performing `GWAS` with `RAINBOWR`, please see the help for each function by `?function_name`.


## References
Kennedy, B.W., Quinton, M. and van Arendonk, J.A. (1992) Estimation of effects of single genes on quantitative traits. J Anim Sci. 70(7): 2000-2012.

Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci. 100(16): 9440-9445.

Yu, J. et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 38(2): 203-208.

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Kang, H.M. et al. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 42(4): 348-354.

Zhang, Z. et al. (2010) Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 42(4): 355-360.

Endelman, J.B. (2011) Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome J. 4(3): 250.

Endelman, J.B. and Jannink, J.L. (2012) Shrinkage Estimation of the Realized Relationship Matrix. G3 Genes, Genomes, Genet. 2(11): 1405-1413.

Su, G. et al. (2012) Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers. PLoS One. 7(9): 1-7.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Listgarten, J. et al. (2013) A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics. 29(12): 1526-1533.

Lippert, C. et al. (2014) Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics. 30(22): 3206-3214.

Jiang, Y. and Reif, J.C. (2015) Modeling epistasis in genomic selection. Genetics. 201(2): 759-768.

Hamazaki, K. and Iwata, H. (2020) RAINBOW: Haplotype-based genome-wide association study using a novel SNP-set method. PLOS Computational Biology, 16(2): e1007663.