---
title: "Simultaneous Selection by Trait and WAASB Index"
author: "Ali Arminian"
date: "`r Sys.Date()`"
always_allow_html: yes
output: rmarkdown::html_vignette
fig_caption: yes
link-citations: true
toc: true
bibliography: rywaasbref.bib
vignette: >
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{rYWAASB-vignette}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = '70%'
)
```


## 1-Introduction
`I dedicate this package to my colleague, Professor Dr. Manjit S. Kang, a great biostatistician and quantitative geneticist, who passed away recently and has devoted his life to PLANT BREEDING science and has provided me with invaluable supports!.`

 **Stability analysis** is crucial in plant breeding to select superior genotypes that perform consistently across different environments. Several models and methods have been developed for this purpose, including Additive Main Effect and Multiplicative Interaction (*AMMI*), Weighted Average of Absolute Scores (*WAASB*), and Genotype plus Genotype-Environment (*GGE*) interactions in multi-environmental trials (*MET*) [@mishra2024; @sakata2021; @pour2022; @danakumara2023]. These analyses help breeders identify genotypes with stable performance and adaptability under varying conditions, leading to optimal yield stability and successful crop production. Utilizing these stability analysis tools, breeders can navigate the complexities of genotype-environment interactions and select genotypes that consistently excel across different locations and seasons, ensuring the development of resilient and high-performing plant varieties [@swarup2014].

## The WAASB and WAASBY objects
By combining the strengths of AMMI for assessing stability and BLUP for prediction accuracy, breeders can effectively select genotypes that consistently perform well across different environmental conditions. This is crucial for developing sustainable agricultural systems and improving food security. To estimate the stability index of genotypes in multi-environment trials (METs), the **WAASB** index (weighted average of the absolute values obtained from the singular value decomposition of the BLUP matrix for the genotype by environment interaction effects, generated by a linear mixed-effects model) is calculated using the formula provided by [@olivoto2019]. They suggest that the function `*waasb()*` computes the Weighted Average of the Absolute Scores considering all possible IPCA from the Singular Value Decomposition of the BLUPs for genotype-vs-environment interaction effects obtained by an Linear Mixed-effect Model [@olivoto2019]
, as follows:

$$ WAASB_i  = \sum_{k = 1}^{p} |IPCA_{ik} \times EP_k|/ \sum_{k = 1}^{p}EP_k $$

where $WAASB_i$ is the weighted average of absolute scores of the *i*th genotype; $IPCA_{ik}$ is the scores of the *i*th genotype in the *k*th IPCA; and $EP_k$ is the explained variance of the *k*th PCA for $k = 1,2,..,p$, $p = min(g-1; e-1)$.

Further, **WAABSY** is a superiority or simultaneous selection index allowing weighting between mean performance and stability [@olivoto2019]: 

$$ WAASBY_i=\frac{\left({rY}_i\times\theta_Y\right)+\left({rW}_i\times\theta_s\right)}{\theta_Y+\theta_s} $$
where WAASBYi is the superiority index for genotype \it{i} that weights between mean performance and stability; $\theta_Y$ and $\theta_s$ are the weights for mean performance and stability, respectively; \{rY}_i and \{rW}_i are the rescaled values for mean performance $\bar{Y_i}$ and stability $W_i$, respectively of the genotype i. For the details of calculations, rescalling and mathematics notations see [@olivoto2019].

In this package, my new index (*rYWAASB*) gives results which can be compared with *WAASBY* index provided by Olivoto et al. [@olivoto2019].

For working with *rYWAASB* package, firstly, if the *metan* or *rYWAASB* packages are not already installed, they should be installed on the system. The analysis requires the following packages to be installed: 

```{r Installing rYWAASB package}
if(!require('rYWAASB')){
    install.packages('rYWAASB') # call the package
}
library('rYWAASB')
```

The analyses requires the following R packages:
```{r setup,warning=FALSE,message=FALSE}
## For graphical displays
library(metan)
library(ggplot2)
library(graphics)
library(factoextra)
library(FactoMineR)
```

The codes provided below form the *metan* package, allow you to access the *WAASB* index values, rankings, and other information for genotypes (or entries) in general.


```{r  }
waasb_model <- 
  waasb(data_ge,
        env = ENV,
        gen = GEN,
        rep = REP,
        resp = everything(),
        random = "gen", #Default
        verbose = TRUE) #Default

data <- waasb_model$GY$model
print(data)
```

The output generated by the *waasb()* function is very similar to that generated by the *waas()* function. The main difference is that the singular value decomposition is based on the BLUP for GEI effects matrix. For more information, refer to [@olivoto2019].


Several indexes have been developed to identify genotypes that exhibit both high performance and stability in plant breeding programs. One example is the *kangranksum* index, which was developed by Kang [@kang1988]. This index combines yield and stability ranks based on the *Shukla* stability index. Olivoto [@olivoto2019], on the other hand, created the *WAASBY* index by assigning weights to yield and stability. Our rYWAASB index can be compared to these earlier indexes, as it follows a similar computational approach. However, our index (*rYWAASB*) is a powerful tool, because it incorporates a trait and *WAASB* rankings in the process.


## 2- How to apply the package
First, let's examine the `Y*WAASB` biplot generated by the *metan* package. This will allow us to compare the results of the *rYWAASB* package with the `Y*WAASB` biplot. In the `Y*WAASB` or `GY*WAASB` biplot proposed by [@olivoto2019] (Fig. 1), the quadrants illustrate the stability and trait patterns (specifically, the grain yield of oat genotypes in the data_ge dataset) as follows:

- Quadrant I: This quadrant consists of unstable genotypes or environments that have high discrimination ability but productivity below the grand mean.
- Quadrant II: This quadrant includes unstable genotypes with productivity above the grand mean.
- Quadrant III: This quadrant comprises low productive but stable genotypes.
- Quadrant IV: This quadrant contains productive and broadly adapted genotypes.


```{r  echo = TRUE, fig.height = 14, fig.width = 18, fig.align = "center", message=F, warning=F}
plot_scores(waasb_model, type = 3)
```

Fig. 1: Y*WAASB biplot of `metan` package built-in oat data


In this section, we will utilize the built-in data *maize* to generate ranking scores for different genotypes, along with their corresponding plots. For further details, please refer to the *?maize* documentation. It is also possible to use other datasets as long as they contain the following columns: genotype, trait, and WAASB index for genotypes. To understand how the HTML tables were created, please refer to the [Rendering engine](#rendering) section.


```{r Showing the maize dataset}
data(maize)
head(maize)
```

## Applying the **rYWAASB** package. Ranking table:
We recommend that users address (handle/overcome/substitute) any missing data in their inputs before proceeding with analyses. This is because the rank codes do not incorporate a comprehensive algorithm to handle this task.

Case 1. Higher trait values are preferred. For instance grain yield in cereals is a trait which its higher values are preferred and ranking is performed from the higher to lower values i.e. 1st, 2nd, 3rd etc in maize dataset.

```{r apply package by ranking the genotypes for maize data}
data(maize)
ranki(maize) # or: ranki(maize, lowt = FALSE)
```
In the table above, the genotype with the lowest rank (*Dracma*) is considered the best due to its high grain yield and low *WAASB* score.

Case 2. 
In this case, the lower values of the given trait are preferred. For instance days to maturity and plant height are traits where their lower values are preferred.

```{r apply package by ranking the genotypes for dm data}
data(dm)
ranki(dm, lowt = TRUE)
```

## Drawing the first plot *bar_plot1*:
case 1. *maize* dataset:
```{r  echo = TRUE, fig.height = 14, fig.width = 20, fig.align = "center", message=F, warning=F}
data(maize)
bar_plot1(maize) # or: bar_plot1(maize, lowt = FALSE)
```

case 2. *dm* dataset:
```{r  echo = TRUE, fig.height = 14, fig.width = 20, fig.align = "center", message=F, warning=F}
data(dm)
bar_plot1(dm, lowt = TRUE)
```


Fig.2. The first barplot of the `rYWASSB` package


## Drawing the second plot *bar_plot2*:
```{r  echo = TRUE, fig.height = 14, fig.width = 20, fig.align = "center", message=F, warning=F}
data(maize)
bar_plot2(maize) # or: bar_plot2(maize, lowt = FALSE, verbose = FALSE)
```

Fig.3. The second barplot of the `rYWASSB` package

## Performing PCA and textual and graphical representations (scree plot and PCA-dependent biplots (3 plots))
case 1. For grain yield data:

```{r  echo = TRUE, fig.height = 14, fig.width = 20, fig.align = "center", message=F, warning=F}
data(maize)
PCA_biplot(maize) # or: PCA_biplot(maize, lowt = FALSE)
```

Figs.4-6. The PCA biplot with loadings for compare genotypes

Case 2. For days to maturity data:
```{r  echo = TRUE, fig.height = 14, fig.width = 20, fig.align = "center", message=F, warning=F}
data(dm)
PCA_biplot(dm, lowt = TRUE)
```

Figs.7-9. The PCA biplot with loadings for compare genotypes

## Designing the cluster analysis on data and define the optimum number of clusters uning `The Average Silhouette Method`. The details of this method is given in the *description* and *details* parts of the function `nbclust()`. For running the cluster analysis by menas of `shipunov` package (courtesy of late professor Alexy Shipunov) act as:



```{r  echo = TRUE, fig.height = 15, fig.width = 30, fig.align = "center", message=F, warning=F}
 data(maize)
 maize <- as.data.frame(maize)
 row.names(maize) <- maize[, 1]
 maize[, 1] = NULL
 GEN <- row.names(maize)
 maize <- scale(maize)
 nbclust(maize, verbose = FALSE)

 # Perform bootstrap or jackknife clustering by shipunov package. 
 # The examples should be run in the console manually due to 
 # problems occurs in the ORPHANED package "shipunov".
 #
 # library(shipunov) # recalling the shipunov package
 # 1- Bootstrap clustering:
 # data.jb <- Jclust(maize,
 #  method.d = "euclidean",
 #   method.c = "average", n.cl = 2,
 #    bootstrap = TRUE)
 #
 # plot.Jclust(data.jb, top=TRUE, lab.pos=1,
 #   lab.offset=1, lab.col=2, lab.font=2)
 # Fence(data.jb$hclust, GEN)
 #
 # data.jb <- Jclust(maize,
 #  method.d = "euclidean",
 #   method.c = "ward.D", n.cl = 2,
 #    bootstrap = TRUE)
 #
 # plot.Jclust(data.jb, top=TRUE, lab.pos=1,
 #  lab.offset=1, lab.col=2, lab.font=2)
 # Fence(data.jb$hclust, GEN)
 #
 #
 # if(verbose = TRUE):
 # cat("\nnumber of iterations:\n", data.jb$iter, "\n")
 #
 # For "bootstrap":
 # data.jb$mat <- as.matrix((data.jb$mat))
 # cat("\nmatrix of results:\n", data.jb$mat, "\n")
 # cat("clustering info, by eucledean distance measure:\n")
 # print(data.jb$hclust)
 # cat("groups:\n", data.jb$gr, "\n")
 # cat("\nsupport values:\n", data.jb$supp, "\n")
 # cat("\nnumber of clusters used:\n", data.jb$n.cl, "\n")
 #
 # 2- Jackknife clustering:
 # data.jb <- Bclust(maize,
 #   method.d = "euclidean", method.c = "average",
 #    bootstrap = FALSE)
 # plot(data.jb)
 #
 # data.jb <- Bclust(maize,
 #   method.d = "euclidean", method.c = "ward.D",
 #    bootstrap = FALSE)
 # plot(data.jb)
 #
 # if(verbose = TRUE):
 # For"jackknife":
 # cat("Consensus:\n", data.jb$consensus, "\n")
 # cat("Vlaues:\n", data.jb$values, "\n")
```


Figures 10-12 Cluster analysis by 1000 bootstrap iterations and determine optimum N clusters using The Average Silhouette algorithm.

# References