---
title: "Workflow"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Workflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

This vignette is a workflow template for data import and downstream analysis with **mpwR** including highlighting number of identifications, data completeness, quantitative and retention time precision etc. It demonstrates significant steps and showcases functions and applicability.

## Loading R packages

```{r setup, message = FALSE, warning = FALSE}
library(mpwR)
library(flowTraceR)
library(magrittr)
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)
library(ggplot2)
library(flextable)
```

# Import 

## Import your data
Importing the output files from each software can be performed with `prepare_mpwR`. Please put all output files in one folder and follow the guidelines for naming the files. No other files/subfolders are allowed. Details are provided in the vignette [Import](https://okdll.github.io/mpwR/articles/Import.html).

```{r import, eval = FALSE}
files <- prepare_mpwR(path = "Path_to_Folder_with_files")
```

## Examples
Some examples are provided to explore the workflow with `create_example`.

```{r get-example-data}
files <- create_example()
```

# Number of Identifications

## Report
The number of identifications can be determined with `get_ID_Report`. 
```{r ID-Report}
ID_Reports <- get_ID_Report(input_list = files)
```
<p>&nbsp;</p>
For each analysis an ID Report is generated and stored in a list. Each ID Report entry can be easily accessed:
```{r show-ID-Report}
flextable::flextable(ID_Reports[["DIA-NN"]])
```
<p>&nbsp;</p>
## Plot

### Individual
Each ID Report can be plotted with `plot_ID_barplot` from precursor- to proteingroup-level. The generated barplots are stored in a list.
```{r plot-ID-barplot}
ID_Barplots <- plot_ID_barplot(input_list = ID_Reports, level = "ProteinGroup.IDs")
```
<p>&nbsp;</p>
The individual barplots can be easily accessed:
```{r show-ID-barplot}
ID_Barplots[["DIA-NN"]]
```
<p>&nbsp;</p>
### Summary
As a visual summary a boxplot can be generated with `plot_ID_boxplot`.
```{r plot-ID-boxplot}
plot_ID_boxplot(input_list = ID_Reports, level = "ProteinGroup.IDs")
```
<p>&nbsp;</p>
# Data Completeness

## Report
Data Completeness can be determined with `get_DC_Report` for absolute numbers or in percentage. 

```{r DC-Report}
DC_Reports <- get_DC_Report(input_list = files, metric = "absolute")
DC_Reports_perc <- get_DC_Report(input_list = files, metric = "percentage")
``` 
<p>&nbsp;</p>
For each analysis a DC Report is generated and stored in a list. Each DC Report entry can be easily accessed:
```{r show-DC-Report}
flextable::flextable(DC_Reports[["DIA-NN"]])
```
<p>&nbsp;</p>
## Plot

### Individual
#### Absolute
Each DC Report can be plotted with `plot_DC_barplot` from precursor- to proteingroup-level. The generated barplots are stored in a list.
```{r plot-DC-barplot}
DC_Barplots <- plot_DC_barplot(input_list = DC_Reports, level = "ProteinGroup.IDs", label = "absolute")
```
<p>&nbsp;</p>
The individual barplots can be easily accessed:
```{r show-DC-barplot}
DC_Barplots[["DIA-NN"]]
```
<p>&nbsp;</p>
#### Percentage
```{r show-DC-barplot-percentage}
plot_DC_barplot(input_list = DC_Reports_perc, level = "ProteinGroup.IDs", label = "percentage")[["DIA-NN"]]
```
<p>&nbsp;</p>
### Summary
As a visual summary a stacked barplot can be generated with `plot_DC_stacked_barplot`.

#### Absolute
```{r plot-DC-stacked-barplot}
plot_DC_stacked_barplot(input_list = DC_Reports, level = "ProteinGroup.IDs", label = "absolute")
```
<p>&nbsp;</p>
#### Percentage
```{r plot-DC-stacked-barplot-percentage}
plot_DC_stacked_barplot(input_list = DC_Reports_perc, level = "ProteinGroup.IDs", label = "percentage")
```
<p>&nbsp;</p>
# Missed Cleavages

## Report
A report for Missed Cleavages can be generated with `get_MC_Report` for absolute numbers or in percentage. 
```{r MC-Report}
MC_Reports <- get_MC_Report(input_list = files, metric = "absolute")
MC_Reports_perc <- get_MC_Report(input_list = files, metric = "percentage")
``` 
<p>&nbsp;</p>
For each analysis a MC Report is generated and stored in a list. Each MC Report entry can be easily accessed:
```{r show-MC-Report}
flextable::flextable(MC_Reports[["Spectronaut"]])
```
<p>&nbsp;</p>
## Plot

### Individual

#### Absolute
Each MC Report can be plotted with `plot_MC_barplot` from precursor- to proteingroup-level. The generated barplots are stored in a list.
```{r plot-MC-barplot}
MC_Barplots <- plot_MC_barplot(input_list = MC_Reports, label = "absolute")
```
<p>&nbsp;</p>
The individual barplots can be easily accessed:
```{r show-MC-barplot}
MC_Barplots[["Spectronaut"]]
```
<p>&nbsp;</p>
#### Percentage
```{r show-MC-barplot-percentage}
plot_MC_barplot(input_list = MC_Reports_perc, label = "percentage")[["Spectronaut"]]
```
<p>&nbsp;</p>
### Summary
As a visual summary a stacked barplot can be generated with `plot_MC_stacked_barplot`.

#### Absolute
```{r plot-MC-stacked-barplot}
plot_MC_stacked_barplot(input_list = MC_Reports, label = "absolute")
```
<p>&nbsp;</p>
#### Percentage
```{r plot-MC-stacked-barplot-percentage}
plot_MC_stacked_barplot(input_list = MC_Reports_perc, label = "percentage")
```
<p>&nbsp;</p>
# Retention Time Precision

## Preparation
The coefficient of variation (CV) can be calculated with `get_CV_RT`. Only complete profiles are used.
```{r CV-RT}
CV_RT <- get_CV_RT(input_list = files)
```
<p>&nbsp;</p>
## Plot
As a visual summary a density plot for all analyses can be accessed via `plot_CV_density`.
```{r CV-RT-plot}
plot_CV_density(input_list = CV_RT, cv_col = "RT")
```
<p>&nbsp;</p>
# Quantitative Precision

## Peptide-level
### Preparation
The CV can be calculated with `get_CV_LFQ_pep`. Only complete profiles are used.
```{r CV-Pep}
CV_LFQ_Pep <- get_CV_LFQ_pep(input_list = files)
```
<p>&nbsp;</p>
### Plot
As a visual summary a density plot for all analyses can be accessed via `plot_CV_density`.
```{r CV-Pep-plot}
plot_CV_density(input_list = CV_LFQ_Pep, cv_col = "Pep_quant")
```

## Proteingroup-level
### Preparation
The CV can be calculated with `get_CV_LFQ_pg`. Only complete profiles are used.
```{r CV-PG}
CV_LFQ_PG <- get_CV_LFQ_pg(input_list = files)
```
<p>&nbsp;</p>
### Plot
As a visual summary a density plot for all analyses can be accessed via `plot_CV_density`.
```{r CV-PG-plot}
plot_CV_density(input_list = CV_LFQ_PG, cv_col = "PG_quant")
```
<p>&nbsp;</p>
# Upset Plot
Common identifications and intersections between analyses can be highlighted.

## Preparation
Use `get_Upset_list` to prepare for Upset plotting.
```{r prepare-Upset}
Upset_prepared <- get_Upset_list(input_list = files, level = "ProteinGroup.IDs")
```

## Plot
The Upset plot can be generated with `plot_Upset`.
```{r plot-Upset}
plot_Upset(input_list = Upset_prepared, label = "ProteinGroup.IDs")
```

## Inter-software Comparison - flowTraceR
Functions of the package [flowTraceR](https://CRAN.R-project.org/package=flowTraceR) are incorporated in mpwR for inter-software comparisons. Software outputs are standardized and easily comparable.

### Precursor-level without flowTraceR
Without standardizing the precursor-level information, the software outputs only form software-dependent cluster.
```{r Upset-flowTraceR-off}
get_Upset_list(input_list = files, level = "Peptide.IDs") %>% #prepare Upset
  plot_Upset(label = "Peptide.IDs") #plot
```

### Precursor-level with flowTraceR
By enabling flowTraceR the precursor-level information is standardized and common identifications can be inferred.
```{r Upset-flowTraceR-on}
get_Upset_list(input_list = files, level = "Peptide.IDs", flowTraceR = TRUE) %>% #prepare Upset
  plot_Upset(label = "Peptide.IDs") #plot
```
<p>&nbsp;</p>
# Summary
**mpwR** offers functions to summarize the downstream analysis.

## Report
A summary report can be generated with `get_summary_Report`.
```{r summary-report, eval = FALSE}
Summary_Report <- get_summary_Report(input_list = files)
```

## Plot
As a visual summary a radar chart for all analyses can be accessed via `plot_radarchart`.

### Overview
```{r plot-radarchart, eval = FALSE}
plot_radarchart(input_df = Summary_Report)
```

### Details
To highlight individual categories, the generated summary report can be easily adjusted and used for plotting.
```{r plot-radarchart-DC, eval = FALSE}
#Focus on Data Completeness
Summary_Report %>%
  dplyr::select(Analysis, contains("Full")) %>% #Analysis column and at least one category column is required
  plot_radarchart()
```