--- title: "Getting started with ssw-r: Sequence alignment examples" bibliography: ssw.bib output: rmarkdown::html_document: toc: true number_sections: false highlight: "textmate" css: "custom.css" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Getting started with ssw-r: Sequence alignment examples} --- ```{r, include=FALSE} knitr::opts_chunk$set( comment = "NA", collapse = TRUE ) run <- if (ssw::is_installed_ssw_py()) TRUE else FALSE knitr::opts_chunk$set(eval = run) ``` The R package ssw offers an R interface for [SSW](https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library), a fast implementation of the Smith-Waterman algorithm for sequence alignment using SIMD, described in @zhao2013ssw. The package is currently built on [ssw-py](https://pypi.org/project/ssw-py/). ```{r} library("ssw") ``` A short read sequence: ```{r} read <- "ACGT" ``` ## Exact alignment Align the read against the reference sequence (`TTTTACGTCCCCC`) and print the results: ```{r} read |> align("TTTTACGTCCCCC") ``` ```{text, echo = !run} CIGAR start index 4: 4M optimal_score: 8 sub-optimal_score: 0 target_begin: 4 target_end: 7 query_begin: 0 query_end: 3 Target: 4 ACGT 7 |||| Query: 0 ACGT 3 ``` Get specific results, such as the alignment scores: ```{r} a <- read |> align("TTTTACGTCCCCC") ``` ```{r} a$alignment$optimal_score ``` ```{text, echo = !run} [1] 8 ``` ```{r} a$alignment$sub_optimal_score ``` ```{text, echo = !run} [1] 0 ``` ## Deletion ```{r} read |> align("TTTTACAGTCCCCC") ``` ```{text, echo = !run} CIGAR start index 4: 2M1D2M optimal_score: 5 sub-optimal_score: 0 target_begin: 4 target_end: 8 query_begin: 0 query_end: 3 Target: 4 ACAGT 8 ||*|| Query: 0 AC-GT 3 ``` ## Insertion with gap open ```{r} read |> align("TTTTACTCCCCC", gap_open = 3) ``` ```{text, echo = !run} CIGAR start index 4: 2M optimal_score: 4 sub-optimal_score: 0 target_begin: 4 target_end: 5 query_begin: 0 query_end: 1 Target: 4 AC 5 || Query: 0 AC 1 ``` ## Insertion with no gap open penalty ```{r} read |> align("TTTTACTCCCCC", gap_open = 0) ``` ```{text, echo = !run} CIGAR start index 4: 2M1I1M optimal_score: 6 sub-optimal_score: 0 target_begin: 4 target_end: 6 query_begin: 0 query_end: 3 Target: 4 AC-T 6 ||*| Query: 0 ACGT 3 ``` ## Specify start index ```{r} a <- align("ACTG", "ACTCACTG", start_idx = 4) a ``` ```{text, echo = !run} CIGAR start index 0: 4M optimal_score: 8 sub-optimal_score: 0 target_begin: 0 target_end: 3 query_begin: 0 query_end: 3 Target: 0 ACTC 3 |||* Query: 0 ACTG 3 ``` Print the results from position 4: ```{r} a |> print(start_idx = 4) ``` ```{text, echo = !run} CIGAR start index 0: 4M optimal_score: 8 sub-optimal_score: 0 target_begin: 0 target_end: 3 query_begin: 0 query_end: 3 Target: 0 ACTG 3 |||| Query: 0 ACTG 3 ``` ## Forced alignment Enforce no gaps by increasing the penalty: ```{r} a <- force_align("ACTG", "TTTTCTGCCCCCACG") a ``` The results are truncated: ```{text, echo = !run} CIGAR start index 4: 3M optimal_score: 6 sub-optimal_score: 0 target_begin: 4 target_end: 6 query_begin: 1 query_end: 3 Target: 4 CTG 6 ||| Query: 1 CTG 3 ``` Use `formatter()` to avoid the truncation: ```{r} b <- a |> formatter() b ``` ```{text, echo = !run} [[1]] [1] "TTTTCTGCCCCCACG" [[2]] [1] " ACTG" ``` Or pretty-print the formatted results directly: ```{r} a |> formatter(print = TRUE) ``` ```{text, echo = !run} TTTTCTGCCCCCACG ACTG ``` ## Acknowledgements ssw-r is built upon the work of two outstanding projects: 1. [SSW](https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library) - Original C implementation. Author: Mengyao Zhao 1. [ssw-py](https://pypi.org/project/ssw-py/) - Python binding for SSW. Author: Nick Conway We extend our sincere gratitude to Mengyao Zhao for developing the original SSW library and to Nick Conway for maintaining the ssw-py package. Their work forms the foundation of ssw-r. While ssw-r does not directly incorporate code from these projects, it serves as an R interface to their functionality. We encourage users to visit the original repositories for more information about the underlying implementation and to consider citing these works in publications that use ssw-r. ## References