---
title: "Reproducibility with seeker"
author: "Jake Hughey"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Reproducibility with seeker}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = '#>')
```

Using the `seeker` package together with docker, it's easy to make fetching and processing of sequencing and microarray data completely reproducible.

First pull the latest version of the [socker](https://github.com/hugheylab/socker) image, which has `seeker` and its dependencies already installed.

```{sh, eval = FALSE}
docker pull ghcr.io/hugheylab/socker
```

## RNA-seq data

The `seeker` package includes an example yaml file, R script, and shell script for fetching and processing a subset of an RNA-seq dataset. Here we'll download the files from GitHub to avoid having to install the package locally:

```{r, eval = FALSE}
urlBase = 'https://raw.githubusercontent.com/hugheylab/seeker/master/inst/extdata/'
for (filename in c('PRJNA600892.yml', 'run_seeker.R', 'run_seeker.sh')) {
  download.file(paste0(urlBase, filename), filename)
}
```

PRJNA600892.yml:

```{r, code = readLines(system.file('extdata', 'PRJNA600892.yml', package = 'seeker')), eval = FALSE}
```

run_seeker.R:

```{r, code = readLines(system.file('extdata', 'run_seeker.R', package = 'seeker')), eval = FALSE}
```

run_seeker.sh:

```{r, code = readLines(system.file('extdata', 'run_seeker.sh', package = 'seeker')), eval = FALSE}
```

Now simply run the shell script:

```{sh, eval = FALSE}
sh run_seeker.sh
```

The output will appear in your working directory. You can follow `seeker()`'s progress using the log file.

To process a different dataset, modify the yaml file and shell script accordingly. Beware this example uses "salmon_partial_sa_index" from refgenie to minimize computational requirements; for actual use we recommend "salmon_sa_index".

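
The shell script expects the yaml file and R script to sit alongside it in the working directory, so a quick check before launching a long run can catch a failed or partial download. The following is a minimal sketch of such a check; `check_inputs` is a hypothetical helper written for this illustration and is not part of `seeker` or socker.

```sh
# check_inputs FILE...
# Return 0 if every named file exists and is non-empty, 1 otherwise.
# Hypothetical helper for illustration; not provided by seeker.
check_inputs() {
  missing=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      # Report each problem file on stderr, then keep checking the rest.
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the function defined, something like `check_inputs PRJNA600892.yml run_seeker.R run_seeker.sh && sh run_seeker.sh` would start the run only when all three files are in place.
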
## Microarray data

The `seeker` package also includes an example yaml file, R script, and shell script for fetching and processing a microarray dataset. Download the files to your working directory:

```{r, eval = FALSE}
urlBase = 'https://raw.githubusercontent.com/hugheylab/seeker/master/inst/extdata/'
for (filename in c('GSE25585.yml', 'run_seeker_array.R', 'run_seeker_array.sh')) {
  download.file(paste0(urlBase, filename), filename)
}
```

GSE25585.yml:

```{r, code = readLines(system.file('extdata', 'GSE25585.yml', package = 'seeker')), eval = FALSE}
```

run_seeker_array.R:

```{r, code = readLines(system.file('extdata', 'run_seeker_array.R', package = 'seeker')), eval = FALSE}
```

run_seeker_array.sh:

```{r, code = readLines(system.file('extdata', 'run_seeker_array.sh', package = 'seeker')), eval = FALSE}
```

Now simply run the shell script:

```{sh, eval = FALSE}
sh run_seeker_array.sh
```

The output will appear in your working directory. You can follow `seekerArray()`'s progress using the log file.

To process a different dataset, modify the yaml file and shell script accordingly.
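
When adapting the example to a different dataset, it can help to work on copies so the original files stay available for comparison. The sketch below assumes that the accession appears literally in both the yaml file name and the shell script, which may not hold for every setup. `clone_example` is a hypothetical helper, not part of `seeker`, and `GSE00001` in the usage note is a placeholder accession.

```sh
# clone_example OLD NEW SCRIPT
# Write NEW.yml and a NEW-specific copy of SCRIPT, replacing every literal
# occurrence of OLD with NEW. The original files are not modified.
# Hypothetical helper for illustration; not provided by seeker.
clone_example() {
  old=$1
  new=$2
  script=$3
  # e.g. run_seeker_array.sh becomes run_seeker_array_<NEW>.sh
  new_script="${script%.sh}_${new}.sh"
  sed "s/${old}/${new}/g" "${old}.yml" > "${new}.yml" &&
    sed "s/${old}/${new}/g" "${script}" > "${new_script}"
}
```

For instance, `clone_example GSE25585 GSE00001 run_seeker_array.sh` would produce `GSE00001.yml` and `run_seeker_array_GSE00001.sh`. Only the accession string is substituted, so the dataset-specific parameters in the new yaml file still need to be reviewed and edited by hand before running. The same approach should work for the RNA-seq files.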