To download and process GEO datasets, use the `prepare_geo()` function. This will generate a list containing count data, sample information, and gene data.
To prepare TCGA RNA-Seq data obtained with the `TCGAbiolinks` package, use the `prepare_tcga()` function. This will return a list containing count data for all samples and unstranded FPKM data for tumor samples, along with sample and feature information.
Three functions are designed for the limma workflow: `dgeList()`, `dprocess_dgeList()`, and `limmaFit()`.
```r
library(TCGAbiolinks)
library(SummarizedExperiment)
library(data.table) # provides fifelse() and %ilike%

# Query, download, and load TCGA-CHOL gene expression quantification data
query <- GDCquery(
  project = "TCGA-CHOL",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification"
)
GDCdownload(query = query)
data <- GDCprepare(query = query)

# Extract counts, sample information, and feature information
lt <- prepare_tcga(data)

# Label each sample as Tumor or Normal based on its sample type
lt$all$sampleInfo[["group"]] <- fifelse(
  lt$all$sampleInfo$sample_type %ilike% "Tumor", "Tumor", "Normal"
)

# limma workflow
x <- dgeList(lt$all$exprCount, lt$all$sampleInfo, lt$all$featuresInfo)
x <- dprocess_dgeList(x, "group", 10)
efit <- limmaFit(x, "group")
CHOL.DEGs <- limma::topTable(fit = efit, coef = 1, number = Inf)
```
For a detailed overview of the limma workflow, refer to the article "RNA-seq analysis is as easy as 1-2-3 with limma, Glimma and edgeR".
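For orientation, the steps described in that article can be sketched directly with `edgeR` and `limma` on simulated counts. This is a minimal sketch, not the implementation of `dgeList()`, `dprocess_dgeList()`, or `limmaFit()`: it assumes those wrappers follow the article's general pattern (filter, normalize, voom, fit), and the simulated data, object names, and the `TumorVsNormal` contrast are illustrative only.

```r
library(edgeR)
library(limma)

set.seed(1)
n_genes <- 200
n_samples <- 6

# Simulated count matrix (illustrative only): genes in rows, samples in columns
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 100, size = 5),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("s", seq_len(n_samples))
  )
)
group <- factor(rep(c("Normal", "Tumor"), each = 3))

# Build the DGEList, drop lowly expressed genes, and apply TMM normalization
y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y, group = group, min.count = 10)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y, method = "TMM")

# Design matrix without intercept, one column per group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# voom transformation, linear model fit, and Tumor vs Normal contrast
v <- voom(y, design)
fit <- lmFit(v, design)
contr <- makeContrasts(TumorVsNormal = Tumor - Normal, levels = design)
fit <- eBayes(contrasts.fit(fit, contr))

degs <- topTable(fit, coef = "TumorVsNormal", number = Inf)
head(degs)
```

The resulting table has the usual `topTable()` columns (`logFC`, `AveExpr`, `t`, `P.Value`, `adj.P.Val`, `B`), which is the shape of result that `CHOL.DEGs` holds in the example above.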
Next, visualize the differentially expressed genes using the `plotVolcano()` function.
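To illustrate what a volcano plot encodes, here is a minimal base R sketch on a simulated table with `logFC` and `adj.P.Val` columns, as returned by `limma::topTable()`. It is not the implementation of `plotVolcano()`, whose arguments are not shown here; the cutoffs (`lfc_cut`, `p_cut`), the `status` column, and the colors are assumptions chosen for illustration.

```r
set.seed(42)
n <- 500

# Simulated differential expression results (illustrative only)
degs <- data.frame(
  logFC = rnorm(n, sd = 1.5),
  adj.P.Val = runif(n),
  row.names = paste0("gene", seq_len(n))
)

# Hypothetical cutoffs for calling a gene up- or down-regulated
lfc_cut <- 1
p_cut <- 0.05

# Classify each gene by fold change and adjusted p-value
degs$status <- "NotSig"
degs$status[degs$adj.P.Val < p_cut & degs$logFC >= lfc_cut] <- "Up"
degs$status[degs$adj.P.Val < p_cut & degs$logFC <= -lfc_cut] <- "Down"

# Volcano plot: effect size on the x-axis, significance on the y-axis
cols <- c(Down = "steelblue", NotSig = "grey70", Up = "firebrick")
plot(
  degs$logFC, -log10(degs$adj.P.Val),
  col = cols[degs$status], pch = 16,
  xlab = "log2 fold change", ylab = "-log10(adjusted p-value)"
)
abline(v = c(-lfc_cut, lfc_cut), h = -log10(p_cut), lty = 2)

table(degs$status)
```

Genes in the upper left and upper right corners are the ones that pass both cutoffs.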
To perform pathway enrichment analysis such as GO and KEGG, install the `r4msigdb` package to access the MSigDB gene sets.
For more details about this package, see r4msigdb. To get GO pathways in MSigDB:
The core function is adapted from the `fgsea` package with minor visual enhancements.
Run the GSEA analysis:
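```r
library(fgsea)

# Example pathways and ranked gene statistics shipped with fgsea
data(examplePathways)
data(exampleRanks)

fgseaRes <- fgsea(
  pathways = examplePathways,
  stats = exampleRanks,
  minSize = 15,
  maxSize = 500
)

# Plot the enrichment curve for a single pathway
plotGSEA(
  fgseaRes,
  pathways = examplePathways,
  pwayname = "5991130_Programmed_Cell_Death",
  stats = exampleRanks,
  save = FALSE
)
#> Warning in fsort(stats, TRUE): New parallel sort has not been implemented for
#> decreasing=TRUE so far. Using one thread.
```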
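The `stats` argument of `fgsea()` expects a named numeric vector of gene-level statistics whose names use the same identifiers as the pathways. When starting from your own differential expression table rather than `exampleRanks`, such a vector could be built as sketched below. The toy table, the gene symbols, and the choice of the moderated t-statistic as the ranking metric are assumptions for illustration; other metrics such as `logFC` are also used in practice.

```r
# Toy differential expression table (illustrative values only),
# with gene identifiers as row names and a t-statistic column
degs <- data.frame(
  t = c(2.5, -1.2, NA, 4.1, -3.3, 0.7),
  row.names = c("TP53", "MYC", "EGFR", "KRAS", "PTEN", "BRCA1")
)

# Named numeric vector: values are the statistic, names are gene identifiers
ranks <- setNames(degs$t, rownames(degs))

# Drop genes without a statistic, then order from most up- to most down-regulated
ranks <- ranks[!is.na(ranks)]
ranks <- sort(ranks, decreasing = TRUE)

ranks
```

The identifiers in `names(ranks)` must match those in the pathway list (for example, Entrez IDs in `examplePathways`), otherwise no genes will overlap.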