--- title: "MEGENA_pipeline" author: "Won-Min Song" date: "January 04, 2017" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{MEGENA pipeline overview} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ## MEGENA pipeline as of 10/25/2016 This is a routine MEGENA pipeline description encompassing from data correlation analysis to module plotting, and is based on version 1.3.6 <https://CRAN.R-project.org/package=MEGENA>.Please cite the paper below when MEGENA is applied as part of your analysis: Song W.-M., Zhang B. (2015) Multiscale Embedded Gene Co-expression Network Analysis. PLoS Comput Biol 11(11): e1004574. doi: 10.1371/journal.pcbi.1004574. For statistical mechanics aspects involved in MEGENA, please check: Song W.-M., Di Matteo T.and Aste T., Building Complex Networks with Platonic Solids, Physical Reivew E, 2012 Apr;85(4 Pt 2):046115. # calculate correlation ```{r correlation,message=FALSE} rm(list = ls()) # rm R working space library(MEGENA) # input parameters n.cores <- 2; # number of cores/threads to call for PCP doPar <-TRUE; # do we want to parallelize? method = "pearson" # method for correlation. either pearson or spearman. FDR.cutoff = 0.05 # FDR threshold to define significant correlations upon shuffling samples. module.pval = 0.05 # module significance p-value. Recommended is 0.05. hub.pval = 0.05 # connectivity significance p-value based random tetrahedral networks cor.perm = 10; # number of permutations for calculating FDRs for all correlation pairs. hub.perm = 100; # number of permutations for calculating connectivity significance p-value. # annotation to be done on the downstream annot.table=NULL id.col = 1 symbol.col= 2 ########### data(Sample_Expression) # load toy example data ijw <- calculate.correlation(datExpr,doPerm = cor.perm,output.corTable = FALSE,output.permFDR = FALSE) ``` # calculate PFN In this step, Planar Filtered Network (PFN) is calculated by taking significant correlation pairs, ijw. In the case of utilizing a different similarity measure, one can independently format the results into 3-column data frame with column names c("row","col","weight"), and make sure the weight column ranges within 0 to 1. Using this as an input to calculate.PFN() will work just as fine. ```{r PFN} #### register multiple cores if needed: note that set.parallel.backend() is deprecated. run.par = doPar & (getDoParWorkers() == 1) if (run.par) { cl <- parallel::makeCluster(n.cores) registerDoParallel(cl) # check how many workers are there cat(paste("number of cores to use:",getDoParWorkers(),"\n",sep = "")) } ##### calculate PFN el <- calculate.PFN(ijw[,1:3],doPar = doPar,num.cores = n.cores,keep.track = FALSE) g <- graph.data.frame(el,directed = FALSE) ``` # perform clustering MCA clustering is performed to identify multiscale clustering analysis. "MEGENA.output"" is the core output to be used in the down-stream analyses for summarization and plotting. ```{r MCA,results="hide",warning=FALSE} ##### perform MCA clustering. MEGENA.output <- do.MEGENA(g, mod.pval = module.pval,hub.pval = hub.pval,remove.unsig = TRUE, min.size = 10,max.size = vcount(g)/2, doPar = doPar,num.cores = n.cores,n.perm = hub.perm, save.output = FALSE) ###### unregister cores as these are not needed anymore. if (getDoParWorkers() > 1) { env <- foreach:::.foreachGlobals rm(list=ls(name=env), pos=env) } ``` # summarize results ```{r summarize} summary.output <- MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = module.pval,hub.pvalue = hub.pval, min.size = 10,max.size = vcount(g)/2, annot.table = annot.table,id.col = id.col,symbol.col = symbol.col, output.sig = TRUE) if (!is.null(annot.table)) { # update annotation to map to gene symbols V(g)$name <- paste(annot.table[[symbol.col]][match(V(g)$name,annot.table[[id.col]])],V(g)$name,sep = "|") summary.output <- output[c("mapped.modules","module.table")] names(summary.output)[1] <- "modules" } print(head(summary.output$modules,2)) print(summary.output$module.table) ``` # Plot some modules You can generate refined module network plots: ```{r modulePlot} pnet.obj <- plot_module(output.summary = summary.output,PFN = g,subset.module = "c1_3", layout = "kamada.kawai",label.hubs.only = TRUE, gene.set = NULL,color.code = "grey", output.plot = FALSE,out.dir = "modulePlot",col.names = c("magenta","green","cyan"),label.scaleFactor = 20, hubLabel.col = "black",hubLabel.sizeProp = 1,show.topn.hubs = Inf,show.legend = TRUE) #X11(); print(pnet.obj[[1]]) ``` # plot module hierarchy ```{r module hierarchy} module.table <- summary.output$module.table colnames(module.table)[1] <- "id" # first column of module table must be labelled as "id". hierarchy.obj <- plot_module_hierarchy(module.table = module.table,label.scaleFactor = 0.15, arrow.size = 0.03,node.label.color = "blue") #X11(); print(hierarchy.obj[[1]]) ```