--- title: "Exporting objects and functions from the workspace" author: "Phil Chalmers" date: "`r Sys.Date()`" output: html_document: fig_caption: false number_sections: true toc: true toc_float: collapsed: false smooth_scroll: false vignette: > %\VignetteIndexEntry{Exporting objects and functions from the workspace} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r nomessages, echo = FALSE} knitr::opts_chunk$set( warning = FALSE, message = FALSE, fig.height = 5, fig.width = 5 ) options(digits=4) par(mar=c(3,3,1,1)+.1) ``` # Including fixed objects R is fun language for computer programming and statistics, but it's not without it's quirks. For instance, R generally has a recursive strategy when attempting to find objects within functions. If an object can't be found, R will start to look outside the function's environment to see if the object can be located there, and if not, look within even higher-level environments... This recursive search continues until it searches for the object in the user workspace/Global environment, and only when the object can't be found here will an error be thrown. This is a strange feature to most programmers who come from other languages, and when writing simulations may cause some severely unwanted issues. This tutorial demonstrates how to make sure all required user-defined objects are visible to `SimDesign`. # Scoping ```{r include=FALSE} set.seed(1234) ``` To demonstrate the issue, let's define two objects and a function which uses these objects. ```{r} obj1 <- 10 obj2 <- 20 ``` When evaluated, these objects are visible to the user, and can be seen by typing in the R console by typing `ls()`. Functions which do not define objects with the same name will also be able to locate these values. ```{r} myfun <- function(x) obj1 + obj2 myfun(1) ``` This behavior is indeed a bit strange, but it's one of R's quirks. Unfortunately, when running code in parallel across different cores these objects *will not be visible*, and therefore must be exported using other methods (e.g., in the `parallel` package this is done with `clusterExport()`). ```{r eval = FALSE} library(parallel) cl <- makeCluster(2) res <- try(parSapply(cl=cl, 1:4, myfun)) res ``` ```{r echo=FALSE} library(parallel) cl <- makeCluster(2) cat("Error in checkForRemoteErrors(val) : 2 nodes produced errors; first error: object 'obj1' not found") ``` Exporting the objects to the cluster fixes the issue. ```{r} clusterExport(cl=cl, c('obj1', 'obj2')) parSapply(cl=cl, 1:4, myfun) ``` The same reasoning above applies to functions defined in the R workspace as well, including functions defined within external R packages. Hence, in order to use functions from other packages they must either be explicitly loaded with `require()` or `library()` within the distributed code, or referenced via their Namespace with the `::` operator (e.g., `mvtnorm::rmvtnorm()`). ```{r echo=FALSE} stopCluster(cl) ``` # Exporting objects example In order to make objects safely visible in `SimDesign` the strategy is very simple: wrap all desired objects into a named list (or other object), and pass this to the `fixed_objects` argument. From here, elements can be indexed using the `$` operator or `with()` function, or whatever other method may be convenient. Note, however, this is only required for defined *objects* not *functions* --- `SimDesign` automatically makes user-defined functions available across all nodes. As an aside, an alternative approach is simply to define/source the objects within the respective `SimDesign` functions; that way they will clearly be visible at runtime. The following `fixed_objects` approach is really only useful when the defined objects contain a large amount of code. ```{r} library(SimDesign) #SimFunctions(comments = FALSE) ### Define design conditions and number of replications Design <- createDesign(N = c(10, 20, 30)) replications <- 1000 # define custom functions and objects (or use source() to read these in from an external file) SD <- 2 my_gen_fun <- function(n, sd) rnorm(n, sd = sd) my_analyse_fun <- function(x) c(p = t.test(x)$p.value) fixed_objects <- list(SD=SD) #--------------------------------------------------------------------------- Generate <- function(condition, fixed_objects) { Attach(condition) # make condition names available (e.g., N) # further, can use with() to use 'SD' directly instead of 'fixed_objects$SD' ret <- with(fixed_objects, my_gen_fun(N, sd=SD)) ret } Analyse <- function(condition, dat, fixed_objects) { ret <- my_analyse_fun(dat) ret } Summarise <- function(condition, results, fixed_objects) { ret <- EDR(results, alpha = .05) ret } #--------------------------------------------------------------------------- ### Run the simulation res <- runSimulation(Design, replications, verbose=FALSE, fixed_objects=fixed_objects, generate=Generate, analyse=Analyse, summarise=Summarise, debug='none') res ``` By placing objects in a list and passing this to `fixed_objects`, the objects are safely exported to all relevant functions. Furthermore, running this code in parallel will also be valid as a consequence (see below) because all objects are properly exported to each core. ```{r eval=FALSE} res <- runSimulation(Design, replications, verbose=FALSE, fixed_objects=fixed_objects, generate=Generate, analyse=Analyse, summarise=Summarise, debug='none', parallel = TRUE) ``` Again, remember that this is only required for R **objects**, NOT for user-defined functions!