rslurm/0000755000176200001440000000000013563553472011613 5ustar liggesusersrslurm/NAMESPACE0000644000176200001440000000050113563333032013012 0ustar liggesusers# Generated by roxygen2: do not edit by hand export(cancel_slurm) export(cleanup_files) export(get_job_status) export(get_slurm_out) export(print_job_status) export(slurm_apply) export(slurm_call) export(slurm_job) importFrom(parallel,mclapply) importFrom(utils,capture.output) importFrom(utils,read.table) rslurm/README.md0000644000176200001440000000612213563515714013070 0ustar liggesusers # rslurm: submit R code to a Slurm cluster [![cran checks](https://cranchecks.info/badges/worst/rslurm)](https://CRAN.R-project.org/web/checks/check_results_rslurm.html) [![rstudio mirror downloads](https://cranlogs.r-pkg.org/badges/rslurm)](https://CRAN.R-project.org/package=rslurm) [![Build Status](https://travis-ci.org/SESYNC-ci/rslurm.svg?branch=master)](https://travis-ci.org/SESYNC-ci/rslurm) [![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip) [![CRAN status](https://www.r-pkg.org/badges/version/rslurm)](https://CRAN.R-project.org/package=rslurm) ### About Development of this R package was supported by the National Socio-Environmental Synthesis Center (SESYNC) under funding received from the National Science Foundation grants DBI-1052875 and DBI-1639145. The package was developed by Philippe Marchand and Ian Carroll, with Mike Smorul and Rachael Blake contributing. Quentin Read is the current maintainer. 
### Installation You can install the released version of rslurm from [CRAN](https://CRAN.R-project.org) with: ``` r install.packages("rslurm") ``` And the development version from [GitHub](https://github.com/SESYNC-ci/rslurm) with: ``` r # install.packages("devtools") devtools::install_github("SESYNC-ci/rslurm") ``` ### Documentation Package documentation is accessible from the R console through `package?rslurm` and [online](https://cran.r-project.org/package=rslurm). ### Example Note that job submission is only possible on a system with access to a Slurm workload manager (i.e. a system where the command line utilities `squeue` or `sinfo` return information from a Slurm head node). To illustrate a typical rslurm workflow, we use a simple function that takes a mean and standard deviation as parameters, generates a million normal deviates and returns the sample mean and standard deviation. ``` r test_func <- function(par_mu, par_sd) { samp <- rnorm(10^6, par_mu, par_sd) c(s_mu = mean(samp), s_sd = sd(samp)) } ``` We then create a parameter data frame where each row is a parameter set and each column matches an argument of the function. ``` r pars <- data.frame(par_mu = 1:10, par_sd = seq(0.1, 1, length.out = 10)) ``` We can now pass that function and the parameters data frame to `slurm_apply`, specifying the number of cluster nodes to use and the number of CPUs per node. ``` r library(rslurm) sjob <- slurm_apply(test_func, pars, jobname = 'test_apply', nodes = 2, cpus_per_node = 2, submit = FALSE) ``` The output of `slurm_apply` is a `slurm_job` object that stores a few pieces of information (job name, job ID, and the number of nodes) needed to retrieve the job’s output. See [Get started](http://cyberhelp.sesync.org/rslurm/) for more information. 
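To complete the picture, here is a sketch of the remaining steps once the job has actually been submitted (i.e. with the default `submit = TRUE` on a Slurm head node). `get_slurm_out` and `cleanup_files` are the package's own functions; the results here are collected as a data frame via `outtype = "table"`:

``` r
# On a Slurm head node, omit submit = FALSE to actually run the job
sjob <- slurm_apply(test_func, pars, jobname = 'test_apply',
                    nodes = 2, cpus_per_node = 2)

# Block until the job completes, then collect results as a data frame
res <- get_slurm_out(sjob, outtype = 'table')

# Delete the _rslurm_test_apply folder once the output has been retrieved
cleanup_files(sjob)
```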
rslurm/man/0000755000176200001440000000000013562577243012367 5ustar liggesusersrslurm/man/get_slurm_out.Rd0000644000176200001440000000334413556566473015550 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/get_slurm_out.R \name{get_slurm_out} \alias{get_slurm_out} \title{Reads the output of a function calculated on the Slurm cluster} \usage{ get_slurm_out(slr_job, outtype = "raw", wait = TRUE, ncores = NULL) } \arguments{ \item{slr_job}{A \code{slurm_job} object.} \item{outtype}{Can be "table" or "raw"; see "Value" below for details.} \item{wait}{Specify whether to block until \code{slr_job} completes.} \item{ncores}{(optional) If not NULL, the number of cores passed to \code{mclapply}.} } \value{ If \code{outtype = "table"}: A data frame with one column per return value of the function passed to \code{slurm_apply}, where each row is the output of the corresponding row in the params data frame passed to \code{slurm_apply}. If \code{outtype = "raw"}: A list where each element is the output of the function passed to \code{slurm_apply} for the corresponding row in the params data frame passed to \code{slurm_apply}. } \description{ This function reads all function output files (one per cluster node used) from the specified Slurm job and returns the result in a single data frame (if the "table" format is selected) or a list (if the "raw" format is selected). It doesn't record any messages (including warnings or errors) output to the R console during the computation; these can be consulted by invoking \code{\link{print_job_status}}. } \details{ The \code{outtype} option is only relevant for jobs submitted with \code{slurm_apply}. Jobs sent with \code{slurm_call} only return a single object, and setting \code{outtype = "table"} creates an error in that case. 
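As an illustrative sketch (here \code{sjob} stands for a \code{slurm_job} object returned by \code{slurm_apply}): \preformatted{res_list  <- get_slurm_out(sjob, outtype = "raw")    # list: one element per function call
res_table <- get_slurm_out(sjob, outtype = "table")  # data frame: one row per function call
}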
} \seealso{ \code{\link{slurm_apply}}, \code{\link{slurm_call}} } rslurm/man/cleanup_files.Rd0000644000176200001440000000176613556566473015507 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/cancel_cleanup.R \name{cleanup_files} \alias{cleanup_files} \title{Deletes temporary files associated with a Slurm job} \usage{ cleanup_files(slr_job, wait = TRUE) } \arguments{ \item{slr_job}{A \code{slurm_job} object.} \item{wait}{Specify whether to block until \code{slr_job} completes.} } \description{ This function deletes all temporary files associated with the specified Slurm job, including files created by \code{\link{slurm_apply}} or \code{\link{slurm_call}}, as well as outputs from the cluster. These files should be located in the \emph{_rslurm_[jobname]} folder of the current working directory. } \examples{ \dontrun{ sjob <- slurm_apply(func, pars) print_job_status(sjob) # Prints console/error output once job is completed. func_result <- get_slurm_out(sjob, "table") # Loads output data into R. cleanup_files(sjob) } } \seealso{ \code{\link{slurm_apply}}, \code{\link{slurm_call}} } rslurm/man/cancel_slurm.Rd0000644000176200001440000000115213556566473015332 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/cancel_cleanup.R \name{cancel_slurm} \alias{cancel_slurm} \title{Cancels a scheduled Slurm job} \usage{ cancel_slurm(slr_job) } \arguments{ \item{slr_job}{A \code{slurm_job} object.} } \description{ This function cancels the specified Slurm job by invoking the Slurm \code{scancel} command. It does \emph{not} delete the temporary files (e.g. scripts) created by \code{\link{slurm_apply}} or \code{\link{slurm_call}}. Use \code{\link{cleanup_files}} to remove those files. 
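For example, assuming \code{sjob} is a \code{slurm_job} object for a queued or running job: \preformatted{cancel_slurm(sjob)   # stop the job
cleanup_files(sjob)  # then delete its temporary files
}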
} \seealso{ \code{\link{cleanup_files}} } rslurm/man/print_job_status-deprecated.Rd0000644000176200001440000000204413556566473020353 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/print_job_status.R \name{print_job_status-deprecated} \alias{print_job_status-deprecated} \title{Prints the status of a Slurm job and, if completed, its console/error output} \usage{ print_job_status(slr_job) } \arguments{ \item{slr_job}{A \code{slurm_job} object.} } \description{ Prints the status of a Slurm job and, if completed, its console/error output. } \details{ If the specified Slurm job is still in the queue or running, this function prints its current status (as output by the Slurm \code{squeue} command). The output displays one row per node currently running part of the job ("R" in the "ST" column), along with how long it has been running ("TIME"). One additional row indicates the portions of the job still in queue ("PD" in the "ST" column), if any. If all portions of the job have completed or stopped, the function prints the console and error output, if any, generated by each node. } \seealso{ \code{\link{rslurm-deprecated}} } \keyword{internal} rslurm/man/slurm_job.Rd0000644000176200001440000000173013562577243014653 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/slurm_job.R \name{slurm_job} \alias{slurm_job} \title{Create a slurm_job object} \usage{ slurm_job(jobname, nodes) } \arguments{ \item{jobname}{The name of the Slurm job. The rslurm-generated scripts and output files associated with a job should be found in the \emph{_rslurm_[jobname]} folder.} \item{nodes}{The number of cluster nodes used by that job.} } \value{ A \code{slurm_job} object. } \description{ This function creates a \code{slurm_job} object which can be passed to other functions such as \code{\link{cancel_slurm}}, \code{\link{cleanup_files}}, \code{\link{get_slurm_out}} and \code{\link{get_job_status}}. 
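For example, a job (with the hypothetical name "test_job") that was spread over two nodes in an earlier session could be recreated with: \preformatted{sjob <- slurm_job(jobname = "test_job", nodes = 2)
}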
} \details{ In general, \code{slurm_job} objects are created automatically as the output of \code{\link{slurm_apply}} or \code{\link{slurm_call}}, but it may be necessary to manually recreate one if the job was submitted in a different R session. } rslurm/man/slurm_call.Rd0000644000176200001440000001017013562577243015012 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/slurm_call.R \name{slurm_call} \alias{slurm_call} \title{Execution of a single function call on the Slurm cluster} \usage{ slurm_call(f, params, jobname = NA, add_objects = NULL, pkgs = rev(.packages()), libPaths = NULL, rscript_path = NULL, r_template = NULL, sh_template = NULL, slurm_options = list(), submit = TRUE) } \arguments{ \item{f}{Any R function.} \item{params}{A named list of parameters to pass to \code{f}.} \item{jobname}{The name of the Slurm job; if \code{NA}, it is assigned a random name of the form "slr####".} \item{add_objects}{A character vector containing the name of R objects to be saved in a .RData file and loaded on each cluster node prior to calling \code{f}.} \item{pkgs}{A character vector containing the names of packages that must be loaded on each cluster node. By default, it includes all packages loaded by the user when \code{slurm_call} is called.} \item{libPaths}{A character vector describing the location of additional R library trees to search through, or NULL. The default value of NULL corresponds to libraries returned by \code{.libPaths()} on a cluster node. Non-existent library trees are silently ignored.} \item{rscript_path}{The location of the Rscript command. If not specified, defaults to the location of Rscript within the R installation being run.} \item{r_template}{The path to the template file for the R script run on each node. If NULL, uses the default template "rslurm/templates/slurm_run_single_R.txt".} \item{sh_template}{The path to the template file for the sbatch submission script. 
If NULL, uses the default template "rslurm/templates/submit_single_sh.txt".} \item{slurm_options}{A named list of options recognized by \code{sbatch}; see Details below for more information.} \item{submit}{Whether or not to submit the job to the cluster with \code{sbatch}; see Details below for more information.} } \value{ A \code{slurm_job} object containing the \code{jobname} and the number of \code{nodes} effectively used. } \description{ Use \code{slurm_call} to perform a single function evaluation on the Slurm cluster. } \details{ This function creates a temporary folder ("_rslurm_[jobname]") in the current directory, holding .RData and .RDS data files, the R script to run and the Bash submission script generated for the Slurm job. The names of any other R objects (besides \code{params}) that \code{f} needs to access should be listed in the \code{add_objects} argument. Use \code{slurm_options} to set any option recognized by \code{sbatch}, e.g. \code{slurm_options = list(time = "1:00:00", share = TRUE)}. See \url{http://slurm.schedmd.com/sbatch.html} for details on possible options. Note that full names must be used (e.g. "time" rather than "t") and that flags (such as "share") must be specified as TRUE. The "job-name", "ntasks" and "output" options are already determined by \code{slurm_call} and should not be manually set. When processing the computation job, the Slurm cluster will output two files in the temporary folder: one with the return value of the function ("results_0.RDS") and one containing any console or error output produced by R ("slurm_[node_id].out"). If \code{submit = TRUE}, the job is sent to the cluster and a confirmation message (or error) is output to the console. If \code{submit = FALSE}, a message indicates the location of the saved data and script files; the job can be submitted manually by running the shell command \code{sbatch submit.sh} from that directory. 
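As an illustrative sketch (the function and parameter values here are arbitrary): \preformatted{sjob <- slurm_call(function(x, y) x * 2 + y, params = list(x = 5, y = 6))
res <- get_slurm_out(sjob)  # the single return value, here 16
}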
After sending the job to the Slurm cluster, \code{slurm_call} returns a \code{slurm_job} object which can be used to cancel the job, get the job status or output, and delete the temporary files associated with it. See the description of the related functions for more details. } \seealso{ \code{\link{slurm_apply}} to parallelize a function over a parameter set. \code{\link{cancel_slurm}}, \code{\link{cleanup_files}}, \code{\link{get_slurm_out}} and \code{\link{get_job_status}} which use the output of this function. } rslurm/man/get_job_status.Rd0000644000176200001440000000175613556566473015711 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/get_job_status.R \name{get_job_status} \alias{get_job_status} \title{Get the status of a Slurm job} \usage{ get_job_status(slr_job) } \arguments{ \item{slr_job}{A \code{slurm_job} object.} } \value{ A list with three elements: \code{completed} is a logical value indicating if all portions of the job have completed or stopped, \code{queue} contains the information on job elements still in queue, and \code{log} contains the console/error logs. } \description{ This function returns the completion status of a Slurm job, its queue status if any and log outputs. } \details{ The \code{queue} element of the output is a data frame matching the output of the Slurm \code{squeue} command for that job; it will only indicate portions of job that are running or in queue. The \code{log} element is a vector of the contents of console/error output files for each node where the job is running. 
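For example, assuming \code{sjob} is a \code{slurm_job} object: \preformatted{status <- get_job_status(sjob)
if (status$completed) cat(status$log) else print(status$queue)
}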
} rslurm/man/slurm_apply.Rd0000644000176200001440000001343413562577243015232 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/slurm_apply.R \name{slurm_apply} \alias{slurm_apply} \title{Parallel execution of a function on the Slurm cluster} \usage{ slurm_apply(f, params, jobname = NA, nodes = 2, cpus_per_node = 2, preschedule_cores = TRUE, add_objects = NULL, pkgs = rev(.packages()), libPaths = NULL, rscript_path = NULL, r_template = NULL, sh_template = NULL, slurm_options = list(), submit = TRUE) } \arguments{ \item{f}{A function that accepts one or many single values as parameters and may return any type of R object.} \item{params}{A data frame of parameter values to apply \code{f} to. Each column corresponds to a parameter of \code{f} (\emph{Note}: names must match) and each row corresponds to a separate function call.} \item{jobname}{The name of the Slurm job; if \code{NA}, it is assigned a random name of the form "slr####".} \item{nodes}{The (maximum) number of cluster nodes to spread the calculation over. \code{slurm_apply} automatically divides \code{params} into chunks of approximately equal size to send to each node. Fewer nodes are allocated if the parameter set is too small to use all CPUs on the requested nodes.} \item{cpus_per_node}{The number of CPUs requested per node, i.e., how many processes to run in parallel per node. This argument is mapped to the Slurm parameter \code{cpus-per-task}.} \item{preschedule_cores}{Corresponds to the \code{mc.preschedule} argument of \code{parallel::mcmapply}. Defaults to \code{TRUE}. If \code{TRUE}, the jobs are assigned to cores before computation. If \code{FALSE}, a new job is created for each row of \code{params}. 
Setting \code{FALSE} may be faster if different values of \code{params} result in very variable completion times for jobs.} \item{add_objects}{A character vector containing the name of R objects to be saved in a .RData file and loaded on each cluster node prior to calling \code{f}.} \item{pkgs}{A character vector containing the names of packages that must be loaded on each cluster node. By default, it includes all packages loaded by the user when \code{slurm_apply} is called.} \item{libPaths}{A character vector describing the location of additional R library trees to search through, or NULL. The default value of NULL corresponds to libraries returned by \code{.libPaths()} on a cluster node. Non-existent library trees are silently ignored.} \item{rscript_path}{The location of the Rscript command. If not specified, defaults to the location of Rscript within the R installation being run.} \item{r_template}{The path to the template file for the R script run on each node. If NULL, uses the default template "rslurm/templates/slurm_run_R.txt".} \item{sh_template}{The path to the template file for the sbatch submission script. If NULL, uses the default template "rslurm/templates/submit_sh.txt".} \item{slurm_options}{A named list of options recognized by \code{sbatch}; see Details below for more information.} \item{submit}{Whether or not to submit the job to the cluster with \code{sbatch}; see Details below for more information.} } \value{ A \code{slurm_job} object containing the \code{jobname} and the number of \code{nodes} effectively used. } \description{ Use \code{slurm_apply} to compute a function over multiple sets of parameters in parallel, spread across multiple nodes of a Slurm cluster. } \details{ This function creates a temporary folder ("_rslurm_[jobname]") in the current directory, holding .RData and .RDS data files, the R script to run and the Bash submission script generated for the Slurm job. 
The set of input parameters is divided into equal chunks sent to each node, and \code{f} is evaluated in parallel within each node using functions from the \code{parallel} R package. The names of any other R objects (besides \code{params}) that \code{f} needs to access should be included in \code{add_objects}. Use \code{slurm_options} to set any option recognized by \code{sbatch}, e.g. \code{slurm_options = list(time = "1:00:00", share = TRUE)}. See \url{http://slurm.schedmd.com/sbatch.html} for details on possible options. Note that full names must be used (e.g. "time" rather than "t") and that flags (such as "share") must be specified as TRUE. The "array", "job-name", "nodes", "cpus-per-task" and "output" options are already determined by \code{slurm_apply} and should not be manually set. When processing the computation job, the Slurm cluster will output two types of files in the temporary folder: those containing the return values of the function for each subset of parameters ("results_[node_id].RDS") and those containing any console or error output produced by R on each node ("slurm_[node_id].out"). If \code{submit = TRUE}, the job is sent to the cluster and a confirmation message (or error) is output to the console. If \code{submit = FALSE}, a message indicates the location of the saved data and script files; the job can be submitted manually by running the shell command \code{sbatch submit.sh} from that directory. After sending the job to the Slurm cluster, \code{slurm_apply} returns a \code{slurm_job} object which can be used to cancel the job, get the job status or output, and delete the temporary files associated with it. See the description of the related functions for more details. } \examples{ \dontrun{ sjob <- slurm_apply(func, pars) get_job_status(sjob) # Prints console/error output once job is completed. func_result <- get_slurm_out(sjob, "table") # Loads output data into R. 
cleanup_files(sjob) } } \seealso{ \code{\link{slurm_call}} to evaluate a single function call. \code{\link{cancel_slurm}}, \code{\link{cleanup_files}}, \code{\link{get_slurm_out}} and \code{\link{get_job_status}} which use the output of this function. } rslurm/man/rslurm-package.Rd0000644000176200001440000000746513563271424015600 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/rslurm-package.R \docType{package} \name{rslurm-package} \alias{rslurm-package} \title{Introduction to the \code{rslurm} Package} \description{ Send long-running or parallel jobs to a Slurm workload manager (i.e. cluster) using the \code{\link{slurm_call}} or \code{\link{slurm_apply}} functions. } \section{Job submission}{ This package includes two core functions used to send computations to a Slurm cluster: 1) \code{\link{slurm_call}} executes a function using a single set of parameters (passed as a list), and 2) \code{\link{slurm_apply}} evaluates a function in parallel for each row of parameters in a given data frame. The second, \code{slurm_apply}, automatically splits the parameter rows into equal-size chunks, each chunk to be processed by a separate cluster node. It uses functions from the \code{\link[parallel]{parallel-package}} package to parallelize computations across processors on a given node. The output of \code{slurm_apply} or \code{slurm_call} is a \code{slurm_job} object that serves as an input to the other functions in the package: \code{\link{print_job_status}}, \code{\link{cancel_slurm}}, \code{\link{get_slurm_out}} and \code{\link{cleanup_files}}. } \section{Function specification}{ To be compatible with \code{\link{slurm_apply}}, a function may accept any number of single value parameters. The names of these parameters must match the column names of the \code{params} data frame supplied. There are no restrictions on the types of parameters passed as a list to \code{\link{slurm_call}}. 
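For instance, a hypothetical two-argument function can be paired with a parameter data frame whose column names match its argument names: \preformatted{f <- function(a, b) a + b
params <- data.frame(a = 1:3, b = 4:6)
sjob <- slurm_apply(f, params)
}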
If the function passed to \code{slurm_call} or \code{slurm_apply} requires knowledge of any R objects (data, custom helper functions) besides \code{params}, a character vector corresponding to their names should be passed to the optional \code{add_objects} argument. When parallelizing a function, since any error will interrupt all calculations for the current node, it may be useful to wrap expressions which may generate errors into a \code{\link[base]{try}} or \code{\link[base:conditions]{tryCatch}} function. This will ensure the computation continues with the next parameter set after reporting the error. } \section{Output Format}{ The default output format for \code{get_slurm_out} (\code{outtype = "raw"}) is a list where each element is the return value of one function call. If the function passed to \code{slurm_apply} produces a vector output, you may use \code{outtype = "table"} to collect the output in a single data frame, with one row per function call. } \section{Slurm Configuration}{ Advanced options for the Slurm workload manager may accompany job submission by both \code{\link{slurm_call}} and \code{\link{slurm_apply}} through the optional \code{slurm_options} argument. For example, passing \code{list(time = '1:30')} for this option limits the job to 1 hour and 30 minutes. Some advanced configuration must be set through environment variables. On a multi-cluster head node, for example, the \code{SLURM_CLUSTERS} environment variable must be set to direct jobs to a non-default cluster. 
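For example (the cluster name below is hypothetical): \preformatted{Sys.setenv(SLURM_CLUSTERS = "cluster2")
sjob <- slurm_call(sqrt, list(x = 2), slurm_options = list(time = "1:30"))
}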
} \examples{ \dontrun{ # Create a data frame of mean/sd values for normal distributions pars <- data.frame(par_m = seq(-10, 10, length.out = 1000), par_sd = seq(0.1, 10, length.out = 1000)) # Create a function to parallelize ftest <- function(par_m, par_sd) { samp <- rnorm(10^7, par_m, par_sd) c(s_m = mean(samp), s_sd = sd(samp)) } sjob1 <- slurm_apply(ftest, pars) print_job_status(sjob1) res <- get_slurm_out(sjob1, "table") all.equal(pars, res) # Confirm correct output cleanup_files(sjob1) } } rslurm/man/rslurm-deprecated.Rd0000644000176200001440000000127013556566473016306 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/print_job_status.R, R/rslurm-deprecated.R \name{print_job_status} \alias{print_job_status} \alias{rslurm-deprecated} \title{Deprecated functions in package \pkg{rslurm}.} \usage{ print_job_status(slr_job) } \description{ The functions listed below are deprecated and will be defunct in the near future. When possible, alternative functions with similar functionality are also mentioned. Help pages for deprecated functions are available at \code{help("-deprecated")}. } \section{\code{print_job_status}}{ For \code{print_job_status}, use \code{\link{get_job_status}}. } \keyword{internal} rslurm/DESCRIPTION0000644000176200001440000000263413563553472013326 0ustar liggesusersPackage: rslurm Type: Package Title: Submit R Calculations to a 'Slurm' Cluster Description: Functions that simplify submitting R scripts to a 'Slurm' workload manager, in part by automating the division of embarrassingly parallel calculations across cluster nodes. Acknowledgements: Development of this R package was supported by the National Socio-Environmental Synthesis Center (SESYNC) under funding received from the National Science Foundation grants DBI-1052875 and DBI-1639145. 
Version: 0.5.0 License: GPL-3 URL: https://github.com/SESYNC-ci/rslurm BugReports: https://github.com/SESYNC-ci/rslurm/issues Authors@R: c(person('Philippe', 'Marchand', email = "marchand.philippe@gmail.com", role = 'aut'), person('Ian', 'Carroll', role = 'aut'), person('Mike', 'Smorul', role = 'ctb'), person('Rachael', 'Blake', role = 'ctb'), person('Quentin', 'Read', email = 'qread@sesync.org', role = c('ctb', 'cre')) ) Depends: R (>= 3.5.0) Imports: whisker (>= 0.3) RoxygenNote: 6.1.1 Suggests: parallel, testthat, knitr, rmarkdown VignetteBuilder: knitr NeedsCompilation: no Packaged: 2019-11-15 12:41:22 UTC; qread Author: Philippe Marchand [aut], Ian Carroll [aut], Mike Smorul [ctb], Rachael Blake [ctb], Quentin Read [ctb, cre] Maintainer: Quentin Read <qread@sesync.org> Repository: CRAN Date/Publication: 2019-11-15 16:50:02 UTC rslurm/build/0000755000176200001440000000000013563516362012707 5ustar liggesusersrslurm/build/vignette.rds0000644000176200001440000000033013563516362015242 0ustar liggesusers[binary RDS content omitted] rslurm/tests/0000755000176200001440000000000013556566473012764 5ustar liggesusersrslurm/tests/testthat/0000755000176200001440000000000013563553472014615 5ustar liggesusersrslurm/tests/testthat/test-slurm_apply.R0000644000176200001440000000574313556566473020274 0ustar liggesuserslibrary(rslurm) context("slurm_apply") SLURM = system('sinfo', ignore.stdout = TRUE, ignore.stderr = TRUE) SLURM_MSG = 'Only test on Slurm head node.' SLURM_OPTS = list(time = '1') Sys.setenv(R_TESTS = "") set.seed(123) # Create a data frame of mean/sd values for normal distributions pars <- data.frame(par_m = 1:10, par_sd = seq(0.1, 1, length.out = 10)) # Create a function to parallelize ftest <- function(par_m, par_sd = 1, ...) 
{ samp <- rnorm(10^6, par_m, par_sd) c(s_m = mean(samp), s_sd = sd(samp)) } # ## FIXME # saveRDS(Sys.getenv(), 'testthat_env.RDS') # slurm_apply(function (i) Sys.getenv(), data.frame(i = c(0)), pkgs = c(), jobname = 'test0', nodes = 1, cpus_per_node = 1) test_that("slurm_apply gives correct output", { if (SLURM) skip(SLURM_MSG) sjob <- slurm_apply(ftest, pars, jobname = "test1", nodes = 2, cpus_per_node = 1, slurm_options = SLURM_OPTS) res <- get_slurm_out(sjob, "table") res_raw <- get_slurm_out(sjob, "raw") cleanup_files(sjob) expect_equal(pars, res, tolerance = 0.01, check.attributes = FALSE) expect_equal(pars, as.data.frame(do.call(rbind, res_raw)), tolerance = 0.01, check.attributes = FALSE) }) test_that("slurm_apply works with single parameter", { if (SLURM) skip(SLURM_MSG) sjob <- slurm_apply(ftest, pars[, 1, drop = FALSE], jobname = "test2", nodes = 2, cpus_per_node = 1, slurm_options = SLURM_OPTS) res <- get_slurm_out(sjob, "table") cleanup_files(sjob) expect_equal(pars$par_m, res$s_m, tolerance = 0.01) }) test_that("slurm_apply works with single row", { if (SLURM) skip(SLURM_MSG) sjob <- slurm_apply(ftest, pars[1, ], nodes = 2, jobname = "test3", cpus_per_node = 1, slurm_options = SLURM_OPTS) res <- get_slurm_out(sjob, "table") cleanup_files(sjob) expect_equal(sjob$nodes, 1) expect_equal(pars[1, ], res, tolerance = 0.01, check.attributes = FALSE) }) test_that("slurm_apply works with single parameter and single row", { if (SLURM) skip(SLURM_MSG) sjob <- slurm_apply(ftest, pars[1, 1, drop = FALSE], jobname = "test4", nodes = 2, cpus_per_node = 1, slurm_options = SLURM_OPTS) res <- get_slurm_out(sjob, "table") cleanup_files(sjob) expect_equal(pars$par_m[1], res$s_m, tolerance = 0.01) }) test_that("slurm_apply correctly handles add_objects", { if (SLURM) skip(SLURM_MSG) sjob <- slurm_apply(function(i) ftest(pars[i, 1], pars[i, 2]), data.frame(i = 1:nrow(pars)), add_objects = c('ftest', 'pars'), jobname = "test5", nodes = 2, cpus_per_node = 1, slurm_options = 
SLURM_OPTS) res <- get_slurm_out(sjob, "table") cleanup_files(sjob) expect_equal(pars, res, tolerance = 0.01, check.attributes = FALSE) }) rslurm/tests/testthat/test-local_slurm_apply.R0000644000176200001440000000574213556566473021455 0ustar liggesuserslibrary(rslurm) context("local slurm_apply") Sys.setenv(R_TESTS = "") set.seed(123) # Create a data frame of mean/sd values for normal distributions pars <- data.frame(par_m = 1:10, par_sd = seq(0.1, 1, length.out = 10)) # Create a function to parallelize ftest <- function(par_m, par_sd = 1, ...) { samp <- rnorm(10^6, par_m, par_sd) c(s_m = mean(samp), s_sd = sd(samp)) } # Test slurm_apply locally msg <- capture.output( sjob1 <- slurm_apply(ftest, pars, jobname = "test1", nodes = 2, cpus_per_node = 1, submit = FALSE) ) sjob1 <- local_slurm_array(sjob1) res <- get_slurm_out(sjob1, "table", wait = FALSE) res_raw <- get_slurm_out(sjob1, "raw", wait = FALSE) test_that("slurm_apply gives correct output", { expect_equal(pars, res, tolerance = 0.01, check.attributes = FALSE) expect_equal(pars, as.data.frame(do.call(rbind, res_raw)), tolerance = 0.01, check.attributes = FALSE) }) # Test for degenerate cases (single parameter and/or single row) msg <- capture.output( sjob2 <- slurm_apply(ftest, pars[, 1, drop = FALSE], jobname = "test2", nodes = 2, cpus_per_node = 1, submit = FALSE) ) sjob2 <- local_slurm_array(sjob2) res <- get_slurm_out(sjob2, "table", wait = FALSE) test_that("slurm_apply works with single parameter", { expect_equal(pars$par_m, res$s_m, tolerance = 0.01) }) msg <- capture.output( sjob3 <- slurm_apply(ftest, pars[1, ], nodes = 2, jobname = "test3", cpus_per_node = 1, submit = FALSE) ) sjob3 <- local_slurm_array(sjob3) res <- get_slurm_out(sjob3, "table", wait = FALSE) test_that("slurm_apply works with single row", { expect_equal(sjob3$nodes, 1) expect_equal(pars[1, ], res, tolerance = 0.01, check.attributes = FALSE) }) msg <- capture.output( sjob4 <- slurm_apply(ftest, pars[1, 1, drop = FALSE], jobname = 
"test4", nodes = 2, cpus_per_node = 1, submit = FALSE) ) sjob4 <- local_slurm_array(sjob4) res <- get_slurm_out(sjob4, "table", wait = FALSE) test_that("slurm_apply works with single parameter and single row", { expect_equal(pars$par_m[1], res$s_m, tolerance = 0.01) }) # Test slurm_apply with add_objects msg <- capture.output( sjob5 <- slurm_apply(function(i) ftest(pars[i, 1], pars[i, 2]), data.frame(i = 1:nrow(pars)), add_objects = c('ftest', 'pars'), jobname = "test5", nodes = 2, cpus_per_node = 1, submit = FALSE) ) sjob5 <- local_slurm_array(sjob5) res <- get_slurm_out(sjob5, "table", wait = FALSE) test_that("slurm_apply correctly handles add_objects", { expect_equal(pars, res, tolerance = 0.01, check.attributes = FALSE) }) # Cleanup all temporary files at the end # Pause to make sure folders are free to be deleted Sys.sleep(1) lapply(list(sjob1, sjob2, sjob3, sjob4, sjob5), cleanup_files, wait = FALSE)rslurm/tests/testthat/test-local_slurm_call.R0000644000176200001440000000164213556566473021236 0ustar liggesuserslibrary(rslurm) context("local slurm_call") Sys.setenv(R_TESTS = "") # Test slurm_call locally msg <- capture.output({ z <- 0 sjob <- slurm_call(function(x, y) x * 2 + y + z, list(x = 5, y = 6), add_objects = c('z'), jobname = "test^\\* call", submit = FALSE) }) test_that("slurm_job name is correctly edited", { expect_equal(sjob$jobname, "test_call") }) sjob <- local_slurm_array(sjob) # olddir <- getwd() # rscript_path <- file.path(R.home("bin"), "Rscript") # setwd(paste0("_rslurm_", sjob$jobname)) # tryCatch(system(paste(rscript_path, "--vanilla slurm_run.R")), # finally = setwd(olddir)) res <- get_slurm_out(sjob, wait = FALSE) test_that("slurm_call returns correct output", { expect_equal(res, 16) }) # Pause to make sure temporary folder is free to be deleted Sys.sleep(1) cleanup_files(sjob, wait = FALSE) rslurm/tests/testthat/test-slurm_call.R0000644000176200001440000000224213556566473020061 0ustar liggesuserslibrary(rslurm) context("slurm_call") 
SLURM = system('sinfo', ignore.stdout = TRUE, ignore.stderr = TRUE) SLURM_MSG = 'Only test on Slurm head node.' SLURM_OPTS = list(time = '1') Sys.setenv(R_TESTS = "") test_that("slurm_job name is correctly edited and output is correct", { if (SLURM) skip(SLURM_MSG) z <- 0 sjob <- slurm_call(function(x, y) x * 2 + y + z, list(x = 5, y = 6), add_objects = c('z'), jobname = "test^\\* call", slurm_options = SLURM_OPTS) res <- get_slurm_out(sjob) cleanup_files(sjob) expect_equal(sjob$jobname, "test_call") expect_equal(res, 16) }) test_that("slurm_call will handle a bytecoded function", { # generated in response to issue #14 if (SLURM) skip(SLURM_MSG) params <- list( data = data.frame( x = seq(0, 1, by = 0.01), y = 0:100), formula = y ~ x) result_local <- do.call(lm, params) sjob <- slurm_call(lm, params, slurm_options = SLURM_OPTS) result_slurm <- get_slurm_out(sjob) cleanup_files(sjob) expect_equal(result_slurm$coefficients, result_local$coefficients) }) rslurm/tests/testthat.R0000644000176200001440000000007413556566473014750 0ustar liggesuserslibrary(testthat) library(rslurm) test_check("rslurm") rslurm/vignettes/0000755000176200001440000000000013563516362013620 5ustar liggesusersrslurm/vignettes/_rslurm_test_apply/0000755000176200001440000000000013563330662017544 5ustar liggesusersrslurm/vignettes/_rslurm_test_apply/params.RDS0000644000176200001440000000032613563516361021404 0ustar liggesusersb```f`f@&s+ 1;( s K3JS dYSb`qd @p OB `b,nl, _BߜYȚZ d d+H,-EkLI,i r$$DSY_ L @@1T{rslurm/vignettes/_rslurm_test_apply/slurm_1.out0000644000176200001440000000000013556566473021662 0ustar liggesusersrslurm/vignettes/_rslurm_test_apply/slurm_run.R0000644000176200001440000000143113563516361021716 0ustar liggesuserslibrary(base, quietly = TRUE) library(methods, quietly = TRUE) library(datasets, quietly = TRUE) library(utils, quietly = TRUE) library(grDevices, quietly = TRUE) library(graphics, quietly = TRUE) library(stats, quietly = TRUE) library(rslurm, quietly = TRUE) .rslurm_func <-
readRDS('f.RDS') .rslurm_params <- readRDS('params.RDS') .rslurm_id <- as.numeric(Sys.getenv('SLURM_ARRAY_TASK_ID')) .rslurm_istart <- .rslurm_id * 5 + 1 .rslurm_iend <- min((.rslurm_id + 1) * 5, nrow(.rslurm_params)) .rslurm_result <- do.call(parallel::mcmapply, c( FUN = .rslurm_func, .rslurm_params[.rslurm_istart:.rslurm_iend, , drop = FALSE], mc.cores = 2, mc.preschedule = TRUE, SIMPLIFY = FALSE)) saveRDS(.rslurm_result, file = paste0('results_', .rslurm_id, '.RDS')) rslurm/vignettes/_rslurm_test_apply/f.RDS0000644000176200001440000000253113563516361020346 0ustar liggesusersXmSF>Y(Sʞ11/LHcK`3ߟ}myC1]4Uǰ϶Qo NݭKm̟Nmu>5m3ùlO֟>myiۭ/揖Ẇ1piZ9E 5-=m+ө۵y.-idW?mjۭ醯H @=+éVeԭ0" @kRʆέVŠjtd!Z)lj~w~4UΜެ;\9x8w.|hpâpzːtKeAZ(cg82.p1JX&9(:~P ʢ|TZ ƪ!W[aN@~WŒX ~ "0' >E$B"$\<$IƃB}źy#c&?x3П$A$+ۣ)£;?7ibSXKDbMgCx?Kg }>V@NߏG 5#zX?b]EأŐD}#b9>y o߅أDŽ"Ay.~ ok $={b~ O YG!,A焟'"AK?|H o?"{~=_ "F31{2~d~Y[_Yb/8QCMl?sy>CxWnaXvm7,܅@ivWݽZR9?QܸظثOȩZF臨W>>,*BZнnt ՊR?@j:֊Ij`z0߽ S`#`Rnco \~H*ᄪ֢抋tk6ǂ<!i88*.C%<eIbgCfr "^qT e `mMIAqc]ͼ?F%+5==x zDSl-#8 -<`}+q>#lcI_C =hZbN$SM7vc?sځOQ E/}L|p!\AfD(T~Mx)3El6}>޾&9凭"U5BҤ= ?rslurm/vignettes/_rslurm_test_apply/slurm_0.out0000644000176200001440000000000013556566473021661 0ustar liggesusersrslurm/vignettes/_rslurm_test_apply/results_1.RDS0000644000176200001440000000026013556566473022052 0ustar liggesusersb```b`fad`b2(wFCxcmm&200X8AsS !AR8H(àyy,doٿNYf(YSͻVJ* s-cZ?0ƭxAmwrslurm/vignettes/_rslurm_test_apply/results_0.RDS0000644000176200001440000000026113556566473022052 0ustar liggesusersb```b`fad`b2(߽2<~zX , uy@ Kq|n)] t``>-ԝ } 2' v_Pܿޏ,3/۽)2Cw0]cM% Ewrslurm/vignettes/_rslurm_test_apply/submit.sh0000644000176200001440000000027413563516361021410 0ustar liggesusers#!/bin/bash # #SBATCH --array=0-1 #SBATCH --cpus-per-task=2 #SBATCH --job-name=test_apply #SBATCH --output=slurm_%a.out C:/PROGRA~1/R/R-36~1.0/bin/x64/Rscript --vanilla slurm_run.R rslurm/vignettes/rslurm.Rmd0000644000176200001440000002126513556566473015630 0ustar liggesusers--- title: Parallelize R code on a Slurm 
cluster output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Parallelize R code on a Slurm cluster} %\VignetteEngine{knitr::rmarkdown_notangle} %\VignetteEncoding{UTF-8} --- Many computing-intensive processes in R involve the repeated evaluation of a function over many items or parameter sets. These so-called [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) calculations can be run serially with the `lapply` or `Map` function, or in parallel on a single machine with `mclapply` or `mcMap` (from the `parallel` package). The rslurm package simplifies the process of distributing this type of calculation across a computing cluster that uses the [Slurm](http://slurm.schedmd.com/) workload manager. Its main function, `slurm_apply`, automatically divides the computation over multiple nodes and writes the necessary submission scripts. It also includes functions to retrieve and combine the output from different nodes, as well as wrappers for common Slurm commands. ### Table of contents - [Basic example](#basic-example) - [Single function evaluation](#single-function-evaluation) - [Adding auxiliary data and functions](#adding-auxiliary-data-and-functions) - [Configuring Slurm options](#configuring-slurm-options) - [Generating scripts for later submission](#generating-scripts-for-later-submission) - [How it works / advanced customization](#how-it-works-advanced-customization) ## Basic example To illustrate a typical rslurm workflow, we use a simple function that takes a mean and standard deviation as parameters, generates a million normal deviates and returns the sample mean and standard deviation. ```{r} test_func <- function(par_mu, par_sd) { samp <- rnorm(10^6, par_mu, par_sd) c(s_mu = mean(samp), s_sd = sd(samp)) } ``` We then create a parameter data frame where each row is a parameter set and each column matches an argument of the function. 
```{r} pars <- data.frame(par_mu = 1:10, par_sd = seq(0.1, 1, length.out = 10)) head(pars, 3) ``` We can now pass that function and the parameters data frame to `slurm_apply`, specifying the number of cluster nodes to use and the number of CPUs per node. The latter (`cpus_per_node`) determines how many processes will be forked on each node, as the `mc.cores` argument of `parallel::mcmapply`. ```{r} library(rslurm) sjob <- slurm_apply(test_func, pars, jobname = 'test_apply', nodes = 2, cpus_per_node = 2, submit = FALSE) ``` The output of `slurm_apply` is a `slurm_job` object that stores a few pieces of information (job name, job ID, and the number of nodes) needed to retrieve the job's output. The default argument `submit = TRUE` would submit a generated script to the Slurm cluster and print a message confirming the job has been submitted to Slurm, assuming you are running R on a Slurm head node. When working from an R session without direct access to the cluster, you must set `submit = FALSE`. Either way, the function creates a folder called `\_rslurm\_[jobname]` in the working directory that contains scripts and data files. This folder may be moved to a Slurm head node, the shell command `sbatch submit.sh` run from within the folder, and the folder moved back to your working directory. The contents of the `\_rslurm\_[jobname]` folder after completion of the `test_apply` job, i.e. following either manual or automatic (i.e. with `submit = TRUE`) submission to the cluster, include one `results_*.RDS` file for each node: ```{r} list.files('_rslurm_test_apply', 'results') ``` The results from all the nodes can be read back into R with the `get_slurm_out()` function. ```{r} res <- get_slurm_out(sjob, outtype = 'table') head(res, 3) ``` The utility function `print_job_status` displays the status of a submitted job (i.e. in queue, running or completed), and `cancel_slurm` will remove a job from the queue, aborting its execution if necessary.
These functions are R wrappers for the Slurm command line functions `squeue` and `scancel`, respectively. When `outtype = 'table'`, the outputs from each function evaluation are row-bound into a single data frame; this is an appropriate format when the function returns a simple vector. The default `outtype = 'raw'` combines the outputs into a list and can thus handle arbitrarily complex return objects. ```{r} res_raw <- get_slurm_out(sjob, outtype = 'raw') res_raw[1:3] ``` The utility function `cleanup_files` deletes the temporary folder for the specified Slurm job. ```{r eval = FALSE} cleanup_files(sjob) ``` ## Single function evaluation In addition to `slurm_apply`, rslurm also defines a `slurm_call` function, which sends a single function call to the cluster. It is analogous in syntax to the base R function `do.call`, accepting a function and a named list of parameters as arguments. ```{r} sjob <- slurm_call(test_func, jobname = 'test_call', list(par_mu = 5, par_sd = 1), submit = FALSE) ``` Because `slurm_call` involves a single process on a single node, it does not recognize the `nodes` and `cpus_per_node` arguments; otherwise, it accepts the same additional arguments (detailed in the sections below) as `slurm_apply`. ```{r eval = FALSE} cleanup_files(sjob) ``` ## Adding auxiliary data and functions The function passed to `slurm_apply` can only receive atomic parameters stored within a data frame. Suppose we want instead to apply a function `func` to a list of complex R objects, `obj_list`. To use `slurm_apply` in this case, we can wrap `func` in an inline function that takes an index as its sole parameter. ```{r eval = FALSE} sjob <- slurm_apply(function(i) func(obj_list[[i]]), data.frame(i = seq_along(obj_list)), add_objects = c("func", "obj_list"), nodes = 2, cpus_per_node = 2) ``` The `add_objects` argument specifies the names of any R objects (besides the parameters data frame) that must be accessed by the function passed to `slurm_apply`. 
These objects are saved to a `.RData` file that is loaded on each cluster node prior to evaluating the function in parallel. By default, all R packages attached to the current R session will also be attached (with `library`) on each cluster node, though this can be modified with the optional `pkgs` argument. ## Configuring Slurm options Particular clusters may require the specification of additional Slurm options, such as time and memory limits for the job. The `slurm_options` argument allows you to set any of the command line options ([view list](http://slurm.schedmd.com/sbatch.html)) recognized by the Slurm `sbatch` command. It should be formatted as a named list, using the long names of each option (e.g. "time" rather than "t"). Flags, i.e. command line options that are toggled rather than set to a particular value, should be set to `TRUE` in `slurm_options`. For example, the following code sets the command line options `--time=1:00:00 --share`. ```{r eval = FALSE} sopt <- list(time = '1:00:00', share = TRUE) sjob <- slurm_apply(test_func, pars, slurm_options = sopt) ``` ## How it works / advanced customization As mentioned above, the `slurm_apply` function creates a job-specific folder. This folder contains the parameters as an RDS file and (if applicable) the objects specified as `add_objects` saved together in an RData file. The function also generates an R script (`slurm_run.R`) to be run on each cluster node, as well as a Bash script (`submit.sh`) to submit the job to Slurm. More specifically, the Bash script tells Slurm to create a job array and the R script takes advantage of the unique `SLURM\_ARRAY\_TASK\_ID` environment variable that Slurm will set on each cluster node. This variable is read by `slurm_run.R`, which allows each instance of the script to operate on a different parameter subset and write its output to a different results file. The R script calls `parallel::mcmapply` to parallelize calculations on each node.
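For reference, the `submit.sh` generated for the `test_apply` job above looks roughly like the following. The `#SBATCH` values follow from the `slurm_apply` arguments (a two-task job array with 2 CPUs per task), and the last line invokes `Rscript` on the node; the exact `Rscript` path is filled in from the local R installation, so it will differ on your system.

```sh
#!/bin/bash
#
#SBATCH --array=0-1
#SBATCH --cpus-per-task=2
#SBATCH --job-name=test_apply
#SBATCH --output=slurm_%a.out
Rscript --vanilla slurm_run.R
```

Any options passed through `slurm_options` would appear here as additional `#SBATCH` lines.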
Both `slurm_run.R` and `submit.sh` are generated from templates, using the [`whisker`](https://cran.r-project.org/package=whisker) package; these templates can be found in the `rslurm/templates` subfolder in your R package library. There are two templates for each script, one for `slurm_apply` and the other (with the word "single"" in its title) for `slurm_call`. While you should avoid changing any existing lines in the template scripts, you may want to add `#SBATCH` lines to the `submit.sh` templates in order to permanently set certain Slurm command line options and thus customize the package to your particular cluster setup.rslurm/vignettes/_rslurm_test_call/0000755000176200001440000000000013563330662017332 5ustar liggesusersrslurm/vignettes/_rslurm_test_call/params.RDS0000644000176200001440000000014313563516361021167 0ustar liggesusersb```f`f@&s0|@ `H%`AĢRd^q "R }rslurm/vignettes/_rslurm_test_call/slurm_run.R0000644000176200001440000000072213563516361021506 0ustar liggesuserslibrary(base, quietly = TRUE) library(methods, quietly = TRUE) library(datasets, quietly = TRUE) library(utils, quietly = TRUE) library(grDevices, quietly = TRUE) library(graphics, quietly = TRUE) library(stats, quietly = TRUE) library(rslurm, quietly = TRUE) .rslurm_func <- readRDS('f.RDS') .rslurm_params <- readRDS('params.RDS') .rslurm_result <- do.call(.rslurm_func, .rslurm_params) saveRDS(.rslurm_result, file = 'results_0.RDS') rslurm/vignettes/_rslurm_test_call/f.RDS0000644000176200001440000000253113563516361020134 0ustar liggesusersXmSF>Y(Sʞ11/LHcK`3ߟ}myC1]4Uǰ϶Qo NݭKm̟Nmu>5m3ùlO֟>myiۭ/揖Ẇ1piZ9E 5-=m+ө۵y.-idW?mjۭ醯H @=+éVeԭ0" @kRʆέVŠjtd!Z)lj~w~4UΜެ;\9x8w.|hpâpzːtKeAZ(cg82.p1JX&9(:~P ʢ|TZ ƪ!W[aN@~WŒX ~ "0' >E$B"$\<$IƃB}źy#c&?x3П$A$+ۣ)£;?7ibSXKDbMgCx?Kg }>V@NߏG 5#zX?b]EأŐD}#b9>y o߅أDŽ"Ay.~ ok $={b~ O YG!,A焟'"AK?|H o?"{~=_ "F31{2~d~Y[_Yb/8QCMl?sy>CxWnaXvm7,܅@ivWݽZR9?QܸظثOȩZF臨W>>,*BZнnt ՊR?@j:֊Ij`z0߽ S`#`Rnco \~H*ᄪ֢抋tk6ǂ<!i88*.C%<eIbgCfr "^qT e `mMIAqc]ͼ?F%+5==x zDSl-#8 -<`}+q>#lcI_C 
=hZbN$SM7vc?sځOQ E/}L|p!\AfD(T~Mx)3El6}>޾&9凭"U5BҤ= ?rslurm/vignettes/_rslurm_test_call/submit.sh0000644000176200001440000000023613563516361021174 0ustar liggesusers#!/bin/bash # #SBATCH --ntasks=1 #SBATCH --job-name=test_call #SBATCH --output=slurm_0.out C:/PROGRA~1/R/R-36~1.0/bin/x64/Rscript --vanilla slurm_run.R rslurm/R/0000755000176200001440000000000013562577243012015 5ustar liggesusersrslurm/R/rslurm-package.R0000644000176200001440000000765713563271400015057 0ustar liggesusers#' Introduction to the \code{rslurm} Package #' #' Send long-running or parallel jobs to a Slurm workload manager (i.e. cluster) #' using the \code{\link{slurm_call}} or \code{\link{slurm_apply}} functions. #' #' @section Job submission: #' #' This package includes two core functions used to send computations to a #' Slurm cluster: 1) \code{\link{slurm_call}} executes a function using a #' single set of parameters (passed as a list), and 2) \code{\link{slurm_apply}} #' evaluates a function in parallel for each row of parameters in a given #' data frame. The second, \code{slurm_apply}, automatically splits the parameter #' rows into equal-size chunks, each chunk to be processed by a separate cluster #' node. It uses functions from the \code{\link[parallel]{parallel-package}} #' package to parallelize computations across processors on a given node. #' #' The output of \code{slurm_apply} or \code{slurm_call} is a \code{slurm_job} #' object that serves as an input to the other functions in the package: #' \code{\link{print_job_status}}, \code{\link{cancel_slurm}}, #' \code{\link{get_slurm_out}} and \code{\link{cleanup_files}}. #' #' @section Function specification: #' #' To be compatible with \code{\link{slurm_apply}}, a function may accept any #' number of single value parameters. The names of these parameters must match #' the column names of the \code{params} data frame supplied. There are no #' restrictions on the types of parameters passed as a list to #' \code{\link{slurm_call}}. 
#' #' If the function passed to \code{slurm_call} or \code{slurm_apply} requires #' knowledge of any R objects (data, custom helper functions) besides #' \code{params}, a character vector corresponding to their names should be #' passed to the optional \code{add_objects} argument. #' #' When parallelizing a function, since any error will interrupt all #' calculations for the current node, it may be useful to wrap expressions #' which may generate errors into a \code{\link[base]{try}} or #' \code{\link[base:conditions]{tryCatch}} function. This will ensure the computation #' continues with the next parameter set after reporting the error. #' #' @section Output Format: #' #' The default output format for \code{get_slurm_out} (\code{outtype = "raw"}) #' is a list where each element is the return value of one function call. If #' the function passed to \code{slurm_apply} produces a vector output, you may #' use \code{outtype = "table"} to collect the output in a single data frame, #' with one row per function call. #' #' @section Slurm Configuration: #' #' Advanced options for the Slurm workload manager may accompany job submission #' by both \code{\link{slurm_call}} and \code{\link{slurm_apply}} through the #' optional \code{slurm_options} argument. For example, passing #' \code{list(time = '1:30:00')} for this option limits the job to 1 hour and 30 #' minutes. Some advanced configuration must be set through environment #' variables. On a multi-cluster head node, for example, the \code{SLURM_CLUSTERS} #' environment variable must be set to direct jobs to a non-default cluster.
#' #' @examples #' #' \dontrun{ #' # Create a data frame of mean/sd values for normal distributions #' pars <- data.frame(par_m = seq(-10, 10, length.out = 1000), #' par_sd = seq(0.1, 10, length.out = 1000)) #' #' # Create a function to parallelize #' ftest <- function(par_m, par_sd) { #' samp <- rnorm(10^7, par_m, par_sd) #' c(s_m = mean(samp), s_sd = sd(samp)) #' } #' #' sjob1 <- slurm_apply(ftest, pars) #' print_job_status(sjob1) #' res <- get_slurm_out(sjob1, "table") #' all.equal(pars, res) # Confirm correct output #' cleanup_files(sjob1) #' } #' #' @importFrom utils capture.output #' @docType package #' @name rslurm-package #' NULL rslurm/R/slurm_utils.R0000644000176200001440000000476213556566472014540 0ustar liggesusers# Utility functions for rslurm package (not exported) # Make jobname by cleaning user-provided name or (if NA) generate one # from base::tempfile make_jobname <- function(name) { if (is.na(name)) { tmpfile <- tempfile("_rslurm_", tmpdir=".") strsplit(tmpfile, '_rslurm_', TRUE)[[1]][[2]] } else { jobname <- gsub("[[:space:]]+", "_", name) gsub("[^0-9A-Za-z_]", "", jobname) } } # Format sbatch options into nested list for templates format_option_list <- function(slurm_options) { if (length(slurm_options) == 0) { slurm_flags <- slurm_options } else { is_flag <- sapply(slurm_options, isTRUE) slurm_flags <- lapply(names(slurm_options[is_flag]), function(x) { list(name = x) }) slurm_options <- slurm_options[!is_flag] slurm_options <- lapply(seq_along(slurm_options), function(i) { list(name = names(slurm_options)[i], value = slurm_options[[i]]) }) } list(flags = slurm_flags, options = slurm_options) } # Run an array job (output of slurm_apply) locally; used in package tests local_slurm_array <- function(slr_job) { olddir <- getwd() rscript_path <- file.path(R.home("bin"), "Rscript") setwd(paste0("_rslurm_", slr_job$jobname)) tryCatch({ #FIXME simplify with system('SLURM_ARRAY_TASK_ID=1 Rscript path/to/slurm_run.R') # and loop in this code 
writeLines(c(paste0("for (i in 1:", slr_job$nodes, " - 1) {"), "Sys.setenv(SLURM_ARRAY_TASK_ID = i)", "source('slurm_run.R')", "}"), "local_run.R") system(paste(rscript_path, "--vanilla local_run.R")) }, finally = setwd(olddir)) return(slr_job) } # Submit job submit_slurm_job <- function(tmpdir) { old_wd <- setwd(tmpdir) tryCatch({ system("sbatch submit.sh") }, finally = setwd(old_wd)) } # Submit dummy job with a dependency via srun to block R process wait_for_job <- function(slr_job) { queued <- system( paste('test -z "$(squeue -hn', slr_job$jobname, '2>/dev/null)"'), ignore.stderr = TRUE) if (queued) { srun <- sprintf(paste('srun', '--nodes=1', '--time=0:1', '--output=/dev/null', '--quiet', '--dependency=singleton', '--job-name=%s', 'echo 0'), slr_job$jobname) system(srun) } return() } rslurm/R/slurm_job.R0000644000176200001440000000170313562577243014135 0ustar liggesusers#' Create a slurm_job object #' #' This function creates a \code{slurm_job} object which can be passed to other #' functions such as \code{\link{cancel_slurm}}, \code{\link{cleanup_files}}, #' \code{\link{get_slurm_out}} and \code{\link{get_job_status}}. #' #' In general, \code{slurm_job} objects are created automatically as the output of #' \code{\link{slurm_apply}} or \code{\link{slurm_call}}, but it may be necessary #' to manually recreate one if the job was submitted in a different R session. #' #' @param jobname The name of the Slurm job. The rslurm-generated scripts and #' output files associated with a job should be found in the #' \emph{_rslurm_[jobname]} folder. #' @param nodes The number of cluster nodes used by that job. #' @return A \code{slurm_job} object. 
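#' @examples
#' \dontrun{
#' # A minimal sketch: recreate the slurm_job object for a job submitted in a
#' # previous R session (here assuming it was named 'test_apply' and ran on 2
#' # nodes), then retrieve its output as usual
#' sjob <- slurm_job('test_apply', nodes = 2)
#' res <- get_slurm_out(sjob, 'table')
#' }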
#' @export slurm_job <- function(jobname, nodes) { slr_job <- list(jobname = jobname, nodes = nodes) class(slr_job) <- "slurm_job" slr_job } rslurm/R/slurm_call.R0000644000176200001440000001530713562577243014301 0ustar liggesusers#' Execution of a single function call on the Slurm cluster #' #' Use \code{slurm_call} to perform a single function evaluation on the Slurm #' cluster. #' #' This function creates a temporary folder ("_rslurm_[jobname]") in the current #' directory, holding .RData and .RDS data files, the R script to run and the #' Bash submission script generated for the Slurm job. #' #' The names of any other R objects (besides \code{params}) that \code{f} needs #' to access should be listed in the \code{add_objects} argument. #' #' Use \code{slurm_options} to set any option recognized by \code{sbatch}, e.g. #' \code{slurm_options = list(time = "1:00:00", share = TRUE)}. See #' \url{http://slurm.schedmd.com/sbatch.html} for details on possible options. #' Note that full names must be used (e.g. "time" rather than "t") and that #' flags (such as "share") must be specified as TRUE. The "job-name", "ntasks" #' and "output" options are already determined by \code{slurm_call} and should #' not be manually set. #' #' When processing the computation job, the Slurm cluster will output two files #' in the temporary folder: one with the return value of the function #' ("results_0.RDS") and one containing any console or error output produced by #' R ("slurm_[node_id].out"). #' #' If \code{submit = TRUE}, the job is sent to the cluster and a confirmation #' message (or error) is output to the console. If \code{submit = FALSE}, a #' message indicates the location of the saved data and script files; the job #' can be submitted manually by running the shell command \code{sbatch #' submit.sh} from that directory.
#' #' After sending the job to the Slurm cluster, \code{slurm_call} returns a #' \code{slurm_job} object which can be used to cancel the job, get the job #' status or output, and delete the temporary files associated with it. See the #' description of the related functions for more details. #' #' @param f Any R function. #' @param params A named list of parameters to pass to \code{f}. #' @param jobname The name of the Slurm job; if \code{NA}, it is assigned a #' random name of the form "slr####". #' @param add_objects A character vector containing the name of R objects to be #' saved in a .RData file and loaded on each cluster node prior to calling #' \code{f}. #' @param pkgs A character vector containing the names of packages that must be #' loaded on each cluster node. By default, it includes all packages loaded by #' the user when \code{slurm_call} is called. #' @param libPaths A character vector describing the location of additional R #' library trees to search through, or NULL. The default value of NULL #' corresponds to libraries returned by \code{.libPaths()} on a cluster node. #' Non-existent library trees are silently ignored. #' @param rscript_path The location of the Rscript command. If not specified, #' defaults to the location of Rscript within the R installation being run. #' @param r_template The path to the template file for the R script run on each node. #' If NULL, uses the default template "rslurm/templates/slurm_run_single_R.txt". #' @param sh_template The path to the template file for the sbatch submission script. #' If NULL, uses the default template "rslurm/templates/submit_single_sh.txt". #' @param slurm_options A named list of options recognized by \code{sbatch}; see #' Details below for more information. #' @param submit Whether or not to submit the job to the cluster with #' \code{sbatch}; see Details below for more information. 
#' @return A \code{slurm_job} object containing the \code{jobname} and the number #' of \code{nodes} effectively used. #' @seealso \code{\link{slurm_apply}} to parallelize a function over a parameter #' set. #' @seealso \code{\link{cancel_slurm}}, \code{\link{cleanup_files}}, #' \code{\link{get_slurm_out}} and \code{\link{get_job_status}} which use #' the output of this function. #' @export slurm_call <- function(f, params, jobname = NA, add_objects = NULL, pkgs = rev(.packages()), libPaths = NULL, rscript_path = NULL, r_template = NULL, sh_template = NULL, slurm_options = list(), submit = TRUE) { # Check inputs if (!is.function(f)) { stop("first argument to slurm_call should be a function") } if (!is.list(params)) { stop("second argument to slurm_call should be a list") } if (is.null(names(params)) || any(!names(params) %in% names(formals(f)))) { stop("names of params must match arguments of f") } # Default templates if(is.null(r_template)) { r_template <- system.file("templates/slurm_run_single_R.txt", package = "rslurm") } if(is.null(sh_template)) { sh_template <- system.file("templates/submit_single_sh.txt", package = "rslurm") } jobname <- make_jobname(jobname) # Create temp folder tmpdir <- paste0("_rslurm_", jobname) dir.create(tmpdir, showWarnings = FALSE) saveRDS(params, file = file.path(tmpdir, "params.RDS")) saveRDS(f, file = file.path(tmpdir, "f.RDS")) if (!is.null(add_objects)) { save(list = add_objects, file = file.path(tmpdir, "add_objects.RData"), envir = environment(f)) } # Create a R script to run function on cluster template_r <- readLines(r_template) script_r <- whisker::whisker.render(template_r, list(pkgs = pkgs, add_obj = !is.null(add_objects), libPaths = libPaths)) writeLines(script_r, file.path(tmpdir, "slurm_run.R")) # Create submission bash script template_sh <- readLines(sh_template) slurm_options <- format_option_list(slurm_options) if (is.null(rscript_path)){ rscript_path <- file.path(R.home("bin"), "Rscript") } script_sh <- 
whisker::whisker.render(template_sh, list(jobname = jobname, flags = slurm_options$flags, options = slurm_options$options, rscript = rscript_path)) writeLines(script_sh, file.path(tmpdir, "submit.sh")) # Submit job to Slurm if applicable if (submit && system('squeue', ignore.stdout = TRUE)) { submit <- FALSE cat("Cannot submit; no Slurm workload manager found\n") } if (submit) { submit_slurm_job(tmpdir) } else { cat(paste("Submission scripts output in directory", tmpdir)) } # Return 'slurm_job' object slurm_job(jobname, 1) } rslurm/R/slurm_apply.R0000644000176200001440000002235413562577243014515 0ustar liggesusers#' Parallel execution of a function on the Slurm cluster #' #' Use \code{slurm_apply} to compute a function over multiple sets of #' parameters in parallel, spread across multiple nodes of a Slurm cluster. #' #' This function creates a temporary folder ("_rslurm_[jobname]") in the current #' directory, holding .RData and .RDS data files, the R script to run and the Bash #' submission script generated for the Slurm job. #' #' The set of input parameters is divided into equal chunks sent to each node, and #' \code{f} is evaluated in parallel within each node using functions from the #' \code{parallel} R package. The names of any other R objects (besides #' \code{params}) that \code{f} needs to access should be included in #' \code{add_objects}. #' #' Use \code{slurm_options} to set any option recognized by \code{sbatch}, e.g. #' \code{slurm_options = list(time = "1:00:00", share = TRUE)}. #' See \url{http://slurm.schedmd.com/sbatch.html} for details on possible options. #' Note that full names must be used (e.g. "time" rather than "t") and that flags #' (such as "share") must be specified as TRUE. The "array", "job-name", "nodes", #' "cpus-per-task" and "output" options are already determined by #' \code{slurm_apply} and should not be manually set.
#' #' When processing the computation job, the Slurm cluster will output two types #' of files in the temporary folder: those containing the return values of the #' function for each subset of parameters ("results_[node_id].RDS") and those #' containing any console or error output produced by R on each node #' ("slurm_[node_id].out"). #' #' If \code{submit = TRUE}, the job is sent to the cluster and a confirmation #' message (or error) is output to the console. If \code{submit = FALSE}, #' a message indicates the location of the saved data and script files; the #' job can be submitted manually by running the shell command #' \code{sbatch submit.sh} from that directory. #' #' After sending the job to the Slurm cluster, \code{slurm_apply} returns a #' \code{slurm_job} object which can be used to cancel the job, get the job #' status or output, and delete the temporary files associated with it. See #' the description of the related functions for more details. #' #' @param f A function that accepts one or many single values as parameters and #' may return any type of R object. #' @param params A data frame of parameter values to apply \code{f} to. Each #' column corresponds to a parameter of \code{f} (\emph{Note}: names must #' match) and each row corresponds to a separate function call. #' @param jobname The name of the Slurm job; if \code{NA}, it is assigned a #' random name of the form "slr####". #' @param nodes The (maximum) number of cluster nodes to spread the calculation #' over. \code{slurm_apply} automatically divides \code{params} into chunks of #' approximately equal size to send to each node. Fewer nodes are allocated if #' the parameter set is too small to use all CPUs on the requested nodes. #' @param cpus_per_node The number of CPUs requested per node, i.e., how many #' processes to run in parallel per node. This argument is mapped to the #' Slurm parameter \code{cpus-per-task}.
#' @param preschedule_cores Corresponds to the \code{mc.preschedule} argument of #' \code{parallel::mcmapply}. Defaults to \code{TRUE}. If \code{TRUE}, the #' jobs are assigned to cores before computation. If \code{FALSE}, a new job is #' created for each row of \code{params}. Setting \code{FALSE} may be faster if #' different values of \code{params} result in very variable completion time for #' jobs. #' @param add_objects A character vector containing the name of R objects to be #' saved in a .RData file and loaded on each cluster node prior to calling #' \code{f}. #' @param pkgs A character vector containing the names of packages that must #' be loaded on each cluster node. By default, it includes all packages #' loaded by the user when \code{slurm_apply} is called. #' @param libPaths A character vector describing the location of additional R #' library trees to search through, or NULL. The default value of NULL #' corresponds to libraries returned by \code{.libPaths()} on a cluster node. #' Non-existent library trees are silently ignored. #' @param rscript_path The location of the Rscript command. If not specified, #' defaults to the location of Rscript within the R installation being run. #' @param r_template The path to the template file for the R script run on each node. #' If NULL, uses the default template "rslurm/templates/slurm_run_R.txt". #' @param sh_template The path to the template file for the sbatch submission script. #' If NULL, uses the default template "rslurm/templates/submit_sh.txt". #' @param slurm_options A named list of options recognized by \code{sbatch}; see #' Details below for more information. #' @param submit Whether or not to submit the job to the cluster with #' \code{sbatch}; see Details below for more information. #' @return A \code{slurm_job} object containing the \code{jobname} and the #' number of \code{nodes} effectively used. #' @seealso \code{\link{slurm_call}} to evaluate a single function call. 
#' @seealso \code{\link{cancel_slurm}}, \code{\link{cleanup_files}},
#'   \code{\link{get_slurm_out}} and \code{\link{get_job_status}}
#'   which use the output of this function.
#' @examples
#' \dontrun{
#' sjob <- slurm_apply(func, pars)
#' get_job_status(sjob) # Prints console/error output once job is completed.
#' func_result <- get_slurm_out(sjob, "table") # Loads output data into R.
#' cleanup_files(sjob)
#' }
#' @export
slurm_apply <- function(f, params, jobname = NA, nodes = 2, cpus_per_node = 2,
                        preschedule_cores = TRUE, add_objects = NULL,
                        pkgs = rev(.packages()), libPaths = NULL,
                        rscript_path = NULL, r_template = NULL,
                        sh_template = NULL, slurm_options = list(),
                        submit = TRUE) {
    # Check inputs
    if (!is.function(f)) {
        stop("first argument to slurm_apply should be a function")
    }
    if (!is.data.frame(params)) {
        stop("second argument to slurm_apply should be a data.frame")
    }
    if (is.null(names(params)) || any(!names(params) %in% names(formals(f)))) {
        stop("column names of params must match arguments of f")
    }
    if (!is.numeric(nodes) || length(nodes) != 1) {
        stop("nodes should be a single number")
    }
    if (!is.numeric(cpus_per_node) || length(cpus_per_node) != 1) {
        stop("cpus_per_node should be a single number")
    }
    # Default templates
    if (is.null(r_template)) {
        r_template <- system.file("templates/slurm_run_R.txt", package = "rslurm")
    }
    if (is.null(sh_template)) {
        sh_template <- system.file("templates/submit_sh.txt", package = "rslurm")
    }
    jobname <- make_jobname(jobname)
    # Create temp folder
    tmpdir <- paste0("_rslurm_", jobname)
    dir.create(tmpdir, showWarnings = FALSE)
    saveRDS(params, file = file.path(tmpdir, "params.RDS"))
    saveRDS(f, file = file.path(tmpdir, "f.RDS"))
    if (!is.null(add_objects)) {
        save(list = add_objects,
             file = file.path(tmpdir, "add_objects.RData"),
             envir = environment(f))
    }
    # Get chunk size (nb. of parameter sets per node)
    # Special case if fewer parameter sets than CPUs in the cluster
    if (nrow(params) < cpus_per_node * nodes) {
        nchunk <- cpus_per_node
    } else {
        nchunk <- ceiling(nrow(params) / nodes)
    }
    # Re-adjust number of nodes (only matters for small sets)
    nodes <- ceiling(nrow(params) / nchunk)
    # Create an R script to run the function in parallel on each node
    template_r <- readLines(r_template)
    script_r <- whisker::whisker.render(template_r,
                                        list(pkgs = pkgs,
                                             add_obj = !is.null(add_objects),
                                             nchunk = nchunk,
                                             cpus_per_node = cpus_per_node,
                                             preschedule_cores = preschedule_cores,
                                             libPaths = libPaths))
    writeLines(script_r, file.path(tmpdir, "slurm_run.R"))
    # Create submission bash script
    template_sh <- readLines(sh_template)
    slurm_options <- format_option_list(slurm_options)
    if (is.null(rscript_path)) {
        rscript_path <- file.path(R.home("bin"), "Rscript")
    }
    script_sh <- whisker::whisker.render(template_sh,
                                         list(max_node = nodes - 1,
                                              cpus_per_node = cpus_per_node,
                                              jobname = jobname,
                                              flags = slurm_options$flags,
                                              options = slurm_options$options,
                                              rscript = rscript_path))
    writeLines(script_sh, file.path(tmpdir, "submit.sh"))
    # Submit job to Slurm if applicable
    if (submit && system('squeue', ignore.stdout = TRUE)) {
        submit <- FALSE
        cat("Cannot submit; no Slurm workload manager found\n")
    }
    if (submit) {
        submit_slurm_job(tmpdir)
    } else {
        cat(paste("Submission scripts output in directory", tmpdir))
    }
    # Return 'slurm_job' object
    slurm_job(jobname, nodes)
}
rslurm/R/print_job_status.R0000644000176200001440000000354413556566472015534 0ustar liggesusers#' Prints the status of a Slurm job and, if completed, its console/error output
#'
#' Prints the status of a Slurm job and, if completed, its console/error output.
#'
#' If the specified Slurm job is still in the queue or running, this function
#' prints its current status (as output by the Slurm \code{squeue} command).
#' The output displays one row per node currently running part of the job ("R" in
#' the "ST" column) and how long it has been running ("TIME").
One row indicates
#' the portions of the job still in queue ("PD" in the "ST" column), if any.
#'
#' If all portions of the job have completed or stopped, the function prints the
#' console and error output, if any, generated by each node.
#'
#' @param slr_job A \code{slurm_job} object.
#'
#' @name print_job_status-deprecated
#' @usage print_job_status(slr_job)
#' @seealso \code{\link{rslurm-deprecated}}
#' @keywords internal
NULL

#' @rdname rslurm-deprecated
#' @section \code{print_job_status}:
#' For \code{print_job_status}, use \code{\link{get_job_status}}.
#'
#' @export
print_job_status <- function(slr_job) {
    .Deprecated("get_job_status")
    if (!(class(slr_job) == "slurm_job")) stop("input must be a slurm_job")
    stat <- suppressWarnings(
        system(paste("squeue -n", slr_job$jobname), intern = TRUE))
    if (length(stat) > 1) {
        cat(paste(c("Job running or in queue. Status:", stat), collapse = "\n"))
    } else {
        cat("Job completed or stopped. Printing console output below if any.\n")
        tmpdir <- paste0("_rslurm_", slr_job$jobname)
        out_files <- file.path(tmpdir,
                               paste0("slurm_", 0:(slr_job$nodes - 1), ".out"))
        for (outf in out_files) {
            cat(paste("\n----", outf, "----\n\n"))
            cat(paste(readLines(outf), collapse = "\n"))
        }
    }
}
rslurm/R/get_slurm_out.R0000644000176200001440000000577613556566472015044 0ustar liggesusers#' Reads the output of a function calculated on the Slurm cluster
#'
#' This function reads all function output files (one per cluster node used) from
#' the specified Slurm job and returns the result in a single data frame
#' (if "table" format selected) or list (if "raw" format selected). It doesn't
#' record any messages (including warnings or errors) output to the R console
#' during the computation; these can be consulted by invoking
#' \code{\link{print_job_status}}.
#'
#' The \code{outtype} option is only relevant for jobs submitted with
#' \code{slurm_apply}.
Jobs sent with \code{slurm_call} only return a single #' object, and setting \code{outtype = "table"} creates an error in that case. #' #' @param slr_job A \code{slurm_job} object. #' @param outtype Can be "table" or "raw", see "Value" below for details. #' @param wait Specify whether to block until \code{slr_job} completes. #' @param ncores (optional) If not null, the number of cores passed to #' mclapply #' @return If \code{outtype = "table"}: A data frame with one column by #' return value of the function passed to \code{slurm_apply}, where #' each row is the output of the corresponding row in the params data frame #' passed to \code{slurm_apply}. #' #' If \code{outtype = "raw"}: A list where each element is the output #' of the function passed to \code{slurm_apply} for the corresponding #' row in the params data frame passed to \code{slurm_apply}. #' @seealso \code{\link{slurm_apply}}, \code{\link{slurm_call}} #' @importFrom parallel mclapply #' @export get_slurm_out <- function(slr_job, outtype = "raw", wait = TRUE, ncores = NULL) { # Check arguments if (!(class(slr_job) == "slurm_job")) { stop("slr_job must be a slurm_job") } outtypes <- c("table", "raw") if (!(outtype %in% outtypes)) { stop(paste("outtype should be one of:", paste(outtypes, collapse = ', '))) } if (!(is.null(ncores) || (is.numeric(ncores) && length(ncores) == 1))) { stop("ncores must be an integer number of cores") } # Wait for slr_job using Slurm dependency if (wait) { wait_for_job(slr_job) } res_files <- paste0("results_", 0:(slr_job$nodes - 1), ".RDS") tmpdir <- paste0("_rslurm_", slr_job$jobname) missing_files <- setdiff(res_files, dir(path = tmpdir)) if (length(missing_files) > 0) { missing_list <- paste(missing_files, collapse = ", ") warning(paste("The following files are missing:", missing_list)) } res_files <- file.path(tmpdir, setdiff(res_files, missing_files)) if (length(res_files) == 0) return(NA) if (is.null(ncores)) { slurm_out <- lapply(res_files, readRDS) } else { slurm_out 
<- mclapply(res_files, readRDS, mc.cores = ncores) } slurm_out <- do.call(c, slurm_out) if (outtype == "table") { slurm_out <- as.data.frame(do.call(rbind, slurm_out)) } slurm_out } rslurm/R/rslurm-deprecated.R0000644000176200001440000000061413556566472015570 0ustar liggesusers#' @title Deprecated functions in package \pkg{rslurm}. #' @description The functions listed below are deprecated and will be defunct in #' the near future. When possible, alternative functions with similar #' functionality are also mentioned. Help pages for deprecated functions are #' available at \code{help("-deprecated")}. #' @name rslurm-deprecated #' @keywords internal NULL rslurm/R/cancel_cleanup.R0000644000176200001440000000336613556566472015111 0ustar liggesusers#' Cancels a scheduled Slurm job #' #' This function cancels the specified Slurm job by invoking the Slurm #' \code{scancel} command. It does \emph{not} delete the temporary files #' (e.g. scripts) created by \code{\link{slurm_apply}} or #' \code{\link{slurm_call}}. Use \code{\link{cleanup_files}} to remove those #' files. #' #' @param slr_job A \code{slurm_job} object. #' @seealso \code{\link{cleanup_files}} #' @export cancel_slurm <- function(slr_job) { if (!(class(slr_job) == "slurm_job")) stop("input must be a slurm_job") system(paste("scancel -n", slr_job$jobname)) } #' Deletes temporary files associated with a Slurm job #' #' This function deletes all temporary files associated with the specified Slurm #' job, including files created by \code{\link{slurm_apply}} or #' \code{\link{slurm_call}}, as well as outputs from the cluster. These files #' should be located in the \emph{_rslurm_[jobname]} folder of the current #' working directory. #' #' @param slr_job A \code{slurm_job} object. #' @param wait Specify whether to block until \code{slr_job} completes. #' @examples #' \dontrun{ #' sjob <- slurm_apply(func, pars) #' print_job_status(sjob) # Prints console/error output once job is completed. 
#' func_result <- get_slurm_out(sjob, "table") # Loads output data into R. #' cleanup_files(sjob) #' } #' @seealso \code{\link{slurm_apply}}, \code{\link{slurm_call}} #' @export cleanup_files <- function(slr_job, wait = TRUE) { if (!(class(slr_job) == "slurm_job")) stop("input must be a slurm_job") if (wait) wait_for_job(slr_job) tmpdir <- paste0("_rslurm_", slr_job$jobname) if (!(tmpdir %in% dir())) stop(paste("folder", tmpdir, "not found")) unlink(tmpdir, recursive = TRUE) } rslurm/R/get_job_status.R0000644000176200001440000000427413561327610015151 0ustar liggesusers#' Get the status of a Slurm job #' #' This function returns the completion status of a Slurm job, its queue status #' if any and log outputs. #' #' The \code{queue} element of the output is a data frame matching the output #' of the Slurm \code{squeue} command for that job; it will only indicate portions #' of job that are running or in queue. The \code{log} element is a #' vector of the contents of console/error output files for each node where the #' job is running. #' #' @importFrom utils read.table #' @param slr_job A \code{slurm_job} object. #' @return A list with three elements: \code{completed} is a logical value #' indicating if all portions of the job have completed or stopped, \code{queue} #' contains the information on job elements still in queue, and #' \code{log} contains the console/error logs. 
#' @export
get_job_status <- function(slr_job) {
    if (!(class(slr_job) == "slurm_job")) stop("input must be a slurm_job")
    # Get queue info
    squeue_out <- suppressWarnings(
        system(paste("squeue -n", slr_job$jobname), intern = TRUE)
    )
    queue <- read.table(text = squeue_out, header = TRUE)
    completed <- nrow(queue) == 0
    # Get output logs
    tmpdir <- paste0("_rslurm_", slr_job$jobname)
    out_files <- file.path(tmpdir,
                           paste0("slurm_", 0:(slr_job$nodes - 1), ".out"))
    logs <- vapply(out_files,
                   function(outf) paste(readLines(outf), collapse = "\n"),
                   "")
    job_status <- list(completed = completed, queue = queue, log = logs)
    class(job_status) <- "slurm_job_status"
    job_status
}

# Format job status output
print.slurm_job_status <- function(stat) {
    if (stat$completed) {
        cat("Job completed or stopped.\n\n")
    } else {
        print(stat$queue)
        cat("\n")
    }
    cat("Last console output\n\n")
    shorten_log <- function(txt, lmax = 60) {
        # Shorten txt to lmax chars.
        l <- nchar(txt)
        ifelse(l <= lmax, txt, paste("...", substr(txt, l - lmax, l)))
    }
    for (i in seq_along(stat$log)) {
        cat(paste0(i - 1, ": ", shorten_log(stat$log[[i]]), "\n"))
    }
}
rslurm/NEWS.md0000644000176200001440000001166513563516100012705 0ustar liggesusers
# rslurm 0.5.0

### New features and fixes

* Improved status with `get_job_status`, deprecating `print_job_status` ([#37](https://github.com/sesync-ci/rslurm/pull/37)).
* Use `mclapply` within `get_slurm_out` to gather results ([#30](https://github.com/sesync-ci/rslurm/pull/30)).
* Allow user to provide custom .R and .sh templates ([#47](https://github.com/sesync-ci/rslurm/pull/47)).
* Allow user to specify path to `Rscript` ([#45](https://github.com/sesync-ci/rslurm/pull/45)) and number of CPUs per task ([#36](https://github.com/sesync-ci/rslurm/pull/36)).
* Allow user to disable core prescheduling if tasks have high variance in completion time ([816b40e](https://github.com/SESYNC-ci/rslurm/commit/816b40e)).
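Taken together, a minimal sketch of the new options in one call; the `Rscript` path and template file below are made-up placeholders, not package defaults:

```r
sjob <- slurm_apply(f, params,
                    preschedule_cores = FALSE,            # disable core prescheduling
                    rscript_path = "/opt/R/bin/Rscript",  # placeholder path
                    sh_template = "my_submit_sh.txt")     # placeholder custom template
get_job_status(sjob)  # replaces the deprecated print_job_status
```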
# rslurm 0.4.0 ### New features and fixes * Pass (serialized) functions to Slurm nodes without stringifying. * Save `add_objects` objects from correct environment. * Package tests evaluate on a cluster when available. * Include reverse dependency check in release process. ### Reversions * README now separate from package documentation. * Vignette can be built on CRAN tests again (no slurm submissions). * Returned to `parallel::mcmapply`, without SIMPLIFY, to prevent `mc.cores` error when checking on Windows. # rslurm 0.3.3 ### New features and fixes * Create README from R/slurm.R. # rslurm 0.3.2 ### New features and fixes * `wait` argument adds option to `slurm_apply` and `slurm_call` to block the calling script until the submitted job completes. This option can be used to allow immediate processing of a submitted job's output ([#2](https://github.com/sesync-ci/rslurm/pull/2)). * Use ".RDS" file extension, rather than ".RData", for serialized objects ([#4](https://github.com/sesync-ci/rslurm/pull/4)). * Minor bug fixes ([#4](https://github.com/sesync-ci/rslurm/pull/4)). # rslurm 0.3.1 ### New features and fixes * Minor bug fix: specify full path of 'Rscript' when running batch scripts. # rslurm 0.3.0 *First version on CRAN* ### New features and fixes * Added a `submit` argument to `slurm_apply` and `slurm_call`. If `submit = FALSE`, the submission scripts are created but not run. This is useful if the files need to be transferred from a local machine to the cluster and run at a later time. * Added new optional arguments to `slurm_apply` and `slurm_call`, allowing users to give informative names to SLURM jobs (`jobname`) and set any options understood by `sbatch` (`slurm_options`). * The `data_file` argument to `slurm_apply` and `slurm_call` is replaced with `add_objects`, which accepts a vector of R object names from the active workspace and automatically saves them in a .RData file to be loaded on each node. 
* `slurm_apply` and `slurm_call` now generate R and Bash scripts through [whisker](https://github.com/edwindj/whisker) templates. Advanced users may want to edit those templates in the `templates` folder of the installed R package (e.g. to set default *SBATCH* options in `submit.sh`). * Files generated by the package (scripts, data files and output) are now saved in a subfolder named `_rslurm_[jobname]` in the current working directory. * Minor updates, including reformatting the output of `print_job_status` and removing this package's dependency on `stringr`. # rslurm 0.2.0 *2015-11-23* ### New features and fixes * Changed the `slurm_apply` function to use `parallel::mcMap` instead of `mcmapply`, which fixes a bug where list outputs (i.e. each function call returns a list) would be collapsed in a single list (rather than returned as a list of lists). * Changed the interface so that the output type (table or raw) is now an argument of `get_slurm_out` rather than of `slurm_apply`, and defaults to `raw`. * Added `cpus_per_node` argument to `slurm_apply`, indicating the number of parallel processes to be run on each node. # rslurm 0.1.3 *2015-07-13* ### New features and fixes * Added the `slurm_call` function, which submits a single function evaluation on the cluster, with syntax similar to the base function `do.call`. * `get_slurm_out` can now process the output even if some files are missing, in which case it issues a warning. # rslurm 0.1.2 *2015-06-29* ### New features and fixes * Added the optional argument `pkgs` to `slurm_apply`, indicating which packages should be loaded on each node (by default, all packages currently attached to the user's R session). # rslurm 0.1.1 *2015-06-24* ### New features and fixes * Added the optional argument `output` to `slurm_apply`, which can take the value `table` (each function evaluation returns a row, output is a data frame) or `raw` (each function evaluation returns an arbitrary R object, output is a list). 
* Fixed a bug in the chunk size calculation for `slurm_apply`. # rslurm 0.1.0 *2015-06-16* ### Initial release * First version of the package released on GitHub. rslurm/MD50000644000176200001440000000561513563553472012132 0ustar liggesusers9471983dba518dfae795ce5e95491830 *DESCRIPTION 9ef0ad15e1517914524f516debfe205a *NAMESPACE eb044466e870c0a6f591971b7ad03f20 *NEWS.md 10883a23d5c5efd4f3766f9afcb8143e *R/cancel_cleanup.R 519d84d9bc6797dc60397f7f3c046a27 *R/get_job_status.R 045a03923cc2b6bb19a33128d3c0935d *R/get_slurm_out.R c090e1117e728525c11401ed7da1fe6b *R/print_job_status.R c6efce4461ff80b3ed62e215892f6b2e *R/rslurm-deprecated.R e12c2821d5133f8cd4d7a70270b0feee *R/rslurm-package.R 35daf618fbff08ec9f83ba2991f118a9 *R/slurm_apply.R b46c590f6e2a46cec0b23a7c515355a1 *R/slurm_call.R ed60a3f0b54e2eaeda28f5666b0d9a62 *R/slurm_job.R 16054688520c99e45658def5aa14244a *R/slurm_utils.R e98998131cd96436ad35ed0f5fa9a059 *README.md 7e20cdc1ba04cffa6968d5a685dffd65 *build/vignette.rds be91263b50c1212c27296c33afdb8933 *inst/doc/rslurm.Rmd 4853b355b002b76e59633c24971161ed *inst/doc/rslurm.html 066449aa7d5e6a67ba9d4a3a4f089e97 *inst/templates/slurm_run_R.txt 29ba6729646aa3319cf033a4d84b30c7 *inst/templates/slurm_run_single_R.txt ce5dcbbd656560b46aa8ca16960c4061 *inst/templates/submit_sh.txt ff29a886d789b1fd1b791f29d4629a52 *inst/templates/submit_single_sh.txt 1fd8abeda3143bf115832dfcff71fa6e *man/cancel_slurm.Rd b2f663a9b568adbb1052bb07d1a97195 *man/cleanup_files.Rd 9f1e91dae447740c59017118ca4be5f4 *man/get_job_status.Rd a957874782b733c839cfbb8c90ca2acf *man/get_slurm_out.Rd c58ebd77597a9eefb37d20c46582a232 *man/print_job_status-deprecated.Rd 86d1dfca55a31ce1328ca0e865e7b480 *man/rslurm-deprecated.Rd 968550d5f81674a4f72a0e00e59f4448 *man/rslurm-package.Rd ea98019dcd0432776b52e6c6c0739190 *man/slurm_apply.Rd b95fa63993052b06a591f0b3e2a0aca8 *man/slurm_call.Rd 2b7e1be7f6fd065957f95fcdf064dc01 *man/slurm_job.Rd 9dbf87034b279caa6423a6e63386f331 *tests/testthat.R 
2b8a7d79fd034484ba61a3f916bf2292 *tests/testthat/test-local_slurm_apply.R a144fc7bde4bdad0b24c03faae0fbac4 *tests/testthat/test-local_slurm_call.R 5ffd7b19a0858ae5cd14dbb71f68efe2 *tests/testthat/test-slurm_apply.R 333319eea218f00ae996cc6c6c33bc45 *tests/testthat/test-slurm_call.R 75c62b2e5100a9b9c95811df4b4c0e5b *vignettes/_rslurm_test_apply/f.RDS 1737f2ca96f34c5b3a1bd6eccb1ae19e *vignettes/_rslurm_test_apply/params.RDS 3cd2997f68d947f7bb1fa6ba5df0bd18 *vignettes/_rslurm_test_apply/results_0.RDS 8a4068d6a8519fc28413210b5484f768 *vignettes/_rslurm_test_apply/results_1.RDS d41d8cd98f00b204e9800998ecf8427e *vignettes/_rslurm_test_apply/slurm_0.out d41d8cd98f00b204e9800998ecf8427e *vignettes/_rslurm_test_apply/slurm_1.out cf12f9e736e53d0d3efbba5a4711e9a7 *vignettes/_rslurm_test_apply/slurm_run.R 623464334291ca14a957e266cfd08e0c *vignettes/_rslurm_test_apply/submit.sh 75c62b2e5100a9b9c95811df4b4c0e5b *vignettes/_rslurm_test_call/f.RDS f41ade504a0f55cab0057ab0ef7eaab4 *vignettes/_rslurm_test_call/params.RDS eb7984634bb858f53d6915edc80dd3a8 *vignettes/_rslurm_test_call/slurm_run.R 0d42a7a82c4aa34dee6514d15fcba83f *vignettes/_rslurm_test_call/submit.sh be91263b50c1212c27296c33afdb8933 *vignettes/rslurm.Rmd rslurm/inst/0000755000176200001440000000000013563516362012565 5ustar liggesusersrslurm/inst/templates/0000755000176200001440000000000013562577243014567 5ustar liggesusersrslurm/inst/templates/submit_sh.txt0000644000176200001440000000046113561333617017320 0ustar liggesusers#!/bin/bash # #SBATCH --array=0-{{{max_node}}} #SBATCH --cpus-per-task={{{cpus_per_node}}} #SBATCH --job-name={{{jobname}}} #SBATCH --output=slurm_%a.out {{#flags}} #SBATCH --{{{name}}} {{/flags}} {{#options}} #SBATCH --{{{name}}}={{{value}}} {{/options}} {{{rscript}}} --vanilla slurm_run.R rslurm/inst/templates/slurm_run_single_R.txt0000644000176200001440000000056513556566473021214 0ustar liggesusers{{#libPaths}}.libPaths(c('{{.}}', .libPaths())) {{/libPaths}} {{#pkgs}}library({{.}}, quietly = TRUE) 
{{/pkgs}} {{#add_obj}} load('add_objects.RData') {{/add_obj}} .rslurm_func <- readRDS('f.RDS') .rslurm_params <- readRDS('params.RDS') .rslurm_result <- do.call(.rslurm_func, .rslurm_params) saveRDS(.rslurm_result, file = 'results_0.RDS') rslurm/inst/templates/submit_single_sh.txt0000644000176200001440000000036513556566473020700 0ustar liggesusers#!/bin/bash # #SBATCH --ntasks=1 #SBATCH --job-name={{{jobname}}} #SBATCH --output=slurm_0.out {{#flags}} #SBATCH --{{{name}}} {{/flags}} {{#options}} #SBATCH --{{{name}}}={{{value}}} {{/options}} {{{rscript}}} --vanilla slurm_run.R rslurm/inst/templates/slurm_run_R.txt0000644000176200001440000000136713562577243017646 0ustar liggesusers{{#libPaths}}.libPaths(c('{{.}}', .libPaths())) {{/libPaths}} {{#pkgs}}library({{.}}, quietly = TRUE) {{/pkgs}} {{#add_obj}} load('add_objects.RData') {{/add_obj}} .rslurm_func <- readRDS('f.RDS') .rslurm_params <- readRDS('params.RDS') .rslurm_id <- as.numeric(Sys.getenv('SLURM_ARRAY_TASK_ID')) .rslurm_istart <- .rslurm_id * {{{nchunk}}} + 1 .rslurm_iend <- min((.rslurm_id + 1) * {{{nchunk}}}, nrow(.rslurm_params)) .rslurm_result <- do.call(parallel::mcmapply, c( FUN = .rslurm_func, .rslurm_params[.rslurm_istart:.rslurm_iend, , drop = FALSE], mc.cores = {{{cpus_per_node}}}, mc.preschedule = {{{preschedule_cores}}}, SIMPLIFY = FALSE)) saveRDS(.rslurm_result, file = paste0('results_', .rslurm_id, '.RDS')) rslurm/inst/doc/0000755000176200001440000000000013563516362013332 5ustar liggesusersrslurm/inst/doc/rslurm.Rmd0000644000176200001440000002126513556566473015342 0ustar liggesusers--- title: Parallelize R code on a Slurm cluster output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Parallelize R code on a Slurm cluster} %\VignetteEngine{knitr::rmarkdown_notangle} %\VignetteEncoding{UTF-8} --- Many computing-intensive processes in R involve the repeated evaluation of a function over many items or parameter sets. 
These so-called [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) calculations can be run serially with the `lapply` or `Map` functions, or in parallel on a single machine with `mclapply` or `mcMap` (from the `parallel` package).

The rslurm package simplifies the process of distributing this type of calculation across a computing cluster that uses the [Slurm](http://slurm.schedmd.com/) workload manager. Its main function, `slurm_apply`, automatically divides the computation over multiple nodes and writes the necessary submission scripts. It also includes functions to retrieve and combine the output from different nodes, as well as wrappers for common Slurm commands.

### Table of contents

- [Basic example](#basic-example)
- [Single function evaluation](#single-function-evaluation)
- [Adding auxiliary data and functions](#adding-auxiliary-data-and-functions)
- [Configuring Slurm options](#configuring-slurm-options)
- [How it works / advanced customization](#how-it-works-advanced-customization)

## Basic example

To illustrate a typical rslurm workflow, we use a simple function that takes a mean and standard deviation as parameters, generates a million normal deviates and returns the sample mean and standard deviation.

```{r}
test_func <- function(par_mu, par_sd) {
    samp <- rnorm(10^6, par_mu, par_sd)
    c(s_mu = mean(samp), s_sd = sd(samp))
}
```

We then create a parameter data frame where each row is a parameter set and each column matches an argument of the function.

```{r}
pars <- data.frame(par_mu = 1:10,
                   par_sd = seq(0.1, 1, length.out = 10))
head(pars, 3)
```

We can now pass that function and the parameters data frame to `slurm_apply`, specifying the number of cluster nodes to use and the number of CPUs per node. The latter (`cpus_per_node`) determines how many processes will be forked on each node, as the `mc.cores` argument of `parallel::mcmapply`.
```{r}
library(rslurm)
sjob <- slurm_apply(test_func, pars, jobname = 'test_apply',
                    nodes = 2, cpus_per_node = 2, submit = FALSE)
```

The output of `slurm_apply` is a `slurm_job` object that stores a few pieces of information (job name, job ID, and the number of nodes) needed to retrieve the job's output.

The default argument `submit = TRUE` would submit a generated script to the Slurm cluster and print a message confirming the job has been submitted to Slurm, assuming you are running R on a Slurm head node. When working from an R session without direct access to the cluster, you must set `submit = FALSE`. Either way, the function creates a folder called `_rslurm_[jobname]` in the working directory that contains scripts and data files. This folder may be moved to a Slurm head node, the shell command `sbatch submit.sh` run from within the folder, and the folder moved back to your working directory.

The contents of the `_rslurm_[jobname]` folder after completion of the `test_apply` job, i.e. following either manual or automatic (i.e. with `submit = TRUE`) submission to the cluster, include one `results_*.RDS` file for each node:

```{r}
list.files('_rslurm_test_apply', 'results')
```

The results from all the nodes can be read back into R with the `get_slurm_out()` function.

```{r}
res <- get_slurm_out(sjob, outtype = 'table')
head(res, 3)
```

The utility function `print_job_status` displays the status of a submitted job (i.e. in queue, running or completed), and `cancel_slurm` will remove a job from the queue, aborting its execution if necessary. These functions are R wrappers for the Slurm command line functions `squeue` and `scancel`, respectively.

When `outtype = 'table'`, the outputs from each function evaluation are row-bound into a single data frame; this is an appropriate format when the function returns a simple vector. The default `outtype = 'raw'` combines the outputs into a list and can thus handle arbitrarily complex return objects.
```{r}
res_raw <- get_slurm_out(sjob, outtype = 'raw')
res_raw[1:3]
```

The utility function `cleanup_files` deletes the temporary folder for the specified Slurm job.

```{r eval = FALSE}
cleanup_files(sjob)
```

## Single function evaluation

In addition to `slurm_apply`, rslurm also defines a `slurm_call` function, which sends a single function call to the cluster. It is analogous in syntax to the base R function `do.call`, accepting a function and a named list of parameters as arguments.

```{r}
sjob <- slurm_call(test_func, jobname = 'test_call',
                   list(par_mu = 5, par_sd = 1), submit = FALSE)
```

Because `slurm_call` involves a single process on a single node, it does not recognize the `nodes` and `cpus_per_node` arguments; otherwise, it accepts the same additional arguments (detailed in the sections below) as `slurm_apply`.

```{r eval = FALSE}
cleanup_files(sjob)
```

## Adding auxiliary data and functions

The function passed to `slurm_apply` can only receive atomic parameters stored within a data frame. Suppose we want instead to apply a function `func` to a list of complex R objects, `obj_list`. To use `slurm_apply` in this case, we can wrap `func` in an inline function that takes an index as its sole parameter.

```{r eval = FALSE}
sjob <- slurm_apply(function(i) func(obj_list[[i]]),
                    data.frame(i = seq_along(obj_list)),
                    add_objects = c("func", "obj_list"),
                    nodes = 2, cpus_per_node = 2)
```

The `add_objects` argument specifies the names of any R objects (besides the parameters data frame) that must be accessed by the function passed to `slurm_apply`. These objects are saved to a `.RData` file that is loaded on each cluster node prior to evaluating the function in parallel.

By default, all R packages attached to the current R session will also be attached (with `library`) on each cluster node, though this can be modified with the optional `pkgs` argument.
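Building on the example above, `pkgs` can be combined with `add_objects`; a hedged sketch, where the package names are arbitrary placeholders and `func`/`obj_list` are the objects from the example:

```{r eval = FALSE}
sjob <- slurm_apply(function(i) func(obj_list[[i]]),
                    data.frame(i = seq_along(obj_list)),
                    add_objects = c("func", "obj_list"),
                    # attach only these packages on the nodes, instead of
                    # everything currently loaded in the session
                    pkgs = c("Matrix", "data.table"),
                    nodes = 2, cpus_per_node = 2)
```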
## Configuring Slurm options

Particular clusters may require the specification of additional Slurm options, such as time and memory limits for the job. The `slurm_options` argument allows you to set any of the command line options ([view list](http://slurm.schedmd.com/sbatch.html)) recognized by the Slurm `sbatch` command. It should be formatted as a named list, using the long names of each option (e.g. "time" rather than "t"). Flags, i.e. command line options that are toggled rather than set to a particular value, should be set to `TRUE` in `slurm_options`. For example, the following code sets the command line options `--time=1:00:00 --share`.

```{r eval = FALSE}
sopt <- list(time = '1:00:00', share = TRUE)
sjob <- slurm_apply(test_func, pars, slurm_options = sopt)
```

## How it works / advanced customization

As mentioned above, the `slurm_apply` function creates a job-specific folder. This folder contains the parameters as an RDS file and (if applicable) the objects specified as `add_objects` saved together in an RData file. The function also generates an R script (`slurm_run.R`) to be run on each cluster node, as well as a Bash script (`submit.sh`) to submit the job to Slurm.

More specifically, the Bash script tells Slurm to create a job array and the R script takes advantage of the unique `SLURM_ARRAY_TASK_ID` environment variable that Slurm will set on each cluster node. This variable is read by `slurm_run.R`, which allows each instance of the script to operate on a different parameter subset and write its output to a different results file. The R script calls `parallel::mcmapply` to parallelize calculations on each node.

Both `slurm_run.R` and `submit.sh` are generated from templates, using the [`whisker`](https://cran.r-project.org/package=whisker) package; these templates can be found in the `rslurm/templates` subfolder in your R package library.
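To make the template mechanism concrete, here is a small self-contained sketch of how `whisker` fills in placeholders; the template string below is a simplified fragment for illustration, not the actual contents of `submit_sh.txt`:

```{r eval = FALSE}
library(whisker)
# Triple braces ({{{...}}}) insert values without HTML escaping.
template <- "#SBATCH --job-name={{{jobname}}}\n{{{rscript}}} --vanilla slurm_run.R"
cat(whisker.render(template,
                   list(jobname = "test_apply",
                        rscript = file.path(R.home("bin"), "Rscript"))))
```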
There are two templates for each script, one for `slurm_apply` and the other (with the word "single" in its title) for `slurm_call`. While you should avoid changing any existing lines in the template scripts, you may want to add `#SBATCH` lines to the `submit.sh` templates in order to permanently set certain Slurm command line options and thus customize the package to your particular cluster setup.
rslurm/inst/doc/rslurm.html0000644000176200001440000005700713563516362015553 0ustar liggesusers
Parallelize R code on a Slurm cluster

Parallelize R code on a Slurm cluster

Many computing-intensive processes in R involve the repeated evaluation of a function over many items or parameter sets. These so-called embarrassingly parallel calculations can be run serially with the lapply or Map function, or in parallel on a single machine with mclapply or mcMap (from the parallel package).

The rslurm package simplifies the process of distributing this type of calculation across a computing cluster that uses the Slurm workload manager. Its main function, slurm_apply, automatically divides the computation over multiple nodes and writes the necessary submission scripts. It also includes functions to retrieve and combine the output from different nodes, as well as wrappers for common Slurm commands.

Basic example

To illustrate a typical rslurm workflow, we use a simple function that takes a mean and standard deviation as parameters, generates a million normal deviates and returns the sample mean and standard deviation.

We then create a parameter data frame where each row is a parameter set and each column matches an argument of the function.
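The data frame printed below has three rows, so the corresponding call may have looked something like the following (the exact values are a reconstruction from that output):

``` r
pars <- data.frame(par_mu = 1:3,
                   par_sd = c(0.1, 0.2, 0.3))
pars
```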

```
##   par_mu par_sd
## 1      1    0.1
## 2      2    0.2
## 3      3    0.3
```

We can now pass that function and the parameters data frame to `slurm_apply`, specifying the number of cluster nodes to use and the number of CPUs per node. The latter (`cpus_per_node`) determines how many processes will be forked on each node, as the `mc.cores` argument of `parallel::mcMap`.
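Following the call shown in the package README, with `submit = FALSE` so that the example also works without direct cluster access:

``` r
library(rslurm)
sjob <- slurm_apply(test_func, pars, jobname = 'test_apply',
                    nodes = 2, cpus_per_node = 2, submit = FALSE)
```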

```
## Submission scripts output in directory _rslurm_test_apply
```

The output of `slurm_apply` is a `slurm_job` object that stores a few pieces of information (job name, job ID, and the number of nodes) needed to retrieve the job’s output.

The default argument `submit = TRUE` would submit a generated script to the Slurm cluster and print a message confirming the job has been submitted to Slurm, assuming you are running R on a Slurm head node. When working from an R session without direct access to the cluster, you must set `submit = FALSE`. Either way, the function creates a folder called `_rslurm_[jobname]` in the working directory that contains scripts and data files. This folder may be moved to a Slurm head node, the shell command `sbatch submit.sh` run from within the folder, and the folder moved back to your working directory. After completion of the `test_apply` job, following either manual or automatic (i.e. with `submit = TRUE`) submission to the cluster, the `_rslurm_[jobname]` folder contains one `results_*.RDS` file for each node:

```
## [1] "results_0.RDS" "results_1.RDS"
```

The results from all the nodes can be read back into R with the `get_slurm_out()` function.
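For example, with `outtype = 'table'`, which produces the combined data frame shown below:

``` r
res <- get_slurm_out(sjob, outtype = 'table')
res
```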

```
##       s_mu       s_sd
## 1 1.000137 0.09995552
## 2 2.000144 0.19988175
## 3 2.999822 0.30030102
```

The utility function `print_job_status` displays the status of a submitted job (i.e. in queue, running or completed), and `cancel_slurm` will remove a job from the queue, aborting its execution if necessary. These functions are R wrappers for the Slurm command line functions `squeue` and `scancel`, respectively.
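A minimal sketch of these wrappers in use:

``` r
print_job_status(sjob)  # query Slurm for the job's current state
# cancel_slurm(sjob)    # uncomment to remove the job from the queue
```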

When `outtype = 'table'`, the outputs from each function evaluation are row-bound into a single data frame; this is an appropriate format when the function returns a simple vector. The default `outtype = 'raw'` combines the outputs into a list and can thus handle arbitrarily complex return objects.
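For example, retrieving the same results as a list:

``` r
res_raw <- get_slurm_out(sjob, outtype = 'raw')
```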

```
## [[1]]
##       s_mu       s_sd 
## 1.00013690 0.09995552 
## 
## [[2]]
##      s_mu      s_sd 
## 2.0001445 0.1998817 
## 
## [[3]]
##     s_mu     s_sd 
## 2.999822 0.300301
```

The utility function `cleanup_files` deletes the temporary folder for the specified Slurm job.
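For example:

``` r
cleanup_files(sjob)
```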

## Single function evaluation

In addition to `slurm_apply`, rslurm also defines a `slurm_call` function, which sends a single function call to the cluster. It is analogous in syntax to the base R function `do.call`, accepting a function and a named list of parameters as arguments.
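A sketch matching the output below (the parameter values here are illustrative):

``` r
sjob <- slurm_call(test_func, list(par_mu = 5, par_sd = 1),
                   jobname = 'test_call', submit = FALSE)
```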

```
## Submission scripts output in directory _rslurm_test_call
```

Because `slurm_call` involves a single process on a single node, it does not recognize the `nodes` and `cpus_per_node` arguments; otherwise, it accepts the same additional arguments (detailed in the sections below) as `slurm_apply`.

## Adding auxiliary data and functions

The function passed to `slurm_apply` can only receive atomic parameters stored within a data frame. Suppose we want instead to apply a function `func` to a list of complex R objects, `obj_list`. To use `slurm_apply` in this case, we can wrap `func` in an inline function that takes an index as its sole parameter.

The `add_objects` argument specifies the names of any R objects (besides the parameters data frame) that must be accessed by the function passed to `slurm_apply`. These objects are saved to a `.RDS` file that is loaded on each cluster node prior to evaluating the function in parallel.
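A sketch of this pattern, where `func` and `obj_list` stand for the user's own function and list of objects:

``` r
sjob <- slurm_apply(function(i) func(obj_list[[i]]),
                    data.frame(i = seq_along(obj_list)),
                    add_objects = c('func', 'obj_list'),
                    jobname = 'test_objs', nodes = 2,
                    cpus_per_node = 2, submit = FALSE)
```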

By default, all R packages attached to the current R session will also be attached (with `library`) on each cluster node, though this can be modified with the optional `pkgs` argument.

## Configuring Slurm options

Particular clusters may require the specification of additional Slurm options, such as time and memory limits for the job. The `slurm_options` argument allows you to set any of the command line options ([view list](http://slurm.schedmd.com/sbatch.html)) recognized by the Slurm `sbatch` command. It should be formatted as a named list, using the long names of each option (e.g. "time" rather than "t"). Flags, i.e. command line options that are toggled rather than set to a particular value, should be set to `TRUE` in `slurm_options`. For example, the following code sets the command line options `--time=1:00:00 --share`.
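The code referred to here appears in the vignette source:

``` r
sopt <- list(time = '1:00:00', share = TRUE)
sjob <- slurm_apply(test_func, pars, slurm_options = sopt)
```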

## How it works / advanced customization

As mentioned above, the `slurm_apply` function creates a job-specific folder. This folder contains the parameters as an RDS file and (if applicable) the objects specified as `add_objects`, saved together in an RData file. The function also generates an R script (`slurm_run.R`) to be run on each cluster node, as well as a Bash script (`submit.sh`) to submit the job to Slurm.

More specifically, the Bash script tells Slurm to create a job array and the R script takes advantage of the unique `SLURM_ARRAY_TASK_ID` environment variable that Slurm will set on each cluster node. This variable is read by `slurm_run.R`, which allows each instance of the script to operate on a different parameter subset and write its output to a different results file. The R script calls `parallel::mcMap` to parallelize calculations on each node.

Both `slurm_run.R` and `submit.sh` are generated from templates, using the [`whisker`](https://cran.r-project.org/package=whisker) package; these templates can be found in the `rslurm/templates` subfolder in your R package library. There are two templates for each script, one for `slurm_apply` and the other (with the word "single" in its title) for `slurm_call`.

While you should avoid changing any existing lines in the template scripts, you may want to add `#SBATCH` lines to the `submit.sh` templates in order to permanently set certain Slurm command line options and thus customize the package to your particular cluster setup.
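For instance, a line like the following (the partition name is hypothetical) could be added below the existing `#SBATCH` directives in the template:

``` sh
#SBATCH --partition=mypartition
```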