README

Slurm Workload Manager is a popular HPC cluster job scheduler found in many of the top 500 super computers. The slurmR R package provides an R wrapper to it that matches the parallel package’s syntax, this is, just like parallel provides the parLapply, clusterMap, parSapply, etc., slurmR provides Slurm_lapply, Slurm_Map, Slurm_sapply, etc.

While there are other alternatives such as future.batchtools, batchtools, clustermq, and rslurm, this R package has the following goals:

Installation

From your HPC command line, you can install the development version from GitHub with:

$ git clone https://github.com/USCbiostats/slurmR.git
$ R CMD INSTALL slurmR/

The second line assumes you have R available in your system (usually loaded via module R or some other command). Or using the devtools from within R:

# install.packages("devtools")
devtools::install_github("USCbiostats/slurmR")

Citation

Example 1: Computing means (and looking under the hood)

library(slurmR)
#  Loading required package: parallel
#  slurmR default option for `tmp_path` (used to store auxiliar files) set to:
#    /home/george/Documents/development/slurmR
#  You can change this and checkout other slurmR options using: ?opts_slurmR, or you could just type "opts_slurmR" on the terminal.

# Suppose that we have 100 vectors of length 50 ~ Unif(0,1)
set.seed(881)
x <- replicate(100, runif(50), simplify = FALSE)

ans <- Slurm_lapply(x, mean, plan = "none")
#  Warning: [submit = FALSE] The job hasn't been submitted yet. Use sbatch() to submit the job, or you can submit it via command line using the following:
#  sbatch --job-name=slurmr-job-299a16893fb7 /home/george/Documents/development/slurmR/slurmr-job-299a16893fb7/01-bash.sh
Slurm_clean(ans) # Cleaning after you

Notice the plan = "none" option, this tells Slurm_lapply to only create the job object, but do nothing with it, i.e., skip submission. To get more info, we can actually set the verbose mode on

opts_slurmR$verbose_on()
ans <- Slurm_lapply(x, mean, plan = "none")
#  --------------------------------------------------------------------------------
#  [VERBOSE MODE ON] The R script that will be used is located at: /home/george/Documents/development/slurmR/slurmr-job-299a16893fb7/00-rscript.r and has the following contents:
#  --------------------------------------------------------------------------------
#  .libPaths(c("/home/george/R/x86_64-pc-linux-gnu-library/3.6", "/usr/local/lib/R/site-library", "/usr/lib/R/site-library", "/usr/lib/R/library"))
#  message("[slurmR info] Loading variables and functions... ", appendLF = FALSE)
#  Slurm_env <- function (x = "SLURM_ARRAY_TASK_ID") 
#  {
#      y <- Sys.getenv(x)
#      if ((x == "SLURM_ARRAY_TASK_ID") && y == "") {
#          return(1)
#      }
#      y
#  }
#  ARRAY_ID  <- as.integer(Slurm_env("SLURM_ARRAY_TASK_ID"))
#  
#  # The -snames- function creates the write names for I/O of files as a 
#  # function of the ARRAY_ID
#  snames    <- function (type, array_id = NULL, tmp_path = NULL, job_name = NULL) 
#  {
#      if (length(array_id) && length(array_id) > 1) 
#          return(sapply(array_id, snames, type = type, tmp_path = tmp_path, 
#              job_name = job_name))
#      type <- switch(type, r = "00-rscript.r", sh = "01-bash.sh", 
#          out = "02-output-%A-%a.out", rds = if (missing(array_id)) "03-answer-%03i.rds" else sprintf("03-answer-%03i.rds", 
#              array_id), job = "job.rds", stop("Invalid type, the only valid types are `r`, `sh`, `out`, and `rds`.", 
#              call. = FALSE))
#      sprintf("%s/%s/%s", tmp_path, job_name, type)
#  }
#  TMP_PATH  <- "/home/george/Documents/development/slurmR"
#  JOB_NAME  <- "slurmr-job-299a16893fb7"
#  
#  # The -tcq- function is a wrapper of tryCatch that on error tries to recover
#  # the message and saves the outcome so that slurmR can return OK.
#  tcq <- function (...) 
#  {
#      ans <- tryCatch(..., error = function(e) e)
#      if (inherits(ans, "error")) {
#          ARRAY_ID. <- get("ARRAY_ID", envir = .GlobalEnv)
#          msg <- paste("An error has ocurred while evualting the expression:\n", 
#              paste(deparse(match.call()[[2]]), collapse = "\n"), 
#              "\n in ", "ARRAY_ID # ", ARRAY_ID.)
#          warning(msg, immediate. = TRUE, call. = FALSE)
#          ans$message <- paste(ans$message, msg)
#          saveRDS(ans, snames("rds", tmp_path = get("TMP_PATH", 
#              envir = .GlobalEnv), job_name = get("JOB_NAME", envir = .GlobalEnv), 
#              array_id = ARRAY_ID.))
#          q("no")
#      }
#      invisible(ans)
#  }
#  message("done loading variables and functions.")
#  tcq({
#    INDICES <- readRDS("/home/george/Documents/development/slurmR/slurmr-job-299a16893fb7/INDICES.rds")
#  })
#  tcq({
#    X <- readRDS(sprintf("/home/george/Documents/development/slurmR/slurmr-job-299a16893fb7/X_%04d.rds", ARRAY_ID))
#  })
#  tcq({
#    FUN <- readRDS("/home/george/Documents/development/slurmR/slurmr-job-299a16893fb7/FUN.rds")
#  })
#  tcq({
#    mc.cores <- readRDS("/home/george/Documents/development/slurmR/slurmr-job-299a16893fb7/mc.cores.rds")
#  })
#  tcq({
#    seeds <- readRDS("/home/george/Documents/development/slurmR/slurmr-job-299a16893fb7/seeds.rds")
#  })
#  set.seed(seeds[ARRAY_ID], kind = NULL, normal.kind = NULL)
#  tcq({
#    ans <- parallel::mclapply(
#      X                = X,
#      FUN              = FUN,
#      mc.cores         = mc.cores
#  )
#  })
#  saveRDS(ans, sprintf("/home/george/Documents/development/slurmR/slurmr-job-299a16893fb7/03-answer-%03i.rds", ARRAY_ID), compress = TRUE)
#  --------------------------------------------------------------------------------
#  The bash file that will be used is located at: /home/george/Documents/development/slurmR/slurmr-job-299a16893fb7/01-bash.sh and has the following contents:
#  --------------------------------------------------------------------------------
#  #!/bin/sh
#  #SBATCH --job-name=slurmr-job-299a16893fb7
#  #SBATCH --output=/home/george/Documents/development/slurmR/slurmr-job-299a16893fb7/02-output-%A-%a.out
#  #SBATCH --array=1-2
#  #SBATCH --job-name=slurmr-job-299a16893fb7
#  #SBATCH --cpus-per-task=1
#  #SBATCH --ntasks=1
#  export OMP_NUM_THREADS=1
#  /usr/lib/R/bin/Rscript  /home/george/Documents/development/slurmR/slurmr-job-299a16893fb7/00-rscript.r
#  --------------------------------------------------------------------------------
#  EOF
#  --------------------------------------------------------------------------------
#  Warning: [submit = FALSE] The job hasn't been submitted yet. Use sbatch() to submit the job, or you can submit it via command line using the following:
#  sbatch --job-name=slurmr-job-299a16893fb7 /home/george/Documents/development/slurmR/slurmr-job-299a16893fb7/01-bash.sh
Slurm_clean(ans) # Cleaning after you

Example 2: Job resubmission

# Submitting a simple job
job <- Slurm_EvalQ(slurmR::WhoAmI(), njobs = 20, plan = "submit")

# Checking the status of the job (we can simply print)
job
status(job) # or use the state function
sacct(job) # or get more info with the sactt wrapper.

# Suppose some of the jobs are taking too long to complete (say 1, 2, and 15 through 20)
# we can stop it and resubmit the job as follows:
scancel(job)

# Resubmitting only 
sbatch(job, array = "1,2,15-20") # A new jobid will be assigned

# Once its done, we can collect all the results at once
res <- Slurm_collect(job)

# And clean up if we don't need to use it again
Slurm_clean(res)

Example 3: Using slurmR and future/doParallel/boot/…

The function makeSlurmCluster creates a PSOCK cluster within a Slurm HPC network, meaning that users can go beyond a single node cluster object and take advantage of Slurm to create a multi-node cluster object. This feature allows then using slurmR with other R packages that support working with SOCKcluster class objects. Here are some examples

library(future)
library(slurmR)

cl <- makeSlurmCluster(50)

# It only takes using a cluster plan!
plan(cluster, cl)

...your fancy futuristic code...

# Slurm Clusters are stopped in the same way any cluster object is
stopCluster(cl)

library(doParallel)
library(slurmR)

cl <- makeSlurmCluster(50)

registerDoParallel(cl)
m <- matrix(rnorm(9), 3, 3)
foreach(i=1:nrow(m), .combine=rbind) 

stopCluster(cl)

Example 4: Using slurmR directly from the command line

The slurmR package has a couple of convenient functions designed for the user to save time. First, the function sourceSlurm() allows skipping the explicit creating of a bash script file to be used together with sbatch by putting all the required config files on the first lines of an R scripts, for example:

Is an R script that on the first line coincides with that of a bash script for Slurm: #!/bin/bash. The following lines start with #SBATCH explicitly specifying options for sbatch, and the reminder lines are just R code.

The previous R script is included in the package (type system.file("example.R", package="slurmR")).

Imagine that that R script is named example.R, then you use the sourceSlurm function to submit it to Slurm as follows:

slurmR::sourceSlurm("example.R")

This will create the corresponding bash file required to be used with sbatch, and submit it to Slurm.

Another nice tool is the slurmr_cmd(). This function will create a simple bash script that can be used as a command line tool to submit this type of R scripts. Moreover, this command will can add the command to your session’s alias as follows:

library(slurmR)
slurmr_cmd("~", add_alias = TRUE)

Once that’s done, you can simply submit R scripts with “Slurm-like headers” (as shown previously) as follows:

$ slurmr example.R

There are several ways to enhance R for HPC. Depending on what are your goals/restrictions/preferences, you can use any of the following from this manually curated list:

Contributing

Package	Rerun (1)	*apply (2)	makeCluster (3)	Slurm options
slurmR	yes	yes	yes	on the fly
drake	yes	-	-	by template
rslurm	-	yes	-	on the fly
future.batchtools	-	yes	yes	by template
batchtools	yes	yes	-	by template
clustermq	-	-	-	by template

We welcome contributions to slurmR. Whether it is reporting a bug, starting a discussion by asking a question, or proposing/requesting a new feature, please go by creating a new issue here so that we can talk about it.

Please note that this project is released with a Contributor Code of Conduct (see the CODE_OF_CONDUCT.md file included in this project). By participating in this project you agree to abide by its terms.

Who uses Slurm

Funding

Computation for the work described in this paper was supported by the University of Southern California’s Center for High-Performance Computing (hpcc.usc.edu).

Institution	Country	Link
USC High Performance Computing Center	US	link
Princeton Research Computing	US	link
Harvard FAS	US	link
Harvard HMS research computing	US	link
UCSan Diego WM Keck Lab for Integrated Biology	US	link
Stanford Sherlock	US	link
Stanford SCG Informatics Cluster	US	link
Berkeley Research IT	US	link
University of Utah CHPC	US	link
University of Michigan Biostatistics cluster	US	link
The University of Kansas Center for Research Computing	US	link
University of Cambridge MRC Biostatistics Unit	UK	link
Indiana University	US	link
Caltech HPC Center	US	link
Institute for Advanced Study	US	link
UTSouthwestern Medical Center BioHPC	US	link
Vanderbilt University ACCRE	US	link
University of Virginia Research Computing	US	link
Center for Advanced Computing	CA	link
SciNet	CA	link
NLHPC	CL	link
Kultrun	CL	link
Matbio	CL	link
TIG MIT	US	link
MIT Supercloud	US	link
Oxford’s ARC	UK	link

slurmR: A Lightweight Wrapper for Slurm