parallelly/0000755000175000017500000000000014156760422012554 5ustar nileshnileshparallelly/MD50000644000175000017500000000724714156760422013076 0ustar nileshnilesh6f1edcbe0321b599c78048b3420b9870 *DESCRIPTION 1ad09b9d3d0083626a51a3c09e15b006 *NAMESPACE 6aaebdad99f426fa62e374dd49e20062 *NEWS adc585ac097cc845f1529712d40c3422 *R/as.cluster.R 4005a803d15e033b08fde9b7edaff445 *R/autoStopCluster.R 5a65ec76efae67257c1163b3270e274c *R/availableConnections.R ee186db6d0a9e81a02e80605e2ae3e44 *R/availableCores.R 0763c5b7f29cd7cad3352ce47c2fa4f1 *R/availableWorkers.R 0472b8af1e77415e5bdf2fbefd742eb3 *R/cpuLoad.R e21765070654e8e3ea512535770d5161 *R/detectCores.R 85dd5fa396c7aa56b5a801af7d1a10bd *R/freeCores.R b5e78a68734b7f2e329f4b81eff6ea4d *R/getOptionOrEnvVar.R f8f517a6163198989bd0e05bdde913ec *R/isConnectionValid.R 1a0689aa30f44dcd0446810817669cff *R/isForkedChild.R 41b7588a9d5966025db990dae7f80f71 *R/isForkedNode.R 0e79ebc76bab94e1ff5074ff6666d408 *R/isLocalhostNode.R 548a8dc8ed35cc9c9f7d5a45e7bfb0a7 *R/isNodeAlive.R c133a9e1a97c6ee6c1e9af5a6be4b427 *R/makeClusterMPI.R 3e9a37e1abce0a16bd6253412e48cdfd *R/makeClusterPSOCK.R 474138d36c93e9fd0d4448a5f28cb2e2 *R/options.R 4f5676ec90acd0cbcfea287bf90a192e *R/parallelly_disable_parallel_setup_if_needed.R 16bfecb2b41ee50c526d89faa6097f4f *R/ports.R 785ac7e21e87bfe485516be14eb397f6 *R/stealth_sample.R 0095adcd69a51cc501f662e23bd26fe9 *R/supportsMulticore.R 13c6587c4a0896bbd69e61ba00556173 *R/utils,pid.R 87cc7fb8e8f556036a60486972999ab2 *R/utils.R 679570a98c7e40b33113aa55a6ad3338 *R/zzz.R be7d2460a3d167147cda40d5a073e729 *README.md 967315ace4bd81c94fac8f5ed4333b5f *inst/WORDLIST adf704735b6e06f2bf6a550711763177 *man/as.cluster.Rd 3d2e8046a6c0af58dd75fea1ee39a336 *man/autoStopCluster.Rd 393fe1037b90c782b8ab89756cb1c1ff *man/availableConnections.Rd 869590eb229ec45b459b4861efb2c5b2 *man/availableCores.Rd 25e4548078ea04066f3367320bf51048 *man/availableWorkers.Rd 5f2c9599b2ca3506ae032f60beb1487b *man/canPortBeUsed.Rd 
076c825fe6d62ff62a502b18b423deb4 *man/cpuLoad.Rd 952b59dc07b171b97d5d982924244f61 *man/figures/lifecycle-maturing-blue.svg 4ed75497d5f1f3693dd0b15c2362f9a1 *man/figures/logo.R 33baa5f0547c1321394ca0dc6d422bba *man/figures/logo.png 0d14bd9f75e75e82155accc29e12825c *man/find_rshcmd.Rd 59853d367998092d46d19f3ccb160f1d *man/freeCores.Rd e553a9eba4bdb36d8f648796639b73bc *man/freePort.Rd 55880b52432c3a2d5f8f9fabb52cfeef *man/getOption2.Rd 35fc0ecf6079373d80bd6d588a2b6a2e *man/isConnectionValid.Rd b49827a9ed2d4044bcd873ef1c9b8947 *man/isForkedChild.Rd 6b6d4c922121049b055fd0cf1635052f *man/isForkedNode.Rd f2bcc96286ae7c2dae1476817f4528e0 *man/isLocalhostNode.Rd 9ec128e4cd001e684081a20814b031a7 *man/isNodeAlive.Rd 2a6480bfcdaa83fb7723c270ce6a1254 *man/makeClusterMPI.Rd 4f0c6e9b5f46d7a670d7b5c156ca1df0 *man/makeClusterPSOCK.Rd 5cc7de8df7ddd071dca7f81229794ef0 *man/parallelly.options.Rd dfb738c2f40f0d7e15419d9d973757b2 *man/pid_exists.Rd 5d658d2e1005a5204e15fe178f178799 *man/supportsMulticore.Rd 4ee6d07f59c8dde90a4cf11ce321e363 *tests/as.cluster.R 49f1a03c81e211482c610314ae33fadd *tests/availableCores.R 66a86c117d3db4794cf2e6a14a549847 *tests/availableWorkers.R 1ed07a41cb99c91c91bed2421b20e9f0 *tests/cpuLoad.R 461019b5333a6b7bf416ab1232617f61 *tests/freeCores.R a60546115bff38885d99d6c4cc706edd *tests/freePort.R 7f61e6ef1da32131d62a4f600df663d9 *tests/incl/end.R 8146c0d4f86c47ea84c644113535eba2 *tests/incl/start,load-only.R 677ce68ae34bc1892360fea3be480ce4 *tests/incl/start.R 235d4b648fc0315a7a962c7711f1ccd9 *tests/isConnectionValid.R 3f3e452bf24321c8f6f472c5cb7efd66 *tests/isForkedChild.R ed8d703a1f3627a756bf0c9352324dfd *tests/makeClusterMPI.R 63d0a24c400d701ae4cc695fc1eb635d *tests/makeClusterPSOCK.R e8714dc1be95279112720f0d6d0414a4 *tests/options-and-envvars.R e56f280042e268b895949c00c32fb3cb *tests/r_bug18119.R 1437f4e668cd4e5475f236431e427a3a *tests/startup.R 130ae4c785ac60115ed2ddd9e7f2ea53 *tests/utils.R 
parallelly/DESCRIPTION0000644000175000017500000000261114156760422014262 0ustar nileshnileshPackage: parallelly Version: 1.30.0 Title: Enhancing the 'parallel' Package Imports: parallel, tools, utils Authors@R: c(person("Henrik", "Bengtsson", role=c("aut", "cre", "cph"), email = "henrikb@braju.com")) Description: Utility functions that enhance the 'parallel' package and support the built-in parallel backends of the 'future' package. For example, availableCores() gives the number of CPU cores available to your R process as given by the operating system, 'cgroups' and Linux containers, R options, and environment variables, including those set by job schedulers on high-performance compute clusters. If none is set, it will fall back to parallel::detectCores(). Another example is makeClusterPSOCK(), which is backward compatible with parallel::makePSOCKcluster() while doing a better job in setting up remote cluster workers without the need for configuring the firewall to do port-forwarding to your local computer. License: LGPL (>= 2.1) LazyLoad: TRUE ByteCompile: TRUE URL: https://parallelly.futureverse.org, https://github.com/HenrikBengtsson/parallelly BugReports: https://github.com/HenrikBengtsson/parallelly/issues RoxygenNote: 7.1.2 NeedsCompilation: no Packaged: 2021-12-16 05:47:12 UTC; hb Author: Henrik Bengtsson [aut, cre, cph] Maintainer: Henrik Bengtsson Repository: CRAN Date/Publication: 2021-12-17 01:00:02 UTC parallelly/README.md0000644000175000017500000002606214147243104014032 0ustar nileshnilesh
# parallelly: Enhancing the 'parallel' Package

The **parallelly** package provides functions that enhance the **parallel** package. For example, `availableCores()` gives the number of CPU cores available to your R process as given by R options and environment variables, including those set by job schedulers on high-performance compute (HPC) clusters. If R runs under 'cgroups' or a Linux container, then their settings are acknowledged too. If nothing else is set, then it falls back to `parallel::detectCores()`. Another example is `makeClusterPSOCK()`, which is backward compatible with `parallel::makePSOCKcluster()` while doing a better job of setting up remote cluster workers, without you having to know your local public IP address or configure the firewall to do port-forwarding to your local computer.

The functions and features added to this package are written to be backward compatible with the **parallel** package, such that they may be incorporated there later. The **parallelly** package comes with an open invitation for the R Core Team to adopt all or parts of its code into the **parallel** package.
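As a quick illustration of the first point, here is a minimal sketch of querying the number of available cores (it assumes only that the **parallelly** package is installed):

```r
library(parallelly)

## Number of CPU cores available to this R process, honoring cgroups,
## container limits, HPC scheduler allocations, R options, and
## environment variables -- guaranteed to be a positive integer
n <- availableCores()
print(n)
```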
## Feature Comparison 'parallelly' vs 'parallel'

| | parallelly | parallel |
| ---------------------------------- | :-------------: | :--------: |
| remote clusters without knowing local public IP | ✓ | N/A |
| remote clusters without firewall configuration | ✓ | N/A |
| remote username in ~/.ssh/config | ✓ | R (>= 4.1.0) with `user = NULL` |
| set workers' library package path on startup | ✓ | N/A |
| set workers' environment variables on startup | ✓ | N/A |
| custom workers startup code | ✓ | N/A |
| fallback to RStudio's SSH and PuTTY's plink | ✓ | N/A |
| faster, parallel setup of local workers (R >= 4.0.0) | ✓ | ✓ |
| faster, little-endian protocol by default | ✓ | N/A |
| faster, low-latency socket connections by default | ✓ | N/A |
| validation of cluster at setup | ✓ | ✓ |
| attempt to launch failed workers multiple times | ✓ | N/A |
| collect worker details at cluster setup | ✓ | N/A |
| termination of workers if cluster setup fails | ✓ | R (>= 4.0.0) |
| shutdown of cluster by garbage collector | ✓ | N/A |
| combining multiple, existing clusters | ✓ | N/A |
| more informative printing of cluster objects | ✓ | N/A |
| defaults via options & environment variables | ✓ | N/A |
| respecting CPU resources allocated by cgroups, Linux containers, and HPC schedulers | ✓ | N/A |
| early error if requesting more workers than possible | ✓ | N/A |
| informative error messages | ✓ | N/A |

## Compatibility with the parallel package

Any cluster created by the **parallelly** package is fully compatible with the clusters created by the **parallel** package and can be used by all of **parallel**'s functions for cluster processing, e.g. `parallel::clusterEvalQ()` and `parallel::parLapply()`. The `parallelly::makeClusterPSOCK()` function can be used as a stand-in replacement for `parallel::makePSOCKcluster()`, or equivalently, `parallel::makeCluster(..., type = "PSOCK")`.

Most **parallelly** functions also apply to clusters created by the **parallel** package.
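To illustrate this compatibility, here is a minimal sketch (assuming only that **parallelly** is installed) where a cluster created by `makeClusterPSOCK()` is consumed by **parallel**'s own `parLapply()`:

```r
library(parallelly)

## A parallelly cluster works with all of parallel's cluster functions
cl <- makeClusterPSOCK(2)
y <- parallel::parLapply(cl, 1:4, function(x) x^2)
parallel::stopCluster(cl)
str(y)
```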
For example,

```r
cl <- parallel::makeCluster(2)
cl <- parallelly::autoStopCluster(cl)
```

causes the cluster created by **parallel** to shut down automatically when R's garbage collector removes the cluster object. This lowers the risk of leaving stray R worker processes running in the background by mistake. Another way to achieve the above in a single call is to use:

```r
cl <- parallelly::makeClusterPSOCK(2, autoStop = TRUE)
```

### availableCores() vs parallel::detectCores()

The `availableCores()` function is designed as a better, safer alternative to `detectCores()` of the **parallel** package. It is meant to be a worry-free solution for developers and end-users to query the number of available cores - a solution that plays nice on multi-tenant systems, high-performance compute (HPC) clusters, CRAN check servers, and elsewhere.

Did you know that `parallel::detectCores()` might return NA on some systems, or that `parallel::detectCores() - 1` might return 0 on some systems, e.g. old hardware and virtual machines? Because of this, you have to use `max(1, parallel::detectCores() - 1, na.rm = TRUE)` to handle these cases correctly. In contrast, `parallelly::availableCores()` is guaranteed to return a positive integer, and you can use `parallelly::availableCores(omit = 1)` to return all but one core and always at least 1.

Just like other software tools that "hijack" all cores by default, R scripts and packages that default to `detectCores()` parallel workers cause lots of suffering for fellow end-users and system administrators. For instance, a shared server with 48 cores comes to a halt after just a few users run parallel processing with `detectCores()` workers. This problem gets worse on machines with many cores, because they can host even more concurrent users.
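The defensive pattern described above can be sketched in base R as follows (the **parallelly** one-liner is shown in a comment, since it assumes that package is installed):

```r
## parallel::detectCores() may return NA on some systems, and
## detectCores() - 1 may be 0 on single-core machines, so picking a
## safe number of workers requires a defensive fallback:
n_workers <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)
print(n_workers)

## With parallelly, the equivalent is simply:
## n_workers <- parallelly::availableCores(omit = 1)
```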
If these R users would have used `availableCores()` instead, then the system administrator can limit the number of cores each user gets to, say, 2, by setting the environment variable `R_PARALLELLY_AVAILABLECORES_FALLBACK=2`. In contrast, it is _not_ possible to override what `parallel::detectCores()` returns, cf. [PR#17641 - WISH: Make parallel::detectCores() agile to new env var R_DEFAULT_CORES](https://bugs.r-project.org/show_bug.cgi?id=17641).

At the same time, if this is on an HPC cluster with a job scheduler, a script that uses `availableCores()` will run with the number of parallel workers that the job scheduler has assigned to the job. For example, if we submit a Slurm job as `sbatch --cpus-per-task=16 ...`, then `availableCores()` will return 16, because it respects the `SLURM_*` environment variables set by the scheduler. See `help("availableCores", package = "parallelly")` for currently supported job schedulers.

Besides job schedulers, `availableCores()` respects R options and environment variables commonly used to specify the number of parallel workers, e.g. R option `mc.cores`. It will detect when running `R CMD check` and return 2, which is the maximum number of parallel workers allowed by the [CRAN Policies](https://cran.r-project.org/web/packages/policies.html). If nothing is set that limits the number of cores, then `availableCores()` falls back to `parallel::detectCores()`, and if that returns `NA_integer_` then `1` is returned.

The table below summarizes the benefits:

| | availableCores() | parallel::detectCores() |
| --------------------------------------- | :--------------: | :---------------------------: |
| Guaranteed to return a positive integer | ✓ | no (may return `NA_integer_`) |
| Safely use all but some cores | ✓ | no (may return zero or less) |
| Can be overridden, e.g. by a sysadm | ✓ | no |
| Respects cgroups and Linux containers | ✓ | no |
| Respects job scheduler allocations | ✓ | no |
| Respects CRAN policies | ✓ | no |
| Respects Bioconductor policies | ✓ | no |

## Backward compatibility with the future package

The functions in this package originate from the **[future](https://cran.r-project.org/package=future)** package, where we have used and validated them for several years. I moved these functions to this separate package, because they are also useful outside of the future framework. For backward compatibility within the future framework, the R options and environment variables that are prefixed with `parallelly.*` and `R_PARALLELLY_*` can, for the time being, also be set with `future.*` and `R_FUTURE_*` prefixes.

## Roadmap

* [x] Submit **parallelly** to CRAN, with minimal changes compared to the corresponding functions in the **future** package (on CRAN as of 2020-10-20)

* [x] Update the **future** package to import and re-export the functions from **parallelly** to maximize backward compatibility in the future framework (**future** 1.20.1 on CRAN as of 2020-11-03)

* [x] Switch to the 10-15% faster `useXDR=FALSE`

* [x] Implement the same fast, parallel setup of PSOCK workers as in **parallel** (>= 4.0.0)

* [x] After having validated that there is no negative impact on the future framework, allow for changes in the **parallelly** package, e.g. renaming the R options and environment variables to be `parallelly.*` and `R_PARALLELLY_*` while falling back to `future.*` and `R_FUTURE_*`

* [ ] Migrate, currently internal, UUID functions and export them, e.g. `uuid()`, `connectionUuid()`, and `sessionUuid()` (https://github.com/HenrikBengtsson/Wishlist-for-R/issues/96).
Because [R does not have a built-in md5 checksum function that operates on objects](https://github.com/HenrikBengtsson/Wishlist-for-R/issues/21), these functions would require adding a dependency on the **[digest](https://cran.r-project.org/package=digest)** package.

* [ ] Add vignettes on how to set up clusters running on local or remote machines, including in Linux containers and on popular cloud services, and vignettes on common problems and how to troubleshoot them

Initially, backward compatibility for the **future** package is of top priority.

## Installation

R package parallelly is available on [CRAN](https://cran.r-project.org/package=parallelly) and can be installed in R as:

```r
install.packages("parallelly")
```

### Pre-release version

To install the pre-release version that is available in Git branch `develop` on GitHub, use:

```r
remotes::install_github("HenrikBengtsson/parallelly", ref="develop")
```

This will install the package from source.

parallelly/man/0000755000175000017500000000000014146362545013332 5ustar nileshnileshparallelly/man/isNodeAlive.Rd0000644000175000017500000000273514116406533016023 0ustar nileshnilesh% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/isNodeAlive.R
\name{isNodeAlive}
\alias{isNodeAlive}
\title{Check whether or not the cluster nodes are alive}
\usage{
isNodeAlive(x, ...)
}
\arguments{
\item{x}{A cluster or a cluster node ("worker").}

\item{...}{Not used.}
}
\value{
A logical vector of length \code{length(x)} with values FALSE, TRUE, and NA. If it can be established that the process for a cluster node is running, then TRUE is returned. If it does not run, then FALSE is returned. If neither can be inferred, for instance because the worker runs on a remote machine, then NA is returned.
}
\description{
Check whether or not the cluster nodes are alive
}
\details{
This function works by checking whether the cluster node process is running or not.
This is done by querying the system for its process ID (PID), which is registered by \code{\link[=makeClusterPSOCK]{makeClusterPSOCK()}} when the node starts. If the PID is not known, then NA is returned.

On Unix and macOS, the PID is queried using \code{\link[tools:pskill]{tools::pskill()}} with fallback to \code{system("ps")}. On MS Windows, \code{system2("tasklist")} is used, which may take a long time if there are a lot of processes running. For details, see the \emph{internal} \code{\link[=pid_exists]{pid_exists()}} function.
}
\examples{
\donttest{
cl <- makeClusterPSOCK(2)

## Check whether cluster node #2 is alive
print(isNodeAlive(cl[[2]]))

## Check all nodes
print(isNodeAlive(cl))
}
}
parallelly/man/supportsMulticore.Rd0000644000175000017500000000514514043061205017372 0ustar nileshnilesh% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/supportsMulticore.R
\name{supportsMulticore}
\alias{supportsMulticore}
\title{Check If Forked Processing ("multicore") is Supported}
\usage{
supportsMulticore(...)
}
\arguments{
\item{\dots}{Internal usage only.}
}
\value{
TRUE if forked processing is supported and not disabled, otherwise FALSE.
}
\description{
Certain parallelization methods in R rely on \emph{forked} processing, e.g. \code{parallel::mclapply()}, \code{parallel::makeCluster(n, type = "FORK")}, \code{doMC::registerDoMC()}, and \code{future::plan("multicore")}. Process forking is done by the operating system and support for it in \R is restricted to Unix-like operating systems such as Linux, Solaris, and macOS. R running on Microsoft Windows does not support forked processing.

In R, forked processing is often referred to as "multicore" processing, which stems from the 'mc' of the \code{mclapply()} family of functions, which originally was in a package named \pkg{multicore} that later was incorporated into the \pkg{parallel} package.
This function checks whether or not forked (aka "multicore") processing is supported in the current \R session.
}
\section{Support for process forking}{
While R supports forked processing on Unix-like operating systems such as Linux and macOS, it does not on the Microsoft Windows operating system.

For some R environments it is considered unstable to perform parallel processing based on \emph{forking}. This is for example the case when using RStudio, cf. \href{https://github.com/rstudio/rstudio/issues/2597#issuecomment-482187011}{RStudio Inc. recommends against using forked processing when running R from within the RStudio software}. This function detects when running in such an environment and returns \code{FALSE}, even though the underlying operating system supports forked processing. A warning will also be produced, informing the user about this, the first time this function is called in an \R session. This warning can be disabled by setting R option \option{parallelly.supportsMulticore.unstable}, or environment variable \env{R_PARALLELLY_SUPPORTSMULTICORE_UNSTABLE}, to \code{"quiet"}.
}
\section{Enable or disable forked processing}{
It is possible to disable forked processing for futures by setting \R option \option{parallelly.fork.enable} to \code{FALSE}. Alternatively, one can set environment variable \env{R_PARALLELLY_FORK_ENABLE} to \code{false}. Analogously, it is possible to override disabled forking by setting one of these to \code{TRUE}.
}
\examples{
## Check whether or not forked processing is supported
supportsMulticore()
}
parallelly/man/isForkedChild.Rd0000644000175000017500000000114414146362545016333 0ustar nileshnilesh% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/isForkedChild.R
\name{isForkedChild}
\alias{isForkedChild}
\title{Checks whether or not we are running in a forked child process}
\usage{
isForkedChild()
}
\value{
(logical) Returns TRUE if running in a forked child process, otherwise FALSE.
}
\description{
Checks whether or not we are running in a forked child process
}
\details{
Examples of setups and functions that rely on \emph{forked} parallelization are \code{parallel::makeCluster(n, type = "FORK")}, \code{parallel::mclapply()}, and \code{future::plan("multicore")}.
}
parallelly/man/find_rshcmd.Rd0000644000175000017500000000215514043061213016064 0ustar nileshnilesh% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/makeClusterPSOCK.R
\name{find_rshcmd}
\alias{find_rshcmd}
\title{Search for SSH clients on the current system}
\usage{
find_rshcmd(which = NULL, first = FALSE, must_work = TRUE)
}
\arguments{
\item{which}{A character vector specifying which types of SSH clients to search for. If NULL, a default set of clients supported by the current platform is searched for.}

\item{first}{If TRUE, the first client found is returned, otherwise all located clients are returned.}

\item{must_work}{If TRUE and no clients were found, then an error is produced, otherwise only a warning.}
}
\value{
A named list of pathnames to all located SSH clients. The pathnames may be followed by zero or more command-line options, i.e. the elements of the returned list are character vectors of length one or more. If \code{first = TRUE}, only the first one is returned. Attribute \code{version} contains the output from querying the executable for its version (via command-line option \code{-V}).
} \description{ Search for SSH clients on the current system } \keyword{internal} parallelly/man/makeClusterPSOCK.Rd0000644000175000017500000006727314156452603016713 0ustar nileshnilesh% Generated by roxygen2: do not edit by hand % Please edit documentation in R/makeClusterPSOCK.R \name{makeClusterPSOCK} \alias{makeClusterPSOCK} \alias{makeNodePSOCK} \title{Create a PSOCK Cluster of R Workers for Parallel Processing} \usage{ makeClusterPSOCK( workers, makeNode = makeNodePSOCK, port = c("auto", "random"), ..., autoStop = FALSE, tries = getOption2("parallelly.makeNodePSOCK.tries", 3L), delay = getOption2("parallelly.makeNodePSOCK.tries.delay", 15), validate = getOption2("parallelly.makeNodePSOCK.validate", TRUE), verbose = getOption2("parallelly.debug", FALSE) ) makeNodePSOCK( worker = getOption2("parallelly.localhost.hostname", "localhost"), master = NULL, port, connectTimeout = getOption2("parallelly.makeNodePSOCK.connectTimeout", 2 * 60), timeout = getOption2("parallelly.makeNodePSOCK.timeout", 30 * 24 * 60 * 60), rscript = NULL, homogeneous = NULL, rscript_args = NULL, rscript_envs = NULL, rscript_libs = NULL, rscript_startup = NULL, rscript_sh = c("auto", "cmd", "sh"), default_packages = c("datasets", "utils", "grDevices", "graphics", "stats", if (methods) "methods"), methods = TRUE, socketOptions = getOption2("parallelly.makeNodePSOCK.socketOptions", "no-delay"), useXDR = getOption2("parallelly.makeNodePSOCK.useXDR", FALSE), outfile = "/dev/null", renice = NA_integer_, rshcmd = getOption2("parallelly.makeNodePSOCK.rshcmd", NULL), user = NULL, revtunnel = TRUE, rshlogfile = NULL, rshopts = getOption2("parallelly.makeNodePSOCK.rshopts", NULL), rank = 1L, manual = FALSE, dryrun = FALSE, quiet = FALSE, setup_strategy = getOption2("parallelly.makeNodePSOCK.setup_strategy", "parallel"), action = c("launch", "options"), verbose = FALSE ) } \arguments{ \item{workers}{The hostnames of workers (as a character vector) or the number of localhost workers (as a positive 
integer).}

\item{makeNode}{A function that creates a \code{"SOCKnode"} or \code{"SOCK0node"} object, which represents a connection to a worker.}

\item{port}{The port number of the master used for communicating with all the workers (via socket connections). If an integer vector of ports, then a random one among those is chosen. If \code{"random"}, then a random port is chosen from \code{11000:11999}, or from the range specified by environment variable \env{R_PARALLELLY_RANDOM_PORTS}. If \code{"auto"} (default), then the default (single) port is taken from environment variable \env{R_PARALLEL_PORT}, otherwise \code{"random"} is used. \emph{Note, do not use this argument to specify the port number used by \code{rshcmd}, which typically is an SSH client. Instead, if the SSH daemon runs on a different port than the default 22, specify the SSH port by appending it to the hostname, e.g. \code{"remote.server.org:2200"}, or via SSH options \code{-p}, e.g. \code{rshopts = c("-p", "2200")}.}}

\item{\dots}{Optional arguments passed to \code{makeNode(workers[i], ..., rank = i)} where \code{i = seq_along(workers)}.}

\item{autoStop}{If TRUE, the cluster will be automatically stopped using \code{\link[parallel:makeCluster]{stopCluster}()} when it is garbage collected, unless already stopped. See also \code{\link[=autoStopCluster]{autoStopCluster()}}.}

\item{tries, delay}{Maximum number of attempts done to launch each node with \code{makeNode()} and the delay (in seconds) in-between attempts. If argument \code{port} specifies more than one port, e.g. \code{port = "random"}, then a random port will be drawn and validated at most \code{tries} times.
Arguments \code{tries} and \code{delay} are used only when \verb{setup_strategy == "sequential"}.}

\item{validate}{If TRUE (default), after the nodes have been created, they are all validated to be working by inquiring about their session information, which is saved in attribute \code{session_info} of each node.}

\item{verbose}{If TRUE, informative messages are outputted.}

\item{worker}{The hostname or IP number of the machine where the worker should run.}

\item{master}{The hostname or IP number of the master / calling machine, as known to the workers. If NULL (default), then the default is \code{Sys.info()[["nodename"]]}, unless \code{worker} is \emph{localhost} or \code{revtunnel = TRUE}, in which case it is \code{"localhost"}.}

\item{connectTimeout}{The maximum time (in seconds) allowed for each socket connection between the master and a worker to be established (defaults to 2 minutes). \emph{See note below on current lack of support on Linux and macOS systems.}}

\item{timeout}{The maximum time (in seconds) allowed to pass without the master and a worker communicating with each other (defaults to 30 days).}

\item{rscript, homogeneous}{The system command for launching \command{Rscript} on the worker and whether it is installed in the same path as the calling machine or not. For more details, see below.}

\item{rscript_args}{Additional arguments to \command{Rscript} (as a character vector). This argument can be used to customize the \R environment of the workers before they launch. For instance, use \code{rscript_args = c("-e", shQuote('setwd("/path/to")'))} to set the working directory to \file{/path/to} on \emph{all} workers.}

\item{rscript_envs}{A named character vector of environment variables to set or unset on the workers at startup, e.g. \code{rscript_envs = c(FOO = "3.14", "HOME", "UNKNOWN", UNSETME = NA_character_)}.
If an element is not named, then the value of that variable will be used as the name and the value will be the value of \code{Sys.getenv()} for that variable. Non-existing environment variables will be dropped. These variables are set using \code{Sys.setenv()}. A named element with value \code{NA_character_} will cause that variable to be unset, which is done via \code{Sys.unsetenv()}.}

\item{rscript_libs}{A character vector of \R library paths that will be used for the library search path of the \R workers. An asterisk (\code{"*"}) will be resolved to the default \code{.libPaths()} \emph{on the worker}. That is, to \emph{prepend} a folder, instead of replacing the existing ones, use \code{rscript_libs = c("new_folder", "*")}. To pass down a non-default library path currently set \emph{on the main \R session} to the workers, use \code{rscript_libs = .libPaths()}.}

\item{rscript_startup}{An \R expression or a character vector of \R code, or a list with a mix of these, that will be evaluated on the \R worker prior to launching the worker's event loop. For instance, use \code{rscript_startup = 'setwd("/path/to")'} to set the working directory to \file{/path/to} on \emph{all} workers.}

\item{rscript_sh}{The type of shell used where \code{rscript} is launched, which should be \code{"sh"} if launched via a POSIX shell and \code{"cmd"} if launched via an MS Windows shell. This controls how shell command-line options are quoted, via \code{\link[base:shQuote]{shQuote(..., type = rscript_sh)}}. If \code{"auto"} (default), and the cluster node is launched locally, then it is set to \code{"sh"} or \code{"cmd"} according to the current platform. If launched remotely, then it is set to \code{"sh"}, based on the assumption that remote machines typically launch commands via SSH in a POSIX shell.}

\item{default_packages}{A character vector or NULL that controls which R packages are attached on each cluster node during startup.
An asterisk (\code{"*"}) resolves to \code{getOption("defaultPackages")} \emph{on the current machine}. If NULL, then R's default set of packages is attached.}

\item{methods}{If TRUE (default), then the \pkg{methods} package is also loaded. This argument exists for legacy reasons due to how \command{Rscript} worked in R (< 3.5.0).}

\item{socketOptions}{A character string that sets \R option \option{socketOptions} on the worker.}

\item{useXDR}{If FALSE (default), the communication between master and workers, which is binary, will use little-endian (faster), otherwise big-endian ("XDR"; slower).}

\item{outfile}{Where to direct the \link[base:showConnections]{stdout} and \link[base:showConnections]{stderr} connection output from the workers. If NULL, then no redirection of output is done, which means that the output is relayed in the terminal on the local computer. On Windows, the output is only relayed when running \R from a terminal, but not from a GUI.}

\item{renice}{A numerical 'niceness' (priority) to set for the worker processes.}

\item{rshcmd, rshopts}{The command (character vector) to be run on the master to launch a process on another host, and any additional arguments (character vector). These arguments are only applied if \code{machine} is not \emph{localhost}. For more details, see below.}

\item{user}{(optional) The user name to be used when communicating with another host.}

\item{revtunnel}{If TRUE, a reverse SSH tunnel is set up for each worker such that the worker \R process sets up a socket connection to its local port \code{(port - rank + 1)}, which then reaches the master on port \code{port}. If FALSE, then the worker will try to connect directly to port \code{port} on \code{master}. For more details, see below.}

\item{rshlogfile}{(optional) If a filename, the output produced by the \code{rshcmd} call is logged to this file, or if TRUE, then it is logged to a temporary file.
The log file name is available as an attribute of the returned node object. \emph{Warning: This only works with SSH clients that support option \verb{-E out.log}}. For example, PuTTY's \command{plink} does \emph{not} support this option, and any attempt to specify \code{rshlogfile} will cause the SSH connection to fail.}

\item{rank}{A unique one-based index for each worker (automatically set).}

\item{manual}{If TRUE, the workers will need to be run manually. The command to run will be displayed.}

\item{dryrun}{If TRUE, nothing is set up, but a message suggesting how to launch the worker from the terminal is outputted. This is useful for troubleshooting.}

\item{quiet}{If TRUE, then no output will be produced other than that from using \code{verbose = TRUE}.}

\item{setup_strategy}{If \code{"parallel"} (default), the workers are set up concurrently. If \code{"sequential"}, they are set up sequentially, one after the other.}

\item{action}{This is an internal argument.}
}
\value{
An object of class \code{c("RichSOCKcluster", "SOCKcluster", "cluster")} consisting of a list of \code{"SOCKnode"} or \code{"SOCK0node"} workers (that also inherit from \code{RichSOCKnode}).

\code{makeNodePSOCK()} returns a \code{"SOCKnode"} or \code{"SOCK0node"} object representing an established connection to a worker.
}
\description{
The \code{makeClusterPSOCK()} function creates a cluster of \R workers for parallel processing. These \R workers may be background \R sessions on the current machine, \R sessions on external machines (local or remote), or a mix of such. For external workers, the default is to use SSH to connect to those external machines. This function works similarly to \code{\link[parallel:makeCluster]{makePSOCKcluster}()} of the \pkg{parallel} package, but provides additional and more flexible options for controlling the setup of the system calls that launch the background \R workers, and for how to connect to external machines.
} \section{Definition of \emph{localhost}}{ A hostname is considered to be \emph{localhost} if it equals: \itemize{ \item \code{"localhost"}, \item \code{"127.0.0.1"}, or \item \code{Sys.info()[["nodename"]]}. } It is also considered \emph{localhost} if it appears on the same line as the value of \code{Sys.info()[["nodename"]]} in file \file{/etc/hosts}. } \section{Default SSH client and options (arguments \code{rshcmd} and \code{rshopts})}{ Arguments \code{rshcmd} and \code{rshopts} are only used when connecting to an external host. The default method for connecting to an external host is via SSH and the system executable for this is given by argument \code{rshcmd}. The default is given by option \option{parallelly.makeNodePSOCK.rshcmd}. If that is not set, then the default is to use \command{ssh} on Unix-like systems, including macOS, as well as Windows 10. On older MS Windows versions, which do not have a built-in \command{ssh} client, the default is to use (i) \command{plink} from the \href{https://www.putty.org/}{\command{PuTTY}} project, and then (ii) the \command{ssh} client that is distributed with RStudio. PuTTY puts itself on Windows' system \env{PATH} when installed, meaning this function will find PuTTY automatically if it is installed. If not, to manually specify PuTTY as the SSH client, provide the absolute pathname of \file{plink.exe} in the first element and option \command{-ssh} in the second as in \code{rshcmd = c("C:/Path/PuTTY/plink.exe", "-ssh")}. This is because all elements of \code{rshcmd} are individually "shell" quoted and element \code{rshcmd[1]} must be on the system \env{PATH}. Furthermore, when running \R from RStudio on Windows, the \command{ssh} client that is distributed with RStudio will also be considered. This client, which is from \href{https://osdn.net/projects/mingw/}{MinGW} MSYS, is searched for in the folder given by the \env{RSTUDIO_MSYS_SSH} environment variable - a variable that is (only) set when running RStudio.
To use this SSH client outside of RStudio, set \env{RSTUDIO_MSYS_SSH} accordingly. You can override the default set of SSH clients that are searched for by specifying them in argument \code{rshcmd} or via option \option{parallelly.makeNodePSOCK.rshcmd} using the format \verb{<protocol>}, e.g. \code{rshcmd = c("<rstudio-ssh>", "<putty-plink>", "<ssh>")}. See below for examples. If no SSH client is found, an informative error message is produced. Additional SSH options may be specified via argument \code{rshopts}, which defaults to option \option{parallelly.makeNodePSOCK.rshopts}. For instance, a private SSH key can be provided as \code{rshopts = c("-i", "~/.ssh/my_private_key")}. PuTTY users should specify a PuTTY PPK file, e.g. \code{rshopts = c("-i", "C:/Users/joe/.ssh/my_keys.ppk")}. Contrary to \code{rshcmd}, elements of \code{rshopts} are not quoted. } \section{Accessing external machines that prompt for a password}{ \emph{IMPORTANT: With one exception, it is not possible for these functions to log in and launch \R workers on external machines that require a password to be entered manually for authentication.} The only known exception is the PuTTY client on Windows, for which one can pass the password via command-line option \code{-pw}, e.g. \code{rshopts = c("-pw", "MySecretPassword")}. Note, depending on whether you run \R in a terminal or via a GUI, you might not even see the password prompt. It is also likely that you cannot enter a password, because the connection is set up via a background system call. The poor man's workaround for a setup that requires a password is to manually log into each of the external machines and launch the \R workers by hand. For this approach, use \code{manual = TRUE} and follow the instructions, which include cut'n'pasteable commands on how to launch the worker from the external machine. However, a much more convenient and less tedious method is to set up key-based SSH authentication between your local machine and the external machine(s), as explained below.
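As a sketch only, reusing the hypothetical host \code{remote.server.org} and user \code{alice} from elsewhere in this documentation, such a manual setup could be initiated with:

\preformatted{
## Sketch: 'remote.server.org' and 'alice' are placeholder names.
## With manual = TRUE, makeClusterPSOCK() prints the exact command
## that then has to be launched by hand on the remote machine.
cl <- makeClusterPSOCK("remote.server.org", user = "alice",
                       manual = TRUE)
}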
} \section{Accessing external machines via key-based SSH authentication}{ The best approach to automatically launch \R workers on external machines over SSH is to set up key-based SSH authentication. This will allow you to log into the external machine without having to enter a password. Key-based SSH authentication is taken care of by the SSH client and not \R. To configure this, see the manuals of your SSH client or search the web for "ssh key authentication". } \section{Reverse SSH tunneling}{ The default is to use reverse SSH tunneling (\code{revtunnel = TRUE}) for workers running on other machines. This avoids the complication of otherwise having to configure port forwarding in firewalls, which often requires a static IP address as well as privileges to edit the firewall on your outgoing router, something most users don't have. It also has the advantage of not having to know the internal and / or the public IP address / hostname of the master. Yet another advantage is that there will be no need for a DNS lookup by the worker machines to the master, which may not be configured or may be disabled on some systems, e.g. compute clusters. } \section{Argument \code{rscript}}{ If \code{homogeneous} is FALSE, then \code{rscript} defaults to \code{"Rscript"}, i.e. it is assumed that the \command{Rscript} executable is available on the \env{PATH} of the worker. If \code{homogeneous} is TRUE, then \code{rscript} defaults to \code{file.path(R.home("bin"), "Rscript")}, i.e. it is basically assumed that the worker and the caller share the same file system and \R installation. When specified, argument \code{rscript} should be a character vector with one or more elements. Any asterisk (\code{"*"}) will be resolved to the above default \code{homogeneous}-dependent \code{Rscript} path.
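As a sketch of how this asterisk can be used, the worker's \command{Rscript} process could be launched via \command{nice} (assuming \command{nice} is available on the worker) without having to spell out the \command{Rscript} path:

\preformatted{
## Sketch: "*" expands to the homogeneous-dependent default Rscript path
cl <- makeClusterPSOCK("localhost", rscript = c("nice", "*"),
                       dryrun = TRUE, quiet = TRUE)
}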
All elements are automatically shell quoted using \code{\link[base:shQuote]{base::shQuote()}}, except those that are of format \verb{<name>=<value>}, that is, the ones matching the regular expression '\samp{^[[:alpha:]_][[:alnum:]_]*=.*}'. Another exception is when \code{rscript} inherits from 'AsIs'. } \section{Default value of argument \code{homogeneous}}{ The default value of \code{homogeneous} is TRUE if and only if either of the following is fulfilled: \itemize{ \item \code{worker} is \emph{localhost} \item \code{revtunnel} is FALSE and \code{master} is \emph{localhost} \item \code{worker} is neither an IP number nor a fully qualified domain name (FQDN). A hostname is considered to be a FQDN if it contains one or more periods } In all other cases, \code{homogeneous} defaults to FALSE. } \section{Connection time out}{ Argument \code{connectTimeout} does \emph{not} work properly on Unix and macOS due to limitations in \R itself. For more details on this, please see R-devel thread 'BUG?: On Linux setTimeLimit() fails to propagate timeout error when it occurs (works on Windows)' on 2016-10-26 (\url{https://stat.ethz.ch/pipermail/r-devel/2016-October/073309.html}). When used, the timeout will eventually trigger an error, but it won't happen until the socket connection timeout \code{timeout} itself happens. } \section{Communication time out}{ If there is no communication between the master and a worker within the \code{timeout} limit, then the corresponding socket connection will be closed automatically. This will eventually result in an error in code trying to access the connection. } \section{Failing to set up local workers}{ When setting up a cluster of localhost workers, that is, workers running on the same machine as the master \R process, occasionally a connection to a worker ("cluster node") may fail to be set up. When this occurs, an informative error message with troubleshooting suggestions will be produced.
The most common reason for such localhost failures is port clashes. Retrying will often resolve the problem. } \section{Failing to set up remote workers}{ A cluster of remote workers runs \R processes on external machines. These external \R processes are launched over, typically, SSH to the remote machine. For this to work, each of the remote machines needs to have \R installed, preferably of the same version as what is on the main machine. It is also required that one can SSH to the remote machines. Ideally, the SSH connections use authentication based on public-private SSH keys such that the setup of the remote workers can be fully automated (see above). If \code{makeClusterPSOCK()} fails to set up one or more remote \R workers, then an informative error message is produced. There are a few reasons for failing to set up remote workers. If this happens, start by asserting that you can SSH to the remote machine and launch \file{Rscript} by calling something like: \preformatted{ {local}$ ssh -l alice remote.server.org {remote}$ Rscript --version R scripting front-end version 3.6.1 (2019-07-05) {remote}$ logout {local}$ } When you have confirmed the above to work, then confirm that you can achieve the same in a single command-line call: \preformatted{ {local}$ ssh -l alice remote.server.org Rscript --version R scripting front-end version 3.6.1 (2019-07-05) {local}$ } The latter asserts that you have a proper startup configuration also for \emph{non-interactive} shell sessions on the remote machine. Another reason for failing to set up remote workers could be that they are running an \R version that is not compatible with the version that your main \R session is running. For instance, if we run R (>= 3.6.0) locally and the workers run R (< 3.5.0), we will get: \verb{Error in unserialize(node$con) : error reading from connection}.
This is because R (>= 3.6.0) uses serialization format version 3 by default whereas R (< 3.5.0) only supports version 2. We can see the version of the \R workers by adding \code{rscript_args = c("-e", shQuote("getRversion()"))} when calling \code{makeClusterPSOCK()}. } \examples{ ## NOTE: Drop 'dryrun = TRUE' below in order to actually connect. Add ## 'verbose = TRUE' if you run into problems and need to troubleshoot. ## EXAMPLE: Two workers on the local machine workers <- c("localhost", "localhost") cl <- makeClusterPSOCK(workers, dryrun = TRUE, quiet = TRUE) ## EXAMPLE: Three remote workers ## Three R workers are set up on two remote machines workers <- c("n1.remote.org", "n2.remote.org", "n1.remote.org") cl <- makeClusterPSOCK(workers, dryrun = TRUE, quiet = TRUE) ## EXAMPLE: Local and remote workers ## Same setup when the two machines are on the local network and ## have identical software setups cl <- makeClusterPSOCK( workers, revtunnel = FALSE, homogeneous = TRUE, dryrun = TRUE, quiet = TRUE ) ## EXAMPLE: Three remote workers on machines 'n1' and 'n2' that can only ## be accessed via jumphost 'login.remote.org' workers <- c("n1", "n2", "n1") cl <- makeClusterPSOCK( workers, rshopts = c("-J", "login.remote.org"), homogeneous = FALSE, dryrun = TRUE, quiet = TRUE ) ## EXAMPLE: Remote workers with specific setup ## Setup of remote worker with more detailed control on ## authentication and reverse SSH tunnelling cl <- makeClusterPSOCK( "remote.server.org", user = "johnny", ## Manual configuration of reverse SSH tunnelling revtunnel = FALSE, rshopts = c("-v", "-R 11000:gateway:11942"), master = "gateway", port = 11942, ## Run Rscript nicely and skip any startup scripts rscript = c("nice", "/path/to/Rscript"), dryrun = TRUE, quiet = TRUE ) ## EXAMPLE: Two workers running in Docker on the local machine ## Setup of 2 Docker workers running rocker/r-parallel cl <- makeClusterPSOCK( rep("localhost", times = 2L), ## Launch Rscript inside Docker container rscript =
c( "docker", "run", "--net=host", "rocker/r-parallel", "Rscript" ), ## IMPORTANT: Because Docker runs inside a virtual machine (VM) on macOS ## and Windows (not Linux), when the R worker tries to connect back to ## the default 'localhost' it will fail, because the main R session is ## not running in the VM, but outside on the host. To reach the host on ## macOS and Windows, make sure to use master = "host.docker.internal" # master = "host.docker.internal", # <= macOS & Windows dryrun = TRUE, quiet = TRUE ) ## EXAMPLE: Two workers running in Singularity on the local machine ## Setup of 2 Singularity workers running rocker/r-parallel cl <- makeClusterPSOCK( rep("localhost", times = 2L), ## Launch Rscript inside Linux container rscript = c( "singularity", "exec", "docker://rocker/r-parallel", "Rscript" ), dryrun = TRUE, quiet = TRUE ) ## EXAMPLE: One worker running in udocker on the local machine ## Setup of a single udocker.py worker running rocker/r-parallel cl <- makeClusterPSOCK( "localhost", ## Launch Rscript inside Docker container (using udocker) rscript = c( "udocker.py", "run", "rocker/r-parallel", "Rscript" ), ## Manually launch parallel workers ## (need double shQuote():s because udocker.py drops one level) rscript_args = c( "-e", shQuote(shQuote("parallel:::.workRSOCK()")) ), dryrun = TRUE, quiet = TRUE ) ## EXAMPLE: One worker running in Wine for Linux on the local machine ## To install R for MS Windows in Wine, do something like: ## wget https://cran.r-project.org/bin/windows/base/R-4.1.2-win.exe ## wine R-4.1.2-win.exe /SILENT ## winecfg # In GUI, set 'Windows version' to 'Windows 10' ## wine "C:/Program Files/R/R-4.1.2/bin/x64/Rscript.exe" --version cl <- makeClusterPSOCK(1L, rscript = c( "WINEDEBUG=fixme-all", "wine", "C:/Program Files/R/R-4.1.2/bin/x64/Rscript.exe" ), dryrun = TRUE, quiet = TRUE ) ## EXAMPLE: Launch 124 workers on MS Windows 10, where half are ## running on CPU Group #0 and half on CPU Group #1. 
## (https://lovickconsulting.com/2021/11/18/ ## running-r-clusters-on-an-amd-threadripper-3990x-in-windows-10-2/) ncores <- 124 cpu_groups <- c(0, 1) cl <- lapply(cpu_groups, FUN = function(cpu_group) { parallelly::makeClusterPSOCK(ncores \%/\% length(cpu_groups), rscript = I(c( Sys.getenv("COMSPEC"), "/c", "start", "/B", "/NODE", cpu_group, "/AFFINITY", "0xFFFFFFFFFFFFFFFE", "*" )), dryrun = TRUE, quiet = TRUE ) }) ## merge the two 62-node clusters into one with 124 nodes cl <- do.call(c, cl) ## EXAMPLE: Remote worker running on AWS ## Launching worker on Amazon AWS EC2 running one of the ## Amazon Machine Images (AMI) provided by RStudio ## (https://www.louisaslett.com/RStudio_AMI/) public_ip <- "1.2.3.4" ssh_private_key_file <- "~/.ssh/my-private-aws-key.pem" cl <- makeClusterPSOCK( ## Public IP number of EC2 instance public_ip, ## User name (always 'ubuntu') user = "ubuntu", ## Use private SSH key registered with AWS rshopts = c( "-o", "StrictHostKeyChecking=no", "-o", "IdentitiesOnly=yes", "-i", ssh_private_key_file ), ## Set up .libPaths() for the 'ubuntu' user ## and then install the future package rscript_startup = quote(local({ p <- Sys.getenv("R_LIBS_USER") dir.create(p, recursive = TRUE, showWarnings = FALSE) .libPaths(p) install.packages("future") })), dryrun = TRUE, quiet = TRUE ) ## EXAMPLE: Remote worker running on GCE ## Launching worker on Google Compute Engine (GCE) running a ## container-based VM (with a #cloud-config specification) public_ip <- "1.2.3.4" user <- "johnny" ssh_private_key_file <- "~/.ssh/google_compute_engine" cl <- makeClusterPSOCK( ## Public IP number of GCE instance public_ip, ## User name (== SSH key label (sic!)) user = user, ## Use private SSH key registered with GCE rshopts = c( "-o", "StrictHostKeyChecking=no", "-o", "IdentitiesOnly=yes", "-i", ssh_private_key_file ), ## Launch Rscript inside Docker container rscript = c( "docker", "run", "--net=host", "rocker/r-parallel", "Rscript" ), dryrun = TRUE, quiet = TRUE ) ##
EXAMPLE: Remote worker running on Linux from Windows machine ## Connect to remote Unix machine 'remote.server.org' on port 2200 ## as user 'bob' from a Windows machine with PuTTY installed. ## Using the explicit special rshcmd = "<putty-plink>" will force ## makeClusterPSOCK() to search for and use the PuTTY plink software, ## preventing it from using other SSH clients on the system search PATH. cl <- makeClusterPSOCK( "remote.server.org", user = "bob", rshcmd = "<putty-plink>", rshopts = c("-P", 2200, "-i", "C:/Users/bobby/.ssh/putty.ppk"), dryrun = TRUE, quiet = TRUE ) ## EXAMPLE: Remote worker running on Linux from RStudio on Windows ## Connect to remote Unix machine 'remote.server.org' on port 2200 ## as user 'bob' from a Windows machine via RStudio's SSH client. ## Using the explicit special rshcmd = "<rstudio-ssh>" will force ## makeClusterPSOCK() to use the SSH client that comes with RStudio, ## preventing it from using other SSH clients on the system search PATH. cl <- makeClusterPSOCK( "remote.server.org", user = "bob", rshcmd = "<rstudio-ssh>", dryrun = TRUE, quiet = TRUE ) } parallelly/man/freePort.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/ports.R \name{freePort} \alias{freePort} \title{Find a TCP port that can be opened} \usage{ freePort(ports = 1024:65535, default = "first", randomize = TRUE) } \arguments{ \item{ports}{(integer vector, or character string) Zero or more TCP ports in [0, 65535] to scan. If \code{"random"}, then a random set of ports is considered. If \code{"auto"}, then the port given by environment variable \env{R_PARALLEL_PORT} is used, which may also specify \code{random}.} \item{default}{(integer) \code{NA_integer_} or a port to be returned if an available port could not be found. If \code{"first"}, then \code{ports[1]}. If \code{"random"}, then a random port among \code{ports} is used.
If \code{length(ports) == 0}, then \code{NA_integer_}.} \item{randomize}{(logical) If TRUE, \code{ports} is randomly shuffled before being searched. This shuffle does \emph{not} forward the RNG seed.} } \value{ Returns an integer representing the first port among \code{ports} that can be opened. If none can be opened, then \code{default} is returned. If port querying is not supported, as in R (< 4.0.0), then \code{default} is returned. } \description{ Find a TCP port that can be opened } parallelly/man/as.cluster.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/as.cluster.R \name{as.cluster} \alias{as.cluster} \alias{as.cluster.cluster} \alias{as.cluster.list} \alias{as.cluster.SOCKnode} \alias{as.cluster.SOCK0node} \alias{c.cluster} \title{Coerce an Object to a Cluster Object} \usage{ as.cluster(x, ...) \method{as.cluster}{cluster}(x, ...) \method{as.cluster}{list}(x, ...) \method{as.cluster}{SOCKnode}(x, ...) \method{as.cluster}{SOCK0node}(x, ...) \method{c}{cluster}(..., recursive = FALSE) } \arguments{ \item{x}{An object to be coerced.} \item{\dots}{Additional arguments passed to the underlying coercion method. For \code{c(...)}, the clusters and cluster nodes to be combined.} \item{recursive}{Not used.} } \value{ An object of class \code{cluster}. \code{c(...)} combines multiple clusters and / or cluster nodes into one cluster returned as an object of class \code{cluster}. A warning will be produced if there are duplicated nodes in the resulting cluster.
} \description{ Coerce an Object to a Cluster Object } \examples{ cl1 <- makeClusterPSOCK(2, dryrun = TRUE) cl2 <- makeClusterPSOCK(c("n1", "server.remote.org"), dryrun = TRUE) cl <- c(cl1, cl2) print(cl) } parallelly/man/makeClusterMPI.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/makeClusterMPI.R \name{makeClusterMPI} \alias{makeClusterMPI} \title{Create a Message Passing Interface (MPI) Cluster of R Workers for Parallel Processing} \usage{ makeClusterMPI( workers, ..., autoStop = FALSE, verbose = getOption2("parallelly.debug", FALSE) ) } \arguments{ \item{workers}{The number of workers (as a positive integer).} \item{\dots}{Optional arguments passed to \code{\link[parallel:makeCluster]{makeCluster}(workers, type = "MPI", ...)}.} \item{autoStop}{If TRUE, the cluster will be automatically stopped using \code{\link[parallel:makeCluster]{stopCluster}()} when it is garbage collected, unless already stopped. See also \code{\link[=autoStopCluster]{autoStopCluster()}}.} \item{verbose}{If TRUE, informative messages are produced.} } \value{ An object of class \code{c("RichMPIcluster", "MPIcluster", "cluster")} consisting of a list of \code{"MPInode"} workers. } \description{ The \code{makeClusterMPI()} function creates an MPI cluster of \R workers for parallel processing. This function utilizes \code{makeCluster(..., type = "MPI")} of the \pkg{parallel} package and tweaks the cluster in an attempt to avoid \code{\link[parallel:makeCluster]{stopCluster()}} from hanging (1).
\emph{WARNING: This function is very much a beta version and should only be used if \code{parallel::makeCluster(..., type = "MPI")} fails.} } \details{ \emph{Creating MPI clusters requires that the \pkg{Rmpi} and \pkg{snow} packages are installed.} } \examples{ \donttest{\dontrun{ if (requireNamespace("Rmpi") && requireNamespace("snow")) { cl <- makeClusterMPI(2, autoStop = TRUE) print(cl) y <- parLapply(cl, X = 1:3, fun = sqrt) print(y) rm(list = "cl") } }} } \references{ \enumerate{ \item R-sig-hpc thread \href{https://stat.ethz.ch/pipermail/r-sig-hpc/2017-September/002065.html}{Rmpi: mpi.close.Rslaves() 'hangs'} on 2017-09-28. } } \seealso{ \code{\link[=makeClusterPSOCK]{makeClusterPSOCK()}} and \code{\link[parallel:makeCluster]{parallel::makeCluster()}}. } parallelly/man/freeCores.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/freeCores.R \name{freeCores} \alias{freeCores} \title{Get the Average Number of Free CPU Cores} \usage{ freeCores( memory = c("5min", "15min", "1min"), fraction = 0.9, logical = getOption2("parallelly.availableCores.logical", TRUE), default = parallelly::availableCores() ) } \arguments{ \item{memory}{(character) The time period used to infer the system load, with alternatives being 5 minutes (default), 15 minutes, or 1 minute.} \item{fraction}{(non-negative numeric) A scale factor.} \item{logical}{Passed as-is to \code{\link[=availableCores]{availableCores()}}.} \item{default}{(integer) The value to be returned if the system load is unknown, i.e. \code{\link[=cpuLoad]{cpuLoad()}} returns missing values.} } \value{ A positive integer with attributes \code{loadavg} (named numeric), \code{maxCores} (named integer), argument \code{memory} (character), and argument \code{fraction} (numeric).
} \description{ Get the Average Number of Free CPU Cores } \examples{ free <- freeCores() print(free) \dontrun{ ## Make availableCores() agile to the system load options(parallelly.availableCores.custom = function() freeCores()) } } \keyword{internal} parallelly/man/isForkedNode.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/isForkedNode.R \name{isForkedNode} \alias{isForkedNode} \title{Checks whether or not a Cluster Node Runs in a Forked Process} \usage{ isForkedNode(node, ...) } \arguments{ \item{node}{A cluster node of class \code{SOCKnode} or \code{SOCK0node}.} \item{\ldots}{Not used.} } \value{ (logical) Returns TRUE if the cluster node is running in a forked child process and FALSE if it is not. If it cannot be inferred, NA is returned. } \description{ Checks whether or not a Cluster Node Runs in a Forked Process } parallelly/man/autoStopCluster.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/autoStopCluster.R \name{autoStopCluster} \alias{autoStopCluster} \title{Automatically Stop a Cluster when Garbage Collected} \usage{ autoStopCluster(cl, debug = FALSE) } \arguments{ \item{cl}{A cluster object created by for instance \code{\link[=makeClusterPSOCK]{makeClusterPSOCK()}} or \code{\link[parallel:makeCluster]{parallel::makeCluster()}}.} \item{debug}{If TRUE, then debug messages are produced when the cluster is garbage collected.} } \value{ The cluster object with attribute \code{gcMe} set. } \description{ Registers a finalizer to a cluster such that the cluster will be stopped when garbage collected. } \examples{ cl <- makeClusterPSOCK(2, dryrun = TRUE) cl <- autoStopCluster(cl) print(cl) rm(list = "cl") gc() } \seealso{ The cluster is stopped using \code{\link[parallel:makeCluster]{stopCluster}(cl)}.
} parallelly/man/pid_exists.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/utils,pid.R \name{pid_exists} \alias{pid_exists} \title{Check whether a process PID exists or not} \usage{ pid_exists(pid, debug = getOption2("parallelly.debug", FALSE)) } \arguments{ \item{pid}{A positive integer.} \item{debug}{If TRUE, then debug messages are produced.} } \value{ Returns \code{TRUE} if a process with the given PID exists, \code{FALSE} if a process with the given PID does not exist, and \code{NA} if it is not possible to check PIDs on the current system. } \description{ Check whether a process PID exists or not } \details{ There is no single go-to function in \R for testing whether a PID exists or not. Instead, this function tries to identify a working one among multiple possible alternatives. A method is considered working if the PID of the current process is successfully identified as existing such that \code{pid_exists(Sys.getpid())} is \code{TRUE}. If no working approach is found, \code{pid_exists()} will always return \code{NA} regardless of PID tested. On Unix, including macOS, alternatives \code{tools::pskill(pid, signal = 0L)} and \code{system2("ps", args = pid)} are used. On Windows, various alternatives of \code{system2("tasklist", ...)} are used. } \references{ \enumerate{ \item The Open Group Base Specifications Issue 7, 2018 edition, IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008) \url{https://pubs.opengroup.org/onlinepubs/9699919799/functions/kill.html} \item Microsoft, tasklist, 2018-08-30, \url{https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/tasklist} \item R-devel thread 'Detecting whether a process exists or not by its PID?', 2018-08-30. \url{https://stat.ethz.ch/pipermail/r-devel/2018-August/076702.html} } } \seealso{ \code{\link[tools]{pskill}()} and \code{\link[base]{system2}()}.
} \keyword{internal} parallelly/man/cpuLoad.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/cpuLoad.R \name{cpuLoad} \alias{cpuLoad} \title{Get the Recent CPU Load} \usage{ cpuLoad() } \value{ A named numeric vector with three elements \verb{1min}, \verb{5min}, and \verb{15min} with non-negative values. These values represent estimates of the CPU load during the last minute, the last five minutes, and the last fifteen minutes [1]. An idle system has values close to zero, and a heavily loaded system has values near \code{parallel::detectCores()}. If they are unknown, missing values are returned. } \description{ Get the Recent CPU Load } \details{ This function works only on Unix-like systems with \file{/proc/loadavg}. } \examples{ loadavg <- cpuLoad() print(loadavg) } \references{ \enumerate{ \item Linux Load Averages: Solving the Mystery, Brendan Gregg's Blog, 2017-08-08, \url{http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html} } } \keyword{internal} parallelly/man/parallelly.options.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/options.R \name{parallelly.options} \alias{parallelly.options} \alias{parallelly.debug} \alias{parallelly.availableCores.custom} \alias{parallelly.availableCores.methods} \alias{parallelly.availableCores.fallback} \alias{parallelly.availableCores.omit} \alias{parallelly.availableCores.system} \alias{parallelly.availableWorkers.methods} \alias{parallelly.availableWorkers.custom} \alias{parallelly.fork.enable} \alias{parallelly.supportsMulticore.unstable} \alias{R_PARALLELLY_AVAILABLECORES_FALLBACK} \alias{R_PARALLELLY_AVAILABLECORES_OMIT} \alias{R_PARALLELLY_AVAILABLECORES_SYSTEM} \alias{R_PARALLELLY_FORK_ENABLE} \alias{R_PARALLELLY_SUPPORTSMULTICORE_UNSTABLE} \alias{future.availableCores.custom}
\alias{future.availableCores.methods} \alias{future.availableCores.fallback} \alias{future.availableCores.system} \alias{future.availableWorkers.methods} \alias{future.availableWorkers.custom} \alias{future.fork.enable} \alias{future.supportsMulticore.unstable} \alias{R_FUTURE_AVAILABLECORES_FALLBACK} \alias{R_FUTURE_AVAILABLECORES_SYSTEM} \alias{R_FUTURE_FORK_ENABLE} \alias{R_FUTURE_SUPPORTSMULTICORE_UNSTABLE} \alias{parallelly.makeNodePSOCK.setup_strategy} \alias{parallelly.makeNodePSOCK.validate} \alias{parallelly.makeNodePSOCK.connectTimeout} \alias{parallelly.makeNodePSOCK.timeout} \alias{parallelly.makeNodePSOCK.useXDR} \alias{parallelly.makeNodePSOCK.socketOptions} \alias{parallelly.makeNodePSOCK.rshcmd} \alias{parallelly.makeNodePSOCK.rshopts} \alias{parallelly.makeNodePSOCK.tries} \alias{parallelly.makeNodePSOCK.tries.delay} \alias{R_PARALLELLY_MAKENODEPSOCK.SETUP_STRATEGY} \alias{R_PARALLELLY_MAKENODEPSOCK.VALIDATE} \alias{R_PARALLELLY_MAKENODEPSOCK.CONNECTTIMEOUT} \alias{R_PARALLELLY_MAKENODEPSOCK.TIMEOUT} \alias{R_PARALLELLY_MAKENODEPSOCK.USEXDR} \alias{R_PARALLELLY_MAKENODEPSOCK.SOCKETOPTIONS} \alias{R_PARALLELLY_MAKENODEPSOCK.RSHCMD} \alias{R_PARALLELLY_MAKENODEPSOCK.RSHOPTS} \alias{R_PARALLELLY_MAKENODEPSOCK.TRIES} \alias{R_PARALLELLY_MAKENODEPSOCK.TRIES.DELAY} \title{Options Used by the 'parallelly' Package} \description{ Below are the \R options and environment variables that are used by the \pkg{parallelly} package and packages enhancing it.\cr \cr \emph{WARNING: Note that the names and the default values of these options may change in future versions of the package. Please use with care until further notice.} } \section{Backward compatibility with the \pkg{future} package}{ The functions in the \pkg{parallelly} package originate from the \pkg{future} package.
Because they are widely used within the future ecosystem, we need to keep them backward compatible for quite a long time, in order for all existing packages and R scripts to have time to adjust. This also goes for the \R options and the environment variables used to configure these functions. All options and environment variables used here have prefixes \code{parallelly.} and \code{R_PARALLELLY_}, respectively. Because of the backward compatibility with the \pkg{future} package, the same settings can also be controlled by options and environment variables with prefixes \code{future.} and \code{R_FUTURE_} until further notice, e.g. setting option \option{future.availableCores.fallback=1} is the same as setting option \option{parallelly.availableCores.fallback=1}, and setting environment variable \env{R_FUTURE_AVAILABLECORES_FALLBACK=1} is the same as setting \env{R_PARALLELLY_AVAILABLECORES_FALLBACK=1}. } \section{Configuring number of parallel workers}{ The below \R options and environment variables control the default results of \code{\link[=availableCores]{availableCores()}} and \code{\link[=availableWorkers]{availableWorkers()}}. \describe{ \item{\option{parallelly.availableCores.logical}:}{(logical) The default value of argument \code{logical} as used by \code{availableCores()} and \code{availableWorkers()} for querying \code{parallel::detectCores(logical = logical)}. The default is \code{TRUE} just like it is for \code{\link[parallel:detectCores]{parallel::detectCores()}}.} \item{\option{parallelly.availableCores.methods}:}{(character vector) Default lookup methods for \code{\link[=availableCores]{availableCores()}}.
(Default: \code{c("system", "nproc", "mc.cores", "_R_CHECK_LIMIT_CORES_", "PBS", "SGE", "Slurm", "LSF", "fallback", "custom")})} \item{\option{parallelly.availableCores.custom}:}{(function) If set and a function, then this function will be called (without arguments) by \code{\link[=availableCores]{availableCores()}} where its value, coerced to an integer, is interpreted as a number of cores.} \item{\option{parallelly.availableCores.fallback}:}{(integer) Number of cores to use when no core-specifying settings are detected other than \code{"system"} and \code{"nproc"}. This option makes it possible to set the default number of cores returned by \code{availableCores()} / \code{availableWorkers()} yet allow users and schedulers to override it. In multi-tenant environments, such as HPC clusters, it is useful to set environment variable \env{R_PARALLELLY_AVAILABLECORES_FALLBACK} to \code{1}, which will set this option when the package is loaded.} \item{\option{parallelly.availableCores.system}:}{(integer) Number of "system" cores used instead of what is reported by \code{\link{availableCores}(which = "system")}. This option allows you to effectively override what \code{parallel::detectCores()} reports the system has.} \item{\option{parallelly.availableCores.omit}:}{(integer) Number of cores to set aside, i.e. not to include.} \item{\option{parallelly.availableWorkers.methods}:}{(character vector) Default lookup methods for \code{\link[=availableWorkers]{availableWorkers()}}.
(Default: \code{c("mc.cores", "_R_CHECK_LIMIT_CORES_", "PBS", "SGE", "Slurm", "LSF", "custom", "system", "fallback")})} \item{\option{parallelly.availableWorkers.custom}:}{(function) If set and a function, then this function will be called (without arguments) by \code{\link[=availableWorkers]{availableWorkers()}} where its value, coerced to a character vector, is interpreted as hostnames of available workers.} } } \section{Configuring forked parallel processing}{ The below \R options and environment variables control the default result of \code{\link[=supportsMulticore]{supportsMulticore()}}. \describe{ \item{\option{parallelly.fork.enable}:}{(logical) Enable or disable \emph{forked} processing. If \code{FALSE}, multicore futures become sequential futures. If \code{NA}, or not set (the default), a set of best-practice rules decides whether forked processing should be supported or not.} \item{\option{parallelly.supportsMulticore.unstable}:}{(character) Controls whether a warning should be produced or not whenever multicore processing is automatically disabled because the environment in which R runs is considered unstable for forked processing, e.g. in the RStudio environment. If \code{"warn"} (default), then an informative warning is produced the first time 'multicore' or 'multiprocess' futures are used. If \code{"quiet"}, no warning is produced.} } } \section{Configuring setup of parallel PSOCK clusters}{ The below \R options and environment variables control the default results of \code{\link[=makeClusterPSOCK]{makeClusterPSOCK()}} and its helper function \code{\link[=makeNodePSOCK]{makeNodePSOCK()}} that creates the individual cluster nodes. \describe{ \item{\option{parallelly.makeNodePSOCK.setup_strategy}:}{(character) If \code{"parallel"} (default), the PSOCK cluster nodes are set up concurrently. 
If \code{"sequential"}, they are set up sequentially.} \item{\option{parallelly.makeNodePSOCK.validate}:}{(logical) If TRUE (default), after the nodes have been created, they are all validated that they work by inquiring about their session information, which is saved in attribute \code{session_info} of each node.} \item{\option{parallelly.makeNodePSOCK.connectTimeout}:}{(numeric) The maximum time (in seconds) allowed for each socket connection between the master and a worker to be established (defaults to 2 * 60 seconds = 2 minutes).} \item{\option{parallelly.makeNodePSOCK.timeout}:}{(numeric) The maximum time (in seconds) allowed to pass without the master and a worker communicating with each other (defaults to 30 * 24 * 60 * 60 seconds = 30 days).} \item{\option{parallelly.makeNodePSOCK.useXDR}:}{(logical) If FALSE (default), the communication between master and workers, which is binary, will use little-endian (faster), otherwise big-endian ("XDR"; slower).} \item{\option{parallelly.makeNodePSOCK.socketOptions}:}{(character string) If set to another value than \code{"NULL"}, then option \option{socketOptions} is set to this value on the workers during startup. See \code{\link[base:connections]{base::socketConnection()}} for details. (defaults to \code{"no-delay"})} \item{\option{parallelly.makeNodePSOCK.rshcmd}:}{(character vector) The command to be run on the master to launch a process on another host.} \item{\option{parallelly.makeNodePSOCK.rshopts}:}{(character vector) Additional command-line options appended to \code{rshcmd}. These arguments are only applied when connecting to non-localhost machines.} \item{\option{parallelly.makeNodePSOCK.tries}:}{(integer) The maximum number of attempts done to launch each node. Only used when setting up cluster nodes using the sequential strategy.} \item{\option{parallelly.makeNodePSOCK.tries.delay}:}{(numeric) The number of seconds to wait before trying to launch a cluster node that failed to launch previously. 
Only used when setting up cluster nodes using the sequential strategy.} } } \section{Options for debugging}{ \describe{ \item{\option{parallelly.debug}:}{(logical) If \code{TRUE}, extensive debug messages are generated. (Default: \code{FALSE})} } } \section{Environment variables that set R options}{ All of the above \R \option{parallelly.*} options can be set by corresponding environment variables \env{R_PARALLELLY_*} \emph{when the \pkg{parallelly} package is loaded}. For example, if \code{R_PARALLELLY_MAKENODEPSOCK_SETUP_STRATEGY = "sequential"}, then option \option{parallelly.makeNodePSOCK.setup_strategy} is set to \code{"sequential"} (character). Similarly, if \code{R_PARALLELLY_AVAILABLECORES_FALLBACK = "1"}, then option \option{parallelly.availableCores.fallback} is set to \code{1} (integer). } \examples{ # Set an R option: options(parallelly.availableCores.fallback = 1L) } \seealso{ To set \R options when \R starts (even before the \pkg{parallelly} package is loaded), see the \link[base]{Startup} help page. The \href{https://cran.r-project.org/package=startup}{\pkg{startup}} package provides a friendly mechanism for configuring \R's startup process. } parallelly/man/isConnectionValid.Rd0000644000175000017500000001244714025754625017244 0ustar nileshnilesh% Generated by roxygen2: do not edit by hand % Please edit documentation in R/isConnectionValid.R \name{isConnectionValid} \alias{isConnectionValid} \alias{connectionId} \title{Checks if a Connection is Valid} \usage{ isConnectionValid(con) connectionId(con) } \arguments{ \item{con}{A \link[base:connections]{connection}.} } \value{ \code{isConnectionValid()} returns TRUE if the connection is still valid, otherwise FALSE. If FALSE, then character attribute \code{reason} provides an explanation why the connection is not valid. \code{connectionId()} returns a non-negative integer, -1, or \code{NA_integer_}. For connections stdin, stdout, and stderr, 0, 1, and 2 are returned, respectively. 
For all other connections, an integer greater than or equal to 3 based on the connection's internal pointer is returned. A connection that has been serialized, which is no longer valid, has identifier -1. Attribute \code{raw_id} returns the pointer string from which the above is inferred. } \description{ Get a unique identifier for an R \link[base:connections]{connection} and check whether or not the connection is still valid. } \section{Connection Index versus Connection Identifier}{ R represents \link[base:connections]{connections} as indices using plain integers, e.g. \code{idx <- as.integer(con)}. The three connections standard input ("stdin"), standard output ("stdout"), and standard error ("stderr") always exist and have indices 0, 1, and 2. Any connection opened beyond these will get index three or greater, depending on availability as given by \code{\link[base:showConnections]{base::showConnections()}}. To get the connection with a given index, use \code{\link[base:showConnections]{base::getConnection()}}. \strong{Unfortunately, this index representation of connections is non-robust}, e.g. there are cases where two or more 'connection' objects can end up with the same index and if used, the written output may end up at the wrong destination and files and databases might get corrupted. This can for instance happen if \code{\link[base:showConnections]{base::closeAllConnections()}} is used (*). \strong{In contrast, \code{id <- connectionId(con)} gives an identifier that is unique to that 'connection' object.} This identifier is based on the internal pointer address of the object. The risk for two connections in the same \R session to end up with the same pointer address is very small. Thus, in case we ended up in a situation where two connections \code{con1} and \code{con2} share the same index - \code{as.integer(con1) == as.integer(con2)} - they will never share the same identifier - \code{connectionId(con1) != connectionId(con2)}. 
Here, \code{isConnectionValid()} can be used to check which one of these connections, if any, is valid. (*) Note that there is no good reason for calling \code{closeAllConnections()}. If called, there is a great risk that the files get corrupted etc. See (1) for examples and details on this problem. If you think there is a need to use it, it is much safer to restart \R because that is guaranteed to give you a working \R session with non-clashing connections. It might also be that \code{closeAllConnections()} is used because \code{\link[base:base-internal]{base::sys.save.image()}} is called, which might happen if \R is being forced to terminate. } \section{Connections Cannot be Serialized Or Saved}{ A 'connection' cannot be serialized, e.g. it cannot be saved to file to be read and used in another \R session. If attempted, the connection will not be valid. This is a problem that may occur in parallel processing when passing an \R object to a parallel worker for further processing, e.g. the exported object may hold an internal database connection which will no longer be valid on the worker. When a connection is serialized, its internal pointer address will be invalidated (set to nil). In such cases, \code{connectionId(con)} returns -1 and \code{isConnectionValid(con)} returns FALSE. 
} \examples{ ## R represents connections as plain indices as.integer(stdin()) ## int 0 as.integer(stdout()) ## int 1 as.integer(stderr()) ## int 2 ## The first three connections always exist and are always valid isConnectionValid(stdin()) ## TRUE connectionId(stdin()) ## 0L isConnectionValid(stdout()) ## TRUE connectionId(stdout()) ## 1L isConnectionValid(stderr()) ## TRUE connectionId(stderr()) ## 2L ## Connections cannot be serialized con <- file(tempfile(), open = "w") x <- list(value = 42, stderr = stderr(), con = con) y <- unserialize(serialize(x, connection = NULL)) isConnectionValid(y$stderr) ## TRUE connectionId(y$stderr) ## 2L isConnectionValid(y$con) ## FALSE with attribute 'reason' connectionId(y$con) ## -1L close(con) } \references{ \enumerate{ \item \href{https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81}{'BUG: A \code{connection} object may become corrupt and re-referenced to another connection (PATCH)'}, 2018-10-30. \item R-devel thread \href{https://stat.ethz.ch/pipermail/r-devel/2018-October/077004.html}{PATCH: Asserting that 'connection' used has not changed + R_GetConnection2()}, 2018-10-31. } } \seealso{ See \code{\link[base:showConnections]{base::showConnections()}} for currently open connections and their indices. To get a connection by its index, use \code{\link[base:showConnections]{base::getConnection()}}. } parallelly/man/availableConnections.Rd0000644000175000017500000000430114116406542017733 0ustar nileshnilesh% Generated by roxygen2: do not edit by hand % Please edit documentation in R/availableConnections.R \name{availableConnections} \alias{availableConnections} \alias{freeConnections} \title{Number of Available and Free Connections} \usage{ availableConnections() freeConnections() } \value{ A non-negative integer, or \code{+Inf} if the available number of connections is greater than 16384, which is a limit that can be set via option \option{parallelly.availableConnections.tries}. 
} \description{ The number of \link{connections} that can be open at the same time in \R is \emph{typically} 128, where the first three are occupied by the always open \code{\link[=stdin]{stdin()}}, \code{\link[=stdout]{stdout()}}, and \code{\link[=stderr]{stderr()}} connections, which leaves 125 slots available for other types of connections. Connections are used in many places, e.g. reading and writing to file, downloading URLs, communicating with parallel \R processes over socket connections, and capturing standard output via text connections. } \section{How to increase the limit}{ This limit of 128 connections can only be changed by rebuilding \R from source. The limit is hardcoded as a\if{html}{\out{
}}\preformatted{#define NCONNECTIONS 128 }\if{html}{\out{
}} in \file{src/main/connections.c}. } \section{How the limit is identified}{ Since the limit \emph{might} change, for instance in custom \R builds or in future releases of \R, we do not want to assume that the limit is 128 for all \R installations. Unfortunately, it is not possible to query \R for what the limit is. Instead, \code{availableConnections()} infers it by trial and error until it fails. For efficiency, the result is memoized throughout the current \R session. } \examples{ total <- availableConnections() message("You can have ", total, " connections open in this R installation") free <- freeConnections() message("There are ", free, " connections remaining") } \references{ \enumerate{ \item 'WISH: Increase limit of maximum number of open connections (currently 125+3)', 2016-07-09, \url{https://github.com/HenrikBengtsson/Wishlist-for-R/issues/28} } } \seealso{ \code{\link[base:showConnections]{base::showConnections()}}. } parallelly/man/isLocalhostNode.Rd0000644000175000017500000000112614025754625016713 0ustar nileshnilesh% Generated by roxygen2: do not edit by hand % Please edit documentation in R/isLocalhostNode.R \name{isLocalhostNode} \alias{isLocalhostNode} \title{Checks whether or not a Cluster Node Runs on Localhost} \usage{ isLocalhostNode(node, ...) } \arguments{ \item{node}{A cluster node of class \code{SOCKnode} or \code{SOCK0node}.} \item{\ldots}{Not used.} } \value{ (logical) Returns TRUE if the cluster node is running on the current machine and FALSE if it runs on another machine. If it cannot be inferred, NA is returned. 
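A minimal usage sketch (assuming the \pkg{parallelly} package is installed; launching the worker may take a few seconds):

```r
library(parallelly)

## Launch a single PSOCK worker on the current machine
cl <- makeClusterPSOCK(1L)

## A locally launched worker runs on localhost
isLocalhostNode(cl[[1]])   ## expected: TRUE

parallel::stopCluster(cl)
```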
} \description{ Checks whether or not a Cluster Node Runs on Localhost } parallelly/man/getOption2.Rd0000644000175000017500000000170414060224156015643 0ustar nileshnilesh% Generated by roxygen2: do not edit by hand % Please edit documentation in R/getOptionOrEnvVar.R \name{getOption2} \alias{getOption2} \title{Gets an R Option or an Environment Variable} \usage{ getOption2(name, default = NULL) } \arguments{ \item{name}{(character string) The name of the \R option.} \item{default}{(a single object) The value to be returned if neither the \R option nor the environment variable is set. If the environment variable is set, its value is coerced to the same type as \code{default}.} \item{envvar}{(character string) The name of the environment variable. If not set, or NULL, then the name is automatically constructed from the upper-case version of argument \code{name} with periods (\code{.}) substituted by underscores (\verb{_}) and prefixed with \code{R_}, e.g. \code{"abc.debug"} becomes \code{R_ABC_DEBUG}.} } \value{ Returns an object. 
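The automatic construction of the environment-variable name described above can be sketched in a few lines of base R (\code{optname_to_envvar()} is a hypothetical helper for illustration, not part of the package):

```r
## Map an R option name to its environment-variable counterpart:
## upper-case the name, substitute "." with "_", and prefix with "R_"
optname_to_envvar <- function(name) {
  paste0("R_", gsub(".", "_", toupper(name), fixed = TRUE))
}

optname_to_envvar("abc.debug")
## [1] "R_ABC_DEBUG"
optname_to_envvar("parallelly.availableCores.fallback")
## [1] "R_PARALLELLY_AVAILABLECORES_FALLBACK"
```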
} \description{ Gets an R Option or an Environment Variable } \keyword{internal} parallelly/man/availableWorkers.Rd0000644000175000017500000001162014075254325017113 0ustar nileshnilesh% Generated by roxygen2: do not edit by hand % Please edit documentation in R/availableWorkers.R \name{availableWorkers} \alias{availableWorkers} \title{Get Set of Available Workers} \usage{ availableWorkers( methods = getOption2("parallelly.availableWorkers.methods", c("mc.cores", "BiocParallel", "_R_CHECK_LIMIT_CORES_", "PBS", "SGE", "Slurm", "LSF", "custom", "system", "fallback")), na.rm = TRUE, logical = getOption2("parallelly.availableCores.logical", TRUE), default = getOption2("parallelly.localhost.hostname", "localhost"), which = c("auto", "min", "max", "all") ) } \arguments{ \item{methods}{A character vector specifying how to infer the number of available cores.} \item{na.rm}{If TRUE, only non-missing settings are considered/returned.} \item{logical}{Passed as-is to \code{\link[=availableCores]{availableCores()}}.} \item{default}{The default set of workers.} \item{which}{A character specifying which set / sets to return. If \code{"auto"} (default), the first non-empty set found. If \code{"min"}, the minimum value is returned. If \code{"max"}, the maximum value is returned (be careful!). If \code{"all"}, all values are returned.} } \value{ Return a character vector of workers, which typically consists of names of machines / compute nodes, but may also be IP numbers. } \description{ Get Set of Available Workers } \details{ The default set of workers for each method is \code{rep("localhost", times = availableCores(methods = method, logical = logical))}, which means that each method will use at least as many parallel workers on the current machine as \code{\link[=availableCores]{availableCores()}} allows for that method. In addition, the following settings ("methods") are also acknowledged: \itemize{ \item \code{"PBS"} - Query TORQUE/PBS environment variable \env{PBS_NODEFILE}. 
If this is set and specifies an existing file, then the set of workers is read from that file, where one worker (node) is given per line. An example of a job submission that results in this is \verb{qsub -l nodes = 4:ppn = 2}, which requests four nodes each with two cores. \item \code{"SGE"} - Query Sun/Oracle Grid Engine (SGE) environment variable \env{PE_HOSTFILE}. An example of a job submission that results in this is \verb{qsub -pe mpi 8} (or \verb{qsub -pe ompi 8}), which requests eight cores on any number of machines. \item \code{"LSF"} - Query LSF/OpenLava environment variable \env{LSB_HOSTS}. \item \code{"Slurm"} - Query Slurm environment variable \env{SLURM_JOB_NODELIST} (fallback to legacy \env{SLURM_NODELIST}) and parse set of nodes. Then query Slurm environment variable \env{SLURM_JOB_CPUS_PER_NODE} (fallback \env{SLURM_TASKS_PER_NODE}) to infer how many CPU cores Slurm has allotted to each of the nodes. If \env{SLURM_CPUS_PER_TASK} is set, which is always a scalar, then that is respected too, i.e. if it is smaller, then that is used for all nodes. For example, if \code{SLURM_NODELIST="n1,n[03-05]"} (expands to \code{c("n1", "n03", "n04", "n05")}) and \code{SLURM_JOB_CPUS_PER_NODE="2(x2),3,2"} (expands to \code{c(2, 2, 3, 2, 2)}), then \code{c("n1", "n1", "n03", "n03", "n04", "n04", "n04", "n05", "n05")} is returned. If in addition, \code{SLURM_CPUS_PER_TASK=1}, which can happen depending on hyperthreading configurations on the Slurm cluster, then \code{c("n1", "n03", "n04", "n05")} is returned. \item \code{"custom"} - If option \option{parallelly.availableWorkers.custom} is set and a function, then this function will be called (without arguments) and its value will be coerced to a character vector, which will be interpreted as hostnames of available workers. } } \section{Known limitations}{ \code{availableWorkers(methods = "Slurm")} will expand \env{SLURM_JOB_NODELIST} using \command{scontrol show hostnames "$SLURM_JOB_NODELIST"}, if available. 
If not available, then it attempts to parse the compressed nodelist based on a best-guess understanding of what the possible syntax may be. One known limitation is that "multi-dimensional" ranges are not supported, e.g. \code{"a[1-2]b[3-4]"} is expanded by \command{scontrol} to \code{c("a1b3", "a1b4", "a2b3", "a2b4")}. If \command{scontrol} is not available, then any components that failed to be parsed are dropped with an informative warning message. If no components could be parsed, then the result of \code{methods = "Slurm"} will be empty. } \examples{ message(paste("Available workers:", paste(sQuote(availableWorkers()), collapse = ", "))) \dontrun{ options(mc.cores = 2L) message(paste("Available workers:", paste(sQuote(availableWorkers()), collapse = ", "))) } \dontrun{ ## Always use two workers on host 'n1' and one on host 'n2' options(parallelly.availableWorkers.custom = function() { c("n1", "n1", "n2") }) message(paste("Available workers:", paste(sQuote(availableWorkers()), collapse = ", "))) } } \seealso{ To get the number of available workers on the current machine, see \code{\link[=availableCores]{availableCores()}}. 
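The compressed \env{SLURM_JOB_CPUS_PER_NODE} syntax discussed above, e.g. \code{"2(x2),3,2"}, can be expanded with a few lines of base R (an illustrative sketch of the idea, not the package's internal parser):

```r
## Expand Slurm's compressed per-node CPU counts,
## e.g. "2(x2),3,2" -> c(2, 2, 3, 2, 2)
expand_cpus_per_node <- function(spec) {
  parts <- strsplit(spec, ",", fixed = TRUE)[[1]]
  counts <- lapply(parts, function(part) {
    ## Match "<count>" or "<count>(x<times>)"
    m <- regmatches(part, regexec("^([0-9]+)([(]x([0-9]+)[)])?$", part))[[1]]
    n <- as.integer(m[2])
    times <- if (nzchar(m[4])) as.integer(m[4]) else 1L
    rep(n, times = times)
  })
  unlist(counts)
}

expand_cpus_per_node("2(x2),3,2")
## [1] 2 2 3 2 2
```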
} parallelly/man/figures/0000755000175000017500000000000014025754625014776 5ustar nileshnileshparallelly/man/figures/lifecycle-maturing-blue.svg0000644000175000017500000000170614025754625022233 0ustar nileshnileshlifecyclelifecyclematuringmaturing parallelly/man/figures/logo.R0000644000175000017500000000124014025754625016056 0ustar nileshnileshimgfile <- file.path(tempdir(), "logo.png") logo <- hexSticker::sticker( ~ { par(mar = c(0,0,0,0)) plot(NA, xlim = c(0,1), ylim = c(-0.2,1.4), xlab = "", ylab = "", bty = "n", axes = FALSE) lines(x = c(1,1)*0.35, y = c(0,1), lwd = 10, lend = 0L, col = "white") lines(x = c(1,1)*0.65, y = c(0,1), lwd = 10, lend = 0L, col = "white") }, package="parallelly", p_size=20, s_x=0.8, s_y=0.6, s_width=1.4, s_height=1.2, h_color="#f39c12", filename=imgfile ) img <- magick::image_read(imgfile) img2 <- magick::image_resize(img, geometry = c(120, 140)) magick::image_write(img2, "man/figures/logo.png") magick::image_write(img, "logo-large.png") parallelly/man/figures/logo.png0000644000175000017500000001763214025754625016455 0ustar nileshnileshPNG  IHDRxb]egAMA a cHRMz&u0`:pQ<bKGD pHYs.#.#x?vtIME 1|} RIDATxwTUߵvN49("c@RcuϤ9y;s9w9sQ'_mQ(( HMh:W{ǮQe@o[ iBh1xz;8{k:KQR! < `| x JSx]GrV9w g xEVVy}mNN z<`Dy8 q!Og fskB4E%sBPEC |K;ݏYZUhz5{8F `Ї)I}]??wi*RzD*Uk)<ޖhtw.fxQ,CHl~ժPp_m]V`+bVDM11nC!qrV ,b{3.B! 
[binary data: man/figures/logo.png; tar header and start of man/availableCores.Rd unrecoverable] ... (>= 1) integer. If \code{which = "all"}, then more than one value may be returned. Together with \code{na.rm = FALSE} missing values may also be returned. 
} \description{ The current/main \R session counts as one, meaning the minimum number of cores available is always at least one. } \details{ The following settings ("methods") for inferring the number of cores are supported: \itemize{ \item \code{"system"} - Query \code{\link[parallel]{detectCores}(logical = logical)}. \item \code{"nproc"} - On Unix, query system command \code{nproc}. \item \code{"mc.cores"} - If available, returns the value of option \code{\link[base:options]{mc.cores}}. Note that \option{mc.cores} is defined as the number of \emph{additional} \R processes that can be used in addition to the main \R process. This means that with \code{mc.cores = 0} all calculations should be done in the main \R process, i.e. we have exactly one core available for our calculations. The \option{mc.cores} option defaults to environment variable \env{MC_CORES} (and is set accordingly when the \pkg{parallel} package is loaded). The \option{mc.cores} option is used by for instance \code{\link[=mclapply]{mclapply}()} of the \pkg{parallel} package. \item \code{"BiocParallel"} - Query environment variables \env{BIOCPARALLEL_WORKER_NUMBER} (integer), which is defined by \strong{BiocParallel} (>= 1.27.2), and \env{BBS_HOME} (logical). If the former is set, this is the number of cores considered. If the latter is set, then a maximum of 4 cores is considered. \item \code{"PBS"} - Query TORQUE/PBS environment variables \env{PBS_NUM_PPN} and \env{NCPUS}. Depending on PBS system configuration, these \emph{resource} parameters may or may not default to one. An example of a job submission that results in this is \verb{qsub -l nodes=1:ppn=2}, which requests one node with two cores. \item \code{"SGE"} - Query Sun/Oracle Grid Engine (SGE) environment variable \env{NSLOTS}. An example of a job submission that results in this is \verb{qsub -pe smp 2} (or \verb{qsub -pe by_node 2}), which requests two cores on a single machine. 
\item \code{"Slurm"} - Query Simple Linux Utility for Resource Management (Slurm) environment variable \env{SLURM_CPUS_PER_TASK}. This may or may not be set. It can be set when submitting a job, e.g. \verb{sbatch --cpus-per-task=2 hello.sh} or by adding \verb{#SBATCH --cpus-per-task=2} to the \file{hello.sh} script. If \env{SLURM_CPUS_PER_TASK} is not set, then it will fall back to use \env{SLURM_CPUS_ON_NODE} if the job is a single-node job (\env{SLURM_JOB_NUM_NODES} is 1), e.g. \verb{sbatch --ntasks=2 hello.sh}. \item \code{"LSF"} - Query Platform Load Sharing Facility (LSF) environment variable \env{LSB_DJOB_NUMPROC}. Jobs with multiple (CPU) slots can be submitted on LSF using \verb{bsub -n 2 -R "span[hosts=1]" < hello.sh}. \item \code{"custom"} - If option \option{parallelly.availableCores.custom} is set and a function, then this function will be called (without arguments) and its value will be coerced to an integer, which will be interpreted as a number of available cores. If the value is NA, then it will be ignored. } For any other value of a \code{methods} element, the \R option with the same name is queried. If that is not set, the system environment variable is queried. If neither is set, a missing value is returned. } \section{Avoid ending up with zero cores}{ Note that some machines might have a limited number of cores, or the R process runs in a container or a cgroup that only provides a small number of cores. In such cases:\if{html}{\out{
}}\preformatted{ncores <- availableCores() - 1 }\if{html}{\out{
}} may return zero, which is often not intended and is likely to give an error downstream. Instead, use:\if{html}{\out{
}}\preformatted{ncores <- availableCores(omit = 1) }\if{html}{\out{
}} to put aside one of the cores from being used. Regardless of how many cores you put aside, this function is guaranteed to return at least one core. } \section{Advanced usage}{ It is possible to override the maximum number of cores on the machine as reported by \code{availableCores(methods = "system")}. This can be done by first specifying \code{options(parallelly.availableCores.methods = "mc.cores")} and then the number of cores to use, e.g. \code{options(mc.cores = 8)}. } \examples{ message(paste("Number of cores available:", availableCores())) \dontrun{ options(mc.cores = 2L) message(paste("Number of cores available:", availableCores())) } \dontrun{ ## IMPORTANT: availableCores() may return 1L options(mc.cores = 1L) ncores <- availableCores() - 1 ## ncores = 0 ncores <- availableCores(omit = 1) ## ncores = 1 message(paste("Number of cores to use:", ncores)) } \dontrun{ ## Use 75\% of the cores on the system but never more than four options(parallelly.availableCores.custom = function() { ncores <- max(parallel::detectCores(), 1L, na.rm = TRUE) ncores <- min(as.integer(0.75 * ncores), 4L) max(1L, ncores) }) message(paste("Number of cores available:", availableCores())) ## What is available minus one core but at least one options(parallelly.availableCores.custom = function() { max(1L, parallelly::availableCores() - 1L) }) message(paste("Number of cores available:", availableCores())) } } \seealso{ To get the set of available workers regardless of machine, see \code{\link[=availableWorkers]{availableWorkers()}}. 
} parallelly/man/canPortBeUsed.Rd0000644000175000017500000000133514060224156016307 0ustar nileshnilesh% Generated by roxygen2: do not edit by hand % Please edit documentation in R/ports.R \name{canPortBeUsed} \alias{canPortBeUsed} \title{Check whether a TCP port can be opened or not} \usage{ canPortBeUsed(port) } \arguments{ \item{port}{(integer) A TCP port in [0, 65535].} } \value{ \code{canPortBeUsed(port)} returns a logical indicating whether the port can be opened or not, or cannot be queried. If the port can be opened, then \code{TRUE} is returned, if cannot be opened then \code{FALSE} is returned, which may happen if the port is used by another process. If port querying is not supported, as in R (< 4.0.0), then \code{NA} is returned. } \description{ Check whether a TCP port can be opened or not } \keyword{internal} parallelly/tests/0000755000175000017500000000000014156551340013713 5ustar nileshnileshparallelly/tests/options-and-envvars.R0000644000175000017500000000564214060224156017756 0ustar nileshnileshsource("incl/start.R") getOption2 <- parallelly:::getOption2 getEnvVar2 <- parallelly:::getEnvVar2 options(parallelly.some.option = NULL) options(parallelly.some.option = NULL) Sys.unsetenv("R_FUTURE_SOME_ENVVAR") Sys.unsetenv("R_PARALLELLY_SOME_ENVVAR") message("*** Options and environment variables ...") showall <- function() { utils::str(list( future.some.setting = getOption("future.some.setting", NULL), parallelly.some.setting = getOption("parallelly.some.setting", NULL), R_FUTURE_SOME_SETTING = Sys.getenv("R_FUTURE_SOME_SETTING", ""), R_PARALLELLY_SOME_SETTING = Sys.getenv("R_PARALLELLY_SOME_SETTING", "") )) } for (what in c("option", "envvar")) { if (what == "option") { setvalue <- function(name, value) { name <- sprintf("%s.some.setting", tolower(name)) if (is.null(value)) { args <- list(NULL) } else { args <- as.list(value) } names(args) <- name do.call(options, args = args) class(args) <- "option" args } } else if (what == "envvar") { setvalue <- 
function(name, value) { name <- sprintf("R_%s_SOME_SETTING", toupper(name)) if (is.null(value)) { Sys.unsetenv(name) args <- list(NULL) names(args) <- name } else { args <- as.list(value) names(args) <- name do.call(Sys.setenv, args = args) } class(args) <- "envvar" args } } for (name in c("future", "parallelly")) { for (value0 in list(NULL, TRUE)) { args <- setvalue(name, value0) stopifnot(inherits(args, what)) showall() if (is.null(value0)) { message("- getOption2()") value <- getOption2("future.some.setting", NA) stopifnot(is.na(value)) value <- getOption2("parallelly.some.setting", NA) stopifnot(is.na(value)) message("- getEnvVar2()") value <- getEnvVar2("R_FUTURE_SOME_ENVVAR", NA) stopifnot(is.na(value)) value <- getEnvVar2("R_PARALLELLY_SOME_ENVVAR", NA) stopifnot(is.na(value)) } else if (isTRUE(value0)) { if (what == "option") { message("- getOption2()") value1 <- getOption2("future.some.setting", NA) stopifnot(isTRUE(value1)) value2 <- getOption2("parallelly.some.setting", NA) stopifnot(isTRUE(value2)) } else if (what == "envvar") { message("- getEnvVar2()") value1 <- getEnvVar2("R_FUTURE_SOME_SETTING", NA) stopifnot(value1 == "TRUE") value2 <- getEnvVar2("R_PARALLELLY_SOME_SETTING", NA) stopifnot(value2 == "TRUE") } stopifnot(identical(value1, value2)) } args <- setvalue(name, NULL) stopifnot(inherits(args, what), is.null(args[[1]])) } ## for (value ...) } ## for (name ...) } ## for (what ...) message("*** Options and environment variables ... 
DONE") source("incl/end.R") parallelly/tests/isConnectionValid.R0000644000175000017500000000261714025754625017465 0ustar nileshnileshsource("incl/start.R") stopCluster <- parallel::stopCluster message("*** Connections ...") con <- stdin() idx <- as.integer(con) id <- connectionId(con) valid <- isConnectionValid(con) stopifnot(inherits(con, "connection"), idx == 0L, id == 0L, isTRUE(valid)) con <- stdout() idx <- as.integer(con) id <- connectionId(con) valid <- isConnectionValid(con) stopifnot(inherits(con, "connection"), idx == 1L, id == 1L, isTRUE(valid)) con <- stderr() idx <- as.integer(con) id <- connectionId(con) valid <- isConnectionValid(con) stopifnot(inherits(con, "connection"), idx == 2L, id == 2L, isTRUE(valid)) message("- Connections cannot be serialized") con <- file(tempfile(), open = "w") x <- list(value = 42, stderr = stderr(), con = con) y <- unserialize(serialize(x, connection = NULL)) print(connectionId(x$stderr)) print(connectionId(x$con)) print(isConnectionValid(x$stderr)) print(isConnectionValid(x$con)) print(connectionId(y$stderr)) print(connectionId(y$con)) print(isConnectionValid(y$stderr)) print(isConnectionValid(y$con)) stopifnot( identical(y$value, x$value), connectionId(x$stderr) == 2L, isConnectionValid(x$stderr), isConnectionValid(y$stderr), identical(connectionId(y$stderr), connectionId(x$stderr)), connectionId(x$con) >= 3L, isConnectionValid(x$con), connectionId(y$con) == -1L, !isConnectionValid(y$con) ) close(con) message("*** Connections ... 
DONE")
source("incl/end.R")

## File: parallelly/tests/freePort.R
source("incl/start.R")

message("*** freePort() ...")

set.seed(42)
rng <- .Random.seed
for (kk in 1:5) {
  port <- freePort()
  message("A random free TCP port: ", port)
  stopifnot(is.integer(port), length(port) == 1L)
  if (!is.na(port)) stopifnot(port >= 0L, port <= 65535L)
  stopifnot(identical(.Random.seed, rng))
}

message("- freePort('auto')")
Sys.unsetenv("R_PARALLEL_PORT")
port <- freePort("auto")
message("A random free TCP port: ", port)

message("- freePort('auto') with env var R_PARALLEL_PORT = 8888")
Sys.setenv(R_PARALLEL_PORT = 8888L)
port <- freePort("auto")
message("A free TCP port: ", port)
stopifnot(port == 8888L)

message("*** freePort() ... DONE")
source("incl/end.R")

## File: parallelly/tests/r_bug18119.R
source("incl/start.R")

r_version_has_bug18119 <- function() {
  parallelly:::r_version_has_bug18119(force = TRUE)
}

affected_by_bug18119 <- function() {
  parallelly:::affected_by_bug18119(force = TRUE)
}

message("*** R bug #18119 ...")

has <- r_version_has_bug18119()
message("r_version_has_bug18119(): ", has)

Sys.setenv(R_PARALLELLY_R_VERSION = "3.5.3")
Sys.setenv(R_PARALLELLY_R_REVISION = "76217")
has <- r_version_has_bug18119()
message("r_version_has_bug18119(): ", has)
stopifnot(!has)

Sys.setenv(R_PARALLELLY_R_VERSION = "4.0.0")
Sys.setenv(R_PARALLELLY_R_REVISION = "78286")
has <- r_version_has_bug18119()
message("r_version_has_bug18119(): ", has)
stopifnot(has)

Sys.setenv(R_PARALLELLY_R_VERSION = "4.1.0")
Sys.setenv(R_PARALLELLY_R_REVISION = "80317")
has <- r_version_has_bug18119()
message("r_version_has_bug18119(): ", has)
stopifnot(has)

Sys.setenv(R_PARALLELLY_R_VERSION = "4.1.0")
Sys.setenv(R_PARALLELLY_R_REVISION = "80531")
has <- r_version_has_bug18119()
message("r_version_has_bug18119(): ", has)
stopifnot(has)

Sys.setenv(R_PARALLELLY_R_VERSION = "4.1.0")
Sys.setenv(R_PARALLELLY_R_REVISION = "80532")
has <- r_version_has_bug18119()
message("r_version_has_bug18119(): ", has)
stopifnot(!has)

Sys.setenv(R_PARALLELLY_R_VERSION = "4.2.0")
Sys.setenv(R_PARALLELLY_R_REVISION = "80471")
has <- r_version_has_bug18119()
message("r_version_has_bug18119(): ", has)
stopifnot(has)

Sys.setenv(R_PARALLELLY_R_VERSION = "4.2.0")
Sys.setenv(R_PARALLELLY_R_REVISION = "80472")
has <- r_version_has_bug18119()
message("r_version_has_bug18119(): ", has)
stopifnot(!has)

message("*** R bug #18119 ... DONE")
source("incl/end.R")

## File: parallelly/tests/freeCores.R
source("incl/start.R")

message("*** freeCores() ...")

free <- freeCores()
print(free)
stopifnot(
  is.integer(free),
  length(free) == 1L,
  !is.na(free),
  free >= 1L
)

message("*** freeCores() ... DONE")
source("incl/end.R")

## File: parallelly/tests/availableWorkers.R
source("incl/start.R")

message("*** availableWorkers() ...")

## The default
w <- availableWorkers()
print(w)
stopifnot(is.character(w), length(w) >= 1)

## Minimum of all known settings (default)
print(availableWorkers(which = "min"))

## Maximum of all known settings (should never be used)
print(availableWorkers(which = "max"))

## All known settings
print(availableWorkers(na.rm = FALSE, which = "all"))

## System settings
w <- availableWorkers(methods = "system")
print(w)
stopifnot(is.character(w), length(w) >= 1)

## Predefined ones for known cluster schedulers
print(availableWorkers(methods = "PBS"))
print(availableWorkers(methods = "SGE"))
print(availableWorkers(methods = "Slurm"))
print(availableWorkers(methods = "LSF"))

message("*** HPC related ...")

sge_expand_node_count_pairs <- parallelly:::sge_expand_node_count_pairs
read_pbs_nodefile <- parallelly:::read_pbs_nodefile
read_pe_hostfile <- parallelly:::read_pe_hostfile

workers0 <- c("n1", "n2", "n3", "n1", "n6", "n3", "n3", "n5")
data0 <-
as.data.frame(table(workers0), stringsAsFactors = FALSE) colnames(data0) <- c("node", "count") data0 <- data0[order(data0$node, data0$count), ] message("*** LSF ...") Sys.setenv(LSB_HOSTS = paste(workers0, collapse = " ")) workers <- availableWorkers(methods = "LSF") print(workers) stopifnot(length(workers) == length(workers0)) message("*** LSF ... done") message("*** read_pbs_nodefile() ...") workers <- workers0 pathname <- tempfile() writeLines(workers, con = pathname) data <- read_pbs_nodefile(pathname) str(data) stopifnot( c("node") %in% colnames(data), is.character(data$node), !anyNA(data$node), nrow(data$node) == length(workers), all(sort(data$node) == sort(workers)) ) Sys.setenv(PBS_NODEFILE = pathname) Sys.setenv(PBS_NP = length(workers), PBS_NUM_NODES = length(workers) / 2, PBS_NUM_PPN = 2) workers <- availableWorkers(methods = "PBS") print(workers) stopifnot(length(workers) == length(workers0), all(workers == sort(workers0))) Sys.setenv(PBS_NUM_PPN = 3) res <- tryCatch({ workers <- availableWorkers(methods = "PBS") }, warning = identity) stopifnot(inherits(res, "warning")) Sys.setenv(PBS_NP = length(workers) + 1) res <- tryCatch({ workers <- availableWorkers(methods = "PBS") }, warning = identity) stopifnot(inherits(res, "warning")) ## Exceptions workersE <- c(workers, "n 3") pathname <- tempfile() writeLines(workersE, con = pathname) res <- tryCatch(read_pbs_nodefile(pathname), error = identity) print(res) stopifnot(inherits(res, "error")) Sys.setenv(PBS_NODEFILE = "") res <- tryCatch({ workers <- availableWorkers(methods = "PBS") }, warning = identity) stopifnot(inherits(res, "warning")) message("*** read_pbs_nodefile() ... 
DONE") message("*** read_pe_hostfile() ...") workers <- workers0 pathname <- tempfile() write.table(data0, file = pathname, quote = FALSE, row.names = FALSE, col.names = FALSE) lines <- readLines(pathname) print(lines) data <- read_pe_hostfile(pathname, expand = FALSE) print(data) stopifnot( is.character(data$node), !anyNA(data$node), is.integer(data$count), !anyNA(data$count), all(is.finite(data$count)), all(data$count > 0), nrow(data) == nrow(data0), all.equal(data[, c("node", "count")], data0[, c("node", "count")]) ) workers <- sge_expand_node_count_pairs(data) stopifnot(length(workers) == length(workers0), all(workers == sort(workers0))) Sys.setenv(PE_HOSTFILE = pathname) Sys.setenv(NSLOTS = length(workers0)) ## Use to validate results workers <- availableWorkers(methods = "SGE") print(workers) stopifnot(length(workers) == length(workers0), all(workers == sort(workers0))) ## Test validation Sys.setenv(NSLOTS = length(workers0) + 1L) workers <- tryCatch(availableWorkers(methods = "SGE"), warning = identity) print(workers) stopifnot(inherits(workers, "warning")) Sys.setenv(PE_HOSTFILE = "") res <- tryCatch({ workers <- availableWorkers(methods = "SGE") }, warning = identity) stopifnot(inherits(res, "warning")) message("*** read_pe_hostfile() ... 
DONE") message("*** Slurm w/ SLURM_JOB_NODELIST ...") slurm_expand_nodelist <- parallelly:::slurm_expand_nodelist specs <- list( "n1" = c("n1"), " n1" = c("n1"), "n1,, n3" = c("n1", "n3"), "n1, n3" = c("n1", "n3"), "n3 n1" = c("n3", "n1"), "n[1-13]" = sprintf("n%d", c(1:13)), ## scontrol show hostname treats "n[1,3-4, 11-13]" == "n[1,3-4,0011-13]" "n[1,3-4, 11-13]" = c("n1", "n3", "n4", "n0011", "n0012", "n0013"), "a1,b[ 02-04,6-7]" = c("a1", "b00002", "b00003", "b00004", "b6", "b7") ) ## All combined all <- list(unlist(specs, use.names = FALSE)) names(all) <- paste(names(specs), collapse = ",") specs <- c(specs, all) ## Again, all combined but in reverse order all <- list(unlist(rev(specs), use.names = FALSE)) names(all) <- paste(rev(names(specs)), collapse = ",") specs <- c(specs, all) for (kk in seq_along(specs)) { message(sprintf("- Specification #%d of %d", kk, length(specs))) nodelist <- names(specs)[kk] truth <- specs[[kk]] cat(sprintf("nodelist: %s\n", sQuote(nodelist))) expanded <- slurm_expand_nodelist(nodelist, manual = TRUE) cat(sprintf("expanded: c(%s)\n", paste(sQuote(expanded), collapse = ", "))) cat(sprintf("truth: c(%s)\n", paste(sQuote(truth), collapse = ", "))) stopifnot( is.character(expanded), !any(is.na(expanded)), length(expanded) == length(truth), identical(expanded, truth) ) Sys.unsetenv(c("SLURM_JOB_NODELIST", "SLURM_NODELIST", "SLURM_JOB_CPUS_PER_NODE", "SLURM_TASKS_PER_NODE")) ## Test without SLURM_JOB_CPUS_PER_NODE/SLURM_TASKS_PER_NODE Sys.setenv(SLURM_JOB_NODELIST = nodelist) for (name in c("SLURM_JOB_NODELIST", "SLURM_JOB_CPUS_PER_NODE")) { cat(sprintf("%s = %s\n", name, sQuote(Sys.getenv(name)))) } workers <- availableWorkers(methods = "Slurm") cat(sprintf("workers: c(%s)\n", paste(sQuote(workers), collapse = ", "))) stopifnot(identical(workers, truth)) ## Test with SLURM_JOB_CPUS_PER_NODE/SLURM_TASKS_PER_NODE nhosts <- length(expanded) ncores_per_host <- sample(1:10, size = nhosts, replace = TRUE) ## Handle the case when 'nodelist' 
result in a non-ordered 'expanded' expanded2 <- as.list(expanded) for (kk in seq_along(expanded2)) { expanded2[[kk]] <- rep(expanded2[[kk]], times = ncores_per_host[kk]) } expanded2 <- unlist(expanded2, use.names = FALSE) Sys.setenv(SLURM_JOB_NODELIST = nodelist) Sys.setenv(SLURM_JOB_CPUS_PER_NODE = paste(ncores_per_host, collapse = ",")) for (name in c("SLURM_JOB_NODELIST", "SLURM_JOB_CPUS_PER_NODE")) { cat(sprintf("%s = %s\n", name, sQuote(Sys.getenv(name)))) } workers <- availableWorkers(methods = "Slurm") cat(sprintf("workers: c(%s)\n", paste(sQuote(workers), collapse = ", "))) stopifnot(identical(unique(workers), unique(truth))) counts <- table(workers) counts <- counts[unique(workers)] print(counts) counts2 <- table(expanded2) counts2 <- counts2[unique(expanded2)] print(counts2) stopifnot( sum(counts) == sum(ncores_per_host), sum(counts) == sum(counts2), all(counts == counts2) ) Sys.unsetenv(c("SLURM_JOB_NODELIST", "SLURM_JOB_CPUS_PER_NODE")) } message("*** Slurm w/ SLURM_JOB_NODELIST ... DONE") message("*** Slurm w/ SLURM_TASKS_PER_NODE ...") slurm_expand_nodecounts <- parallelly:::slurm_expand_nodecounts specs <- list( "1" = c(1L), "1,3" = c(1L,3L), "1, 3" = c(1L,3L), "2(x3)" = rep(2L, times = 3L), "2(x3),3,4(x1)" = c(rep(2L, times = 3L), 3L, 4L) ) for (kk in seq_along(specs)) { message(sprintf("- Specification #%d of %d", kk, length(specs))) nodecounts <- names(specs)[kk] truth <- specs[[kk]] cat(sprintf("nodecounts: %s\n", sQuote(nodecounts))) expanded <- slurm_expand_nodecounts(nodecounts) cat(sprintf("expanded: c(%s)\n", paste(sQuote(expanded), collapse = ", "))) cat(sprintf("truth: c(%s)\n", paste(sQuote(truth), collapse = ", "))) stopifnot( is.integer(expanded), !any(is.na(expanded)), length(expanded) == length(truth), identical(expanded, truth) ) } message("*** Slurm w/ SLURM_TASKS_PER_NODE ... DONE") message("*** HPC related ... 
DONE")

## Any R options and system environment variable
print(availableWorkers(methods = c("width", "FOO_BAR_ENV"), na.rm = FALSE, which = "all"))

## Exception handling
Sys.setenv("FOO_BAR_ENV" = "0")
res <- tryCatch(availableWorkers(methods = "FOO_BAR_ENV"), error = identity)
stopifnot(inherits(res, "error"))

message("*** availableWorkers() ... DONE")
source("incl/end.R")

## File: parallelly/tests/cpuLoad.R
source("incl/start.R")

message("*** cpuLoad() ...")

loadavg <- cpuLoad()
print(loadavg)
stopifnot(
  is.numeric(loadavg),
  length(loadavg) == 3L,
  !is.null(names(loadavg)),
  identical(names(loadavg), c("1min", "5min", "15min")),
  all(is.na(loadavg) | loadavg >= 0)
)

message("*** cpuLoad() ... DONE")
source("incl/end.R")

## File: parallelly/tests/availableCores.R
source("incl/start.R")

message("*** availableCores() ...")

## detectCores() may return NA_integer_
n <- parallel::detectCores()
message(sprintf("detectCores() = %d", n))
stopifnot(length(n) == 1, is.numeric(n))

## Default
n <- availableCores()
message(sprintf("availableCores() = %d", n))
stopifnot(length(n) == 1, is.numeric(n), n >= 1)

## Minimum of all known settings (default)
print(availableCores(which = "min"))

## Maximum of all known settings (should never be used)
print(availableCores(which = "max"))

## All known settings
print(availableCores(na.rm = FALSE, which = "all"))

## System settings
n <- availableCores(methods = "system")
print(n)
stopifnot(length(n) == 1, is.numeric(n), is.finite(n), n >= 1)

## Predefined ones for known cluster schedulers
print(availableCores(methods = "PBS"))
print(availableCores(methods = "SGE"))
print(availableCores(methods = "Slurm"))
print(availableCores(methods = "LSF"))

## Any R options and system environment variable
print(availableCores(methods = c("width", "FOO_BAR_ENV"), na.rm = FALSE, which = "all"))

## Exception handling
Sys.setenv("FOO_BAR_ENV" =
"0")
res <- try(availableCores(methods = "FOO_BAR_ENV"), silent = TRUE)
stopifnot(inherits(res, "try-error"))

ncores0 <- 42L

message("*** LSF ...")
Sys.setenv(LSB_DJOB_NUMPROC = as.character(ncores0))
ncores <- availableCores(methods = "LSF")
print(ncores)
stopifnot(ncores == ncores0)
message("*** LSF ... done")

message("*** Internal detectCores() ...")

## Option 'parallelly.availableCores.system'
## Reset internal cache
env <- environment(parallelly:::detectCores)
env$cache <- list()
options(parallelly.availableCores.system = 2L)
n <- detectCores()
print(n)
stopifnot(is.integer(n), is.finite(n), n >= 1, n == 2L)
options(parallelly.availableCores.system = NULL)

## Reset
env <- environment(parallelly:::detectCores)
env$cache <- list()
n <- detectCores()
print(n)
stopifnot(is.integer(n), is.finite(n), n >= 1)

message("*** Internal detectCores() ... DONE")

message("*** availableCores() ... DONE")
source("incl/end.R")

## File: parallelly/tests/makeClusterMPI.R
source("incl/start.R")
library(parallel)

message("*** makeClusterMPI() ...")

pkg <- "Rmpi"
if (fullTest && requireNamespace(pkg, quietly = TRUE)) {
  cl <- makeClusterMPI(2L)
  str(cl)
  res <- parLapply(cl, X = 1:2, fun = function(x) {
    list(
      hostname = Sys.info()[["nodename"]],
      pid = Sys.getpid(),
      value = x^2
    )
  })
  utils::str(res)
  y <- vapply(res, FUN = `[[`, "value", FUN.VALUE = NA_real_)
  stopifnot(identical(y, c(1, 4)))
  stopCluster(cl)
  str(cl)
}

message("*** makeClusterMPI() ...
DONE") source("incl/end.R") parallelly/tests/isForkedChild.R0000644000175000017500000000201114116376243016545 0ustar nileshnileshsource("incl/start.R") library(parallel) options(parallelly.debug = FALSE) message("*** isForkedChild() ...") stopifnot(!isForkedChild()) if (supportsMulticore()) { message("- mcparallel()/mccollect()") f <- mcparallel(isForkedChild()) isForked <- mccollect(f)[[1]] stopifnot(isForked) message("- makeForkCluster()") cl <- makeForkCluster(1L) isForked <- clusterEvalQ(cl, { parallelly::isForkedChild() }) isForked <- unlist(isForked, use.names = FALSE) stopifnot(isForked) parallel::stopCluster(cl) } message("- mclapply()") isForked <- mclapply(1:2, FUN = function(ii) isForkedChild()) isForked <- unlist(isForked, use.names = FALSE) if (supportsMulticore()) { stopifnot(all(isForked)) } else { stopifnot(!any(isForked)) } message("- makeClusterPSOCK()") cl <- makeClusterPSOCK(1L) isForked <- clusterEvalQ(cl, { parallelly::isForkedChild() }) isForked <- unlist(isForked, use.names = FALSE) stopifnot(!isForked) parallel::stopCluster(cl) message("*** isForkedChild() ... DONE") source("incl/end.R") parallelly/tests/incl/0000755000175000017500000000000014060224156014634 5ustar nileshnileshparallelly/tests/incl/start,load-only.R0000644000175000017500000000335014060224156020010 0ustar nileshnilesh## Record original state ovars <- ls() oenvs <- oenvs0 <- Sys.getenv() oopts0 <- options() covr_testing <- ("covr" %in% loadedNamespaces()) on_solaris <- grepl("^solaris", R.version$os) on_macos <- grepl("^darwin", R.version$os) on_githubactions <- as.logical(Sys.getenv("GITHUB_ACTIONS", "FALSE")) ## Default options oopts <- options( warn = 1L, mc.cores = 2L, parallelly.slurm_expand_nodelist.manual = TRUE, parallelly.debug = TRUE ) ## Comment: The below should be set automatically whenever the package is ## loaded and 'R CMD check' runs. The below is added in case R is changed ## in the future and we fail to detect 'R CMD check'. 
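## A hedged aside (illustration only, not part of the test suite): the
## R_PARALLELLY_* variables set below follow the package's documented
## naming convention, where an R option name maps to an environment
## variable by upper-casing it, substituting periods with underscores,
## and prefixing "R_". The helper name 'option_to_envvar' is made up
## for this sketch:
option_to_envvar <- function(name) {
  sprintf("R_%s", gsub(".", "_", toupper(name), fixed = TRUE))
}
## e.g., option_to_envvar("parallelly.makeNodePSOCK.connectTimeout")
## gives "R_PARALLELLY_MAKENODEPSOCK_CONNECTTIMEOUT"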
Sys.setenv(R_PARALLELLY_MAKENODEPSOCK_CONNECTTIMEOUT = 2 * 60) Sys.setenv(R_PARALLELLY_MAKENODEPSOCK_TIMEOUT = 2 * 60) Sys.setenv(R_PARALLELLY_MAKENODEPSOCK_SESSIONINFO_PKGS = TRUE) ## Label PSOCK cluster workers (to help troubleshooting) test_script <- grep("[.]R$", commandArgs(), value = TRUE)[1] if (is.na(test_script)) test_script <- "UNKNOWN" worker_label <- sprintf("parallelly/tests/%s:%s:%s:%s", test_script, Sys.info()[["nodename"]], Sys.info()[["user"]], Sys.getpid()) Sys.setenv(R_PARALLELLY_MAKENODEPSOCK_RSCRIPT_LABEL = worker_label) fullTest <- (Sys.getenv("_R_CHECK_FULL_") != "") isWin32 <- (.Platform$OS.type == "windows" && .Platform$r_arch == "i386") useXDR <- parallelly:::getOption2("parallelly.makeNodePSOCK.useXDR", FALSE) ## Private functions detectCores <- parallelly:::detectCores hpaste <- parallelly:::hpaste inRCmdCheck <- parallelly:::inRCmdCheck mdebug <- parallelly:::mdebug mdebugf <- parallelly:::mdebugf pid_exists <- parallelly:::pid_exists isFALSE <- parallelly:::isFALSE isNA <- parallelly:::isNA ## Local functions for test scripts printf <- function(...) cat(sprintf(...)) parallelly/tests/incl/end.R0000644000175000017500000000250414025754625015540 0ustar nileshnilesh## Undo options ## (a) Added added <- setdiff(names(options()), names(oopts0)) opts <- vector("list", length = length(added)) names(opts) <- added options(opts) ## (b) Modified options(oopts) ## (c) Removed removed <- setdiff(names(oopts0), names(options())) opts <- oopts0[removed] options(opts) ## (d) Assert that everything was undone stopifnot(identical(options(), oopts0)) ## Undo system environment variables ## (a) Added cenvs <- Sys.getenv() added <- setdiff(names(cenvs), names(oenvs0)) for (name in added) Sys.unsetenv(name) ## (b) Missing missing <- setdiff(names(oenvs0), names(cenvs)) if (length(missing) > 0) do.call(Sys.setenv, as.list(oenvs0[missing])) ## (c) Modified? 
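## A hedged aside (illustration only): the restore step below follows a
## snapshot/modify/restore pattern. A generic standalone sketch of the
## same idea, using the made-up name 'with_envvars', which temporarily
## sets environment variables and restores their prior state on exit:
with_envvars <- function(new, code) {
  ## Snapshot current values; NA marks variables that were unset
  old <- Sys.getenv(names(new), unset = NA_character_, names = TRUE)
  do.call(Sys.setenv, as.list(new))
  on.exit(for (nm in names(old)) {
    if (is.na(old[[nm]])) Sys.unsetenv(nm)
    else do.call(Sys.setenv, as.list(old[nm]))
  })
  force(code)
}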
for (name in intersect(names(cenvs), names(oenvs0))) { ## WORKAROUND: On Linux Wine, base::Sys.getenv() may ## return elements with empty names. /HB 2016-10-06 if (nchar(name) == 0) next if (!identical(cenvs[[name]], oenvs0[[name]])) { do.call(Sys.setenv, as.list(oenvs0[name])) } } ## (d) Assert that everything was undone stopifnot(identical(Sys.getenv(), oenvs0)) ## Undo variables rm(list = c(setdiff(ls(), ovars))) ## Travis CI specific: Explicit garbage collection because it ## looks like Travis CI might run out of memory during 'covr' ## testing and we now have so many tests. /HB 2017-01-11 if ("covr" %in% loadedNamespaces()) gc() parallelly/tests/incl/start.R0000644000175000017500000000006514025754625016127 0ustar nileshnileshlibrary(parallelly) source("incl/start,load-only.R") parallelly/tests/makeClusterPSOCK.R0000644000175000017500000001723214156550333017123 0ustar nileshnileshsource("incl/start.R") is_fqdn <- parallelly:::is_fqdn is_ip_number <- parallelly:::is_ip_number is_localhost <- parallelly:::is_localhost find_rshcmd <- parallelly:::find_rshcmd message("*** makeClusterPSOCK() ...") message("- makeClusterPSOCK() - internal utility functions") stopifnot( is_fqdn("a.b"), is_fqdn("a.b.c"), !is_fqdn("a") ) stopifnot( is_ip_number("1.2.3.4"), !is_ip_number("a"), !is_ip_number("1.2.3"), !is_ip_number("1.2.3.256"), !is_ip_number("1.2.3.-1"), !is_ip_number("1.2.3.a") ) ## Reset internal cache stopifnot(is.na(is_localhost(worker = NULL, hostname = NULL))) stopifnot( is_localhost("localhost"), is_localhost("127.0.0.1"), is_localhost(Sys.info()[["nodename"]]), is_localhost(Sys.info()[["nodename"]]), ## cache hit !is_localhost("not.a.localhost.hostname") ) cmd <- find_rshcmd(must_work = FALSE) print(cmd) message("- makeClusterPSOCK()") cl <- makeClusterPSOCK("", user = "johndoe", master = NULL, revtunnel = FALSE, rshcmd = "my_ssh", renice = TRUE, manual = TRUE, dryrun = TRUE) print(cl) cl <- makeClusterPSOCK(1L, port = "random", dryrun = TRUE) print(cl) cl <- 
makeClusterPSOCK(1L) print(cl) parallel::stopCluster(cl) message("- makeClusterPSOCK() - useXDR = TRUE/FALSE") for (xdr in c(TRUE, FALSE)) { cl <- makeClusterPSOCK(1L, useXDR = xdr) node <- cl[[1]] stopifnot( is.list(node), inherits(node, if (xdr) "SOCKnode" else "SOCK0node"), "session_info" %in% names(node) ) si <- node[["session_info"]] stopifnot(is.list(si)) parallel::stopCluster(cl) } message("- makeClusterPSOCK() - argument 'socketOptions'") for (value in list(NULL, "NULL", "no-delay")) { cl <- makeClusterPSOCK(1L, socketOptions = value) y <- parallel::clusterEvalQ(cl, 42L)[[1]] stopifnot(identical(y, 42L)) parallel::stopCluster(cl) } message("- makeClusterPSOCK() - argument 'rscript_startup'") for (value in list(NULL, "options(abc = 42L)", quote(options(abc = 42L)))) { cl <- makeClusterPSOCK(1L, rscript_startup = value) y <- parallel::clusterEvalQ(cl, getOption("abc", NA_integer_))[[1]] stopifnot(is.integer(y), length(y) == 1L) if (!is.null(value)) stopifnot(identical(y, 42L)) parallel::stopCluster(cl) } message("- makeClusterPSOCK() - setup_strategy = TRUE/FALSE") for (setup_strategy in c("sequential", "parallel")) { dt <- system.time({ cl <- makeClusterPSOCK(2L, setup_strategy = setup_strategy) }) print(dt) print(cl) parallel::stopCluster(cl) } message("- makeClusterPSOCK() - with and w/out validation") cl <- makeClusterPSOCK(1L, validate = TRUE) ## default node <- cl[[1]] stopifnot( is.list(node), inherits(node, if (useXDR) "SOCKnode" else "SOCK0node"), "session_info" %in% names(node) ) si <- node[["session_info"]] stopifnot(is.list(si)) parallel::stopCluster(cl) cl <- makeClusterPSOCK(1L, validate = FALSE) node <- cl[[1]] stopifnot( is.list(node), inherits(node, if (useXDR) "SOCKnode" else "SOCK0node"), ! 
"session_info" %in% names(node) ) parallel::stopCluster(cl) message("- makeClusterPSOCK() - w/out 'parallelly' on worker") ovalue <- Sys.getenv("R_LIBS_USER") Sys.setenv(R_LIBS_USER = tempdir()) cl <- makeClusterPSOCK(1L, outfile = "") print(cl) parallel::stopCluster(cl) Sys.setenv(R_LIBS_USER = ovalue) message("- makeClusterPSOCK() - assert 'parallelly' is not loaded") cl <- makeClusterPSOCK(1L) ns <- parallel::clusterCall(cl, function() { loadedNamespaces() }) print(ns) stopifnot(!is.element("parallelly", ns)) parallel::stopCluster(cl) message("- makeClusterPSOCK() - launch via the R executable") if (.Platform$OS.type == "windows") { ## R and R.exe fails on MS Windows, cf. R-devel thread "MS Windows: R does ## not escape quotes in CLI options the same way as Rterm and Rscript" ## on 2021-12-15. rscripts <- file.path(R.home("bin"), c("Rterm", "Rterm.exe")) } else { rscripts <- file.path(R.home("bin"), "R") } for (rscript in rscripts) { message(" Launcher: ", sQuote(rscript)) rscript_args <- c("--no-echo", "--no-restore", "*", "--args") cl <- tryCatch({ makeClusterPSOCK(1L, rscript = rscript, rscript_args = rscript_args) }, warning = identity) stopifnot(inherits(cl, "cluster")) parallel::stopCluster(cl) } message("- makeClusterPSOCK() - default packages") if (.Platform$OS.type == "windows") { ## R and R.exe fails on MS Windows, cf. R-devel thread "MS Windows: R does ## not escape quotes in CLI options the same way as Rterm and Rscript" ## on 2021-12-15. 
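## A hedged aside (illustration only, mirroring the logic used in this
## file): which 'rscript_args' a launcher needs depends on its basename.
## R/Rterm require explicit CLI flags plus a "*" placeholder, where
## makeClusterPSOCK() injects its own arguments; Rscript needs none.
## The helper name 'args_for_launcher' is made up for this sketch:
args_for_launcher <- function(rscript) {
  if (tools::file_path_sans_ext(basename(rscript)) %in% c("R", "Rterm")) {
    c("--no-echo", "--no-restore", "*", "--args")
  } else {
    NULL
  }
}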
rscripts <- file.path(R.home("bin"), c("Rscript", "Rterm", "Rterm.exe")) } else { rscripts <- file.path(R.home("bin"), c("Rscript", "R")) } default_packages <- c("utils", "tools") for (rscript in rscripts) { message(" Launcher: ", sQuote(rscript)) if (tools::file_path_sans_ext(basename(rscript)) %in% c("R", "Rterm")) { rscript_args <- c("--no-echo", "--no-restore", "*", "--args") } else { rscript_args <- NULL } cl <- tryCatch({ makeClusterPSOCK(1L, rscript = rscript, rscript_args = rscript_args, default_packages = default_packages) }, warning = identity) stopifnot(inherits(cl, "cluster")) pkgs <- parallel::clusterEvalQ(cl, { getOption("defaultPackages") })[[1]] stopifnot(identical(pkgs, default_packages)) parallel::stopCluster(cl) } if (.Platform$OS.type == "windows") { ## R and R.exe fails on MS Windows, cf. R-devel thread "MS Windows: R does ## not escape quotes in CLI options the same way as Rterm and Rscript" ## on 2021-12-15. rscripts <- file.path(R.home("bin"), c("Rscript", "Rterm", "Rterm.exe")) } else { rscripts <- file.path(R.home("bin"), c("Rscript", "R")) } default_packages <- c("parallelly", "*") truth <- unique(c("parallelly", getOption("defaultPackages"))) for (rscript in rscripts) { message(" Launcher: ", sQuote(rscript)) if (tools::file_path_sans_ext(basename(rscript)) %in% c("R", "Rterm")) { rscript_args <- c("--no-echo", "--no-restore", "*", "--args") } else { rscript_args <- NULL } cl <- tryCatch({ makeClusterPSOCK(1L, rscript = rscript, rscript_args = rscript_args, default_packages = default_packages) }, warning = identity) stopifnot(inherits(cl, "cluster")) pkgs <- parallel::clusterEvalQ(cl, { getOption("defaultPackages") })[[1]] stopifnot(identical(pkgs, truth)) parallel::stopCluster(cl) } message("- makeClusterPSOCK() - exceptions") res <- tryCatch({ cl <- makeClusterPSOCK(1:2) }, error = identity) print(res) stopifnot(inherits(res, "error")) res <- tryCatch({ cl <- makeClusterPSOCK(0L) }, error = identity) print(res) stopifnot(inherits(res, 
"error")) res <- tryCatch({ cl <- makeClusterPSOCK(1L, rshcmd = character(0L)) }, error = identity) print(res) stopifnot(inherits(res, "error")) res <- tryCatch({ cl <- makeClusterPSOCK(1L, port = integer(0L)) }, error = identity) print(res) stopifnot(inherits(res, "error")) res <- tryCatch({ cl <- makeClusterPSOCK(1L, port = NA_integer_) }, error = identity) print(res) stopifnot(inherits(res, "error")) message("- makeClusterPSOCK() - exceptions") ## Non-existing hostname res <- tryCatch({ cl <- makeNodePSOCK("not.a.localhost.hostname", revtunnel = TRUE) }, error = identity) print(res) stopifnot(inherits(res, "error")) ## Invalid port res <- tryCatch({ cl <- makeNodePSOCK("localhost", port = NA_integer_) }, error = identity) print(res) stopifnot(inherits(res, "error")) ## Don't test on CRAN if (fullTest || covr_testing) { ## Occupied/blocked port res <- tryCatch( cl <- parallelly::makeClusterPSOCK("localhost", port = 80L, tries = 1L), error = identity) print(res) ## Skip error assertion in case this actually works on some machine. ## But where it fails, we are testing the port-failure exception code. } message("*** makeClusterPSOCK() ... 
DONE") source("incl/end.R") parallelly/tests/as.cluster.R0000644000175000017500000000357614067004236016132 0ustar nileshnileshsource("incl/start.R") stopCluster <- parallel::stopCluster message("*** cluster operations ...") local({ cl0 <- makeClusterPSOCK(1L) on.exit(stopCluster(cl0)) cl <- cl0 print(cl) message("*** cluster operations - as.cluster() ...") cl1 <- as.cluster(cl) print(cl1) stopifnot(inherits(cl1, "cluster"), identical(cl1, cl)) node <- cl[[1]] print(node) cl2 <- as.cluster(node) stopifnot(inherits(cl2, "cluster"), length(cl2) == 1L, identical(cl2[[1]], node)) node <- cl[[1]] print(node) stopifnot(inherits(node, if (useXDR) "SOCKnode" else "SOCK0node")) nodes <- list(node, node) cl3 <- as.cluster(node) print(cl3) stopifnot(inherits(cl3, "cluster"), length(cl3) == 1L, identical(cl3[[1]], node)) cl4 <- as.cluster(nodes) print(cl4) stopifnot(inherits(cl4, "cluster"), length(cl4) == 2L, identical(cl4[[1]], node), identical(cl4[[2]], node)) message("*** cluster operations - as.cluster() ... DONE") message("*** cluster operations - c(...) ...") cl2 <- makeClusterPSOCK("localhost") on.exit(stopCluster(cl2), add = TRUE) print(cl2) cl <- c(cl1, cl2) print(cl) stopifnot(inherits(cl, "cluster"), length(cl) == 2L) stopifnot(identical(cl[1], cl1), identical(cl[2], cl2[1])) message("*** cluster operations - c(...) ... DONE") }) message("*** cluster operations - makeClusterPSOCK(remotes) ...") remotes <- Sys.getenv("R_PARALLELLY_TESTS_REMOTES", "") remotes <- gsub(" ", "", unlist(strsplit(remotes, split = ","))) remotes <- remotes[nzchar(remotes)] if (length(remotes) > 0) { message("Remotes: ", paste(sQuote(remotes), collapse = ", ")) local({ cl <- makeClusterPSOCK(remotes, verbose = TRUE) on.exit(stopCluster(cl)) print(cl) }) } message("*** cluster operations - makeClusterPSOCK(remotes) ... DONE") message("*** cluster operations ... 
DONE") source("incl/end.R") parallelly/tests/utils.R0000644000175000017500000000455714057604467015222 0ustar nileshnileshsource("incl/start,load-only.R") message("*** utils ...") message("*** hpaste() ...") # Some vectors x <- 1:6 y <- 10:1 z <- LETTERS[x] # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # Abbreviation of output vector # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - printf("x = %s.\n", hpaste(x)) ## x = 1, 2, 3, ..., 6. printf("x = %s.\n", hpaste(x, maxHead = 2)) ## x = 1, 2, ..., 6. printf("x = %s.\n", hpaste(x, maxHead = 3)) # Default ## x = 1, 2, 3, ..., 6. # It will never output 1, 2, 3, 4, ..., 6 printf("x = %s.\n", hpaste(x, maxHead = 4)) ## x = 1, 2, 3, 4, 5 and 6. # Showing the tail printf("x = %s.\n", hpaste(x, maxHead = 1, maxTail = 2)) ## x = 1, ..., 5, 6. # Turning off abbreviation printf("y = %s.\n", hpaste(y, maxHead = Inf)) ## y = 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 ## ...or simply printf("y = %s.\n", paste(y, collapse = ", ")) ## y = 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 # Change last separator printf("x = %s.\n", hpaste(x, lastCollapse = " and ")) ## x = 1, 2, 3, 4, 5 and 6. # No collapse stopifnot(all(hpaste(x, collapse = NULL) == x)) # Empty input stopifnot(identical(hpaste(character(0)), character(0))) message("*** hpaste() ... DONE") # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # debug() # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - message("*** mdebug() ...") mdebug("Hello #", 1) mdebugf("Hello #%d", 1) options(parallelly.debug = TRUE) mdebug("Hello #", 2) mdebugf("Hello #%d", 2) options(parallelly.debug = FALSE) mdebug("Hello #", 3) mdebugf("Hello #%d", 3) message("*** mdebug() ... 
DONE") # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # pid_exists() # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - message("*** pid_exists() ...") options(parallelly.debug = TRUE) pid <- Sys.getpid() printf("Current PID: %d\n", pid) exists <- pid_exists(pid) printf("Does it exist: %s\n", exists) ## Either pid_exists() works and return TRUE here, or it fails ## to query the process information at all in case it returns NA ## However, it should never return FALSE. stopifnot(is.logical(exists), length(exists) == 1L, isTRUE(exists) || is.na(exists)) message("*** pid_exists() ... DONE") message("*** inRCmdCheck() ...") cat(sprintf("R CMD check is running: %s\n", inRCmdCheck())) message("*** inRCmdCheck() ... DONE") message("*** utils ... DONE") source("incl/end.R") parallelly/tests/startup.R0000644000175000017500000000543714057523202015545 0ustar nileshnileshsource("incl/start.R") pkgname <- "parallelly" .onLoad <- parallelly:::.onLoad maxCores <- min(2L, availableCores(methods = "system")) message("*** .onLoad() ...") message("- .onLoad() w/ parallelly.availableCores.system = 1L ...") options(parallelly.availableCores.system = 1L) .onLoad(pkgname, pkgname) options(parallelly.availableCores.system = NULL) message("- .onLoad() w/ parallelly.availableCores.system = 1L ... 
DONE") message("- .onLoad() w/ R_PARALLELLY_AVAILABLECORES_SYSTEM ...") Sys.setenv(R_PARALLELLY_AVAILABLECORES_SYSTEM = "1") .onLoad(pkgname, pkgname) ncores <- getOption("parallelly.availableCores.system") print(ncores) stopifnot(is.integer(ncores), ncores == 1L) Sys.unsetenv("R_PARALLELLY_AVAILABLECORES_SYSTEM") options(parallelly.availableCores.system = NULL) Sys.setenv(R_PARALLELLY_AVAILABLECORES_SYSTEM = "NA") .onLoad(pkgname, pkgname) ncores <- getOption("parallelly.availableCores.system") print(ncores) stopifnot(is.integer(ncores), is.na(ncores)) Sys.unsetenv("R_PARALLELLY_AVAILABLECORES_SYSTEM") options(parallelly.availableCores.system = NULL) Sys.setenv(R_PARALLELLY_AVAILABLECORES_SYSTEM = "NA_real_") .onLoad(pkgname, pkgname) ncores <- getOption("parallelly.availableCores.system") print(ncores) stopifnot(is.integer(ncores), is.na(ncores)) Sys.unsetenv("R_PARALLELLY_AVAILABLECORES_SYSTEM") options(parallelly.availableCores.system = NULL) message("- .onLoad() w/ R_PARALLELLY_AVAILABLECORES_SYSTEM ... DONE") message("- .onLoad() w/ parallelly.availableCores.fallback = 1L ...") options(parallelly.availableCores.fallback = 1L) .onLoad(pkgname, pkgname) options(parallelly.availableCores.fallback = NULL) message("- .onLoad() w/ parallelly.availableCores.fallback = 1L ... 
DONE") message("- .onLoad() w/ R_PARALLELLY_AVAILABLECORES_FALLBACK ...") Sys.setenv(R_PARALLELLY_AVAILABLECORES_FALLBACK = "1") .onLoad(pkgname, pkgname) ncores <- getOption("parallelly.availableCores.fallback") print(ncores) stopifnot(is.integer(ncores), ncores == 1L) Sys.unsetenv("R_PARALLELLY_AVAILABLECORES_FALLBACK") options(parallelly.availableCores.fallback = NULL) Sys.setenv(R_PARALLELLY_AVAILABLECORES_FALLBACK = "NA") .onLoad(pkgname, pkgname) ncores <- getOption("parallelly.availableCores.fallback") print(ncores) stopifnot(is.integer(ncores), is.na(ncores)) Sys.unsetenv("R_PARALLELLY_AVAILABLECORES_FALLBACK") options(parallelly.availableCores.fallback = NULL) Sys.setenv(R_PARALLELLY_AVAILABLECORES_FALLBACK = "NA_real_") .onLoad(pkgname, pkgname) ncores <- getOption("parallelly.availableCores.fallback") print(ncores) stopifnot(is.integer(ncores), is.na(ncores)) Sys.unsetenv("R_PARALLELLY_AVAILABLECORES_FALLBACK") options(parallelly.availableCores.fallback = NULL) message("- .onLoad() w/ R_PARALLELLY_AVAILABLECORES_FALLBACK ... DONE") message("*** .onLoad() ... DONE") source("incl/end.R") parallelly/R/0000755000175000017500000000000014156454104012752 5ustar nileshnileshparallelly/R/getOptionOrEnvVar.R0000644000175000017500000000433414060224156016470 0ustar nileshnilesh#' Gets an R Option or an Environment Variable #' #' @param name (character string) The name of the \R option. #' #' @param default (a single object) The value to be returned if neither #' the \R option nor the environment variable is set. If the environment #' variable is set, its value is coerced to the same type as `default`. #' #' @param envvar (character string) The name of the environment variable. #' If not set, or NULL, then the name is automatically constructed from #' the upper-case version of argument `name` with periods (`.`) substituted #' by underscores (`_`) and prefixed with `R_`, e.g. `"abc.debug"` #' becomes `R_ABC_DEBUG`. #' #' @return #' Returns an object. 
#' #' @keywords internal getOption2 <- local({ re <- sprintf("^(future|%s)[.]", .packageName) prefixes <- paste(c(.packageName, "future"), ".", sep = "") function(name, default = NULL) { value <- getOption(name, NULL) if (!is.null(value)) return(value) ## Backward compatibility with the 'future' package basename <- sub(re, "", name) names <- unique(c(name, paste(prefixes, basename, sep=""))) ## Is there an R option set? for (name in names) { value <- getOption(name, NULL) if (!is.null(value)) return(value) } default } }) getEnvVar2 <- local({ re <- sprintf("^R_(FUTURE|%s)_", toupper(.packageName)) prefixes <- paste("R_", toupper(c(.packageName, "future")), "_", sep = "") function(name, default = NA_character_) { value <- Sys.getenv(name, "") if (nzchar(value)) return(value) ## Backward compatibility with the 'future' package basename <- sub(re, "", name) names <- unique(c(name, paste(prefixes, basename, sep=""))) ## Is there an environment variable set? for (name in names) { value <- Sys.getenv(name, "") if (nzchar(value)) return(value) } default } }) ## When 'default' is specified, this is 30x faster than ## base::getOption(). The difference is that here we ## use names(.Options) whereas in 'base' names(options()) ## is used. getOption <- local({ go <- base::getOption function(x, default = NULL) { if (missing(default) || match(x, table = names(.Options), nomatch = 0L) > 0L) go(x) else default } }) parallelly/R/options.R0000644000175000017500000003716614146362545014603 0ustar nileshnilesh#' Options Used by the 'parallelly' Package #' #' Below are the \R options and environment variables that are used by the #' \pkg{parallelly} package and packages enhancing it.\cr #' \cr #' _WARNING: Note that the names and the default values of these options may #' change in future versions of the package. 
Please use with care until #' further notice._ #' #' @section Backward compatibility with the \pkg{future} package: #' #' The functions in the \pkg{parallelly} package originate from the #' \pkg{future} package. Because they are widely used within the future #' ecosystem, we need to keep them backward compatible for quite a long time, #' in order for all existing packages and R scripts to have time to adjust. #' This also goes for the \R options and the environment variables used to #' configure these functions. #' All options and environment variables used here have prefixes `parallelly.` #' and `R_PARALLELLY_`, respectively. Because of the backward compatibility #' with the \pkg{future} package, the same settings can also be controlled #' by options and environment variables with prefixes `future.` and #' `R_FUTURE_` until further notice, e.g. setting option #' \option{future.availableCores.fallback=1} is the same as setting option #' \option{parallelly.availableCores.fallback=1}, and setting environment #' variable \env{R_FUTURE_AVAILABLECORES_FALLBACK=1} is the same as setting #' \env{R_PARALLELLY_AVAILABLECORES_FALLBACK=1}. #' #' #' @section Configuring number of parallel workers: #' #' The below \R options and environment variables control the default results of [availableCores()] and [availableWorkers()]. #' #' \describe{ #' \item{\option{parallelly.availableCores.logical}:}{(logical) The default value of argument `logical` as used by `availableCores()` and `availableWorkers()` when querying `parallel::detectCores(logical = logical)`. The default is `TRUE` just like it is for [parallel::detectCores()].} #' #' \item{\option{parallelly.availableCores.methods}:}{(character vector) Default lookup methods for [availableCores()]. 
(Default: `c("system", "nproc", "mc.cores", "_R_CHECK_LIMIT_CORES_", "PBS", "SGE", "Slurm", "LSF", "fallback", "custom")`)} #' #' \item{\option{parallelly.availableCores.custom}:}{(function) If set and a function, then this function will be called (without arguments) by [availableCores()] where its value, coerced to an integer, is interpreted as a number of cores.} #' #' \item{\option{parallelly.availableCores.fallback}:}{(integer) Number of cores to use when no core-specifying settings are detected other than `"system"` and `"nproc"`. This option makes it possible to set the default number of cores returned by `availableCores()` / `availableWorkers()` yet allow users and schedulers to override it. In multi-tenant environments, such as HPC clusters, it is useful to set environment variable \env{R_PARALLELLY_AVAILABLECORES_FALLBACK} to `1`, which will set this option when the package is loaded.} #' #' \item{\option{parallelly.availableCores.system}:}{(integer) Number of "system" cores used instead of what is reported by \code{\link{availableCores}(which = "system")}. This option allows you to effectively override what `parallel::detectCores()` reports the system has.} #' #' \item{\option{parallelly.availableCores.omit}:}{(integer) Number of cores to set aside, i.e. not to include.} #' #' \item{\option{parallelly.availableWorkers.methods}:}{(character vector) Default lookup methods for [availableWorkers()]. (Default: `c("mc.cores", "_R_CHECK_LIMIT_CORES_", "PBS", "SGE", "Slurm", "LSF", "custom", "system", "fallback")`)} #' #' \item{\option{parallelly.availableWorkers.custom}:}{(function) If set and a function, then this function will be called (without arguments) by [availableWorkers()] where its value, coerced to a character vector, is interpreted as hostnames of available workers.} #' } #' #' #' @section Configuring forked parallel processing: #' #' The below \R options and environment variables control the default result of [supportsMulticore()]. 
#' #' \describe{ #' \item{\option{parallelly.fork.enable}:}{(logical) Enable or disable _forked_ processing. If `FALSE`, multicore futures become sequential futures. If `NA`, or not set (the default), a set of best-practices rules decides whether forked processing should be supported or not.} #' #' \item{\option{parallelly.supportsMulticore.unstable}:}{(character) Controls whether a warning should be produced or not whenever multicore processing is automatically disabled because the environment in which R runs is considered unstable for forked processing, e.g. in the RStudio environment. If `"warn"` (default), then an informative warning is produced the first time 'multicore' or 'multiprocess' futures are used. If `"quiet"`, no warning is produced.} #' } #' #' #' @section Configuring setup of parallel PSOCK clusters: #' #' The below \R options and environment variables control the default results of [makeClusterPSOCK()] and its helper function [makeNodePSOCK()] that creates the individual cluster nodes. #' #' \describe{ #' \item{\option{parallelly.makeNodePSOCK.setup_strategy}:}{(character) If `"parallel"` (default), the PSOCK cluster nodes are set up concurrently. 
If `"sequential"`, they are set up sequentially.} #' #' \item{\option{parallelly.makeNodePSOCK.validate}:}{(logical) If TRUE (default), after the nodes have been created, they are all validated that they work by inquiring about their session information, which is saved in attribute `session_info` of each node.} #' #' \item{\option{parallelly.makeNodePSOCK.connectTimeout}:}{(numeric) The maximum time (in seconds) allowed for each socket connection between the master and a worker to be established (defaults to 2*60 seconds = 2 minutes).} #' #' \item{\option{parallelly.makeNodePSOCK.timeout}:}{(numeric) The maximum time (in seconds) allowed to pass without the master and a worker communicating with each other (defaults to 30*24*60*60 seconds = 30 days).} #' #' \item{\option{parallelly.makeNodePSOCK.useXDR}:}{(logical) If FALSE (default), the communication between master and workers, which is binary, will use little-endian (faster), otherwise big-endian ("XDR"; slower).} #' #' \item{\option{parallelly.makeNodePSOCK.socketOptions}:}{(character string) If set to another value than `"NULL"`, then option \option{socketOptions} is set to this value on the workers during startup. See [base::socketConnection()] for details. (defaults to `"no-delay"`)} #' #' \item{\option{parallelly.makeNodePSOCK.rshcmd}:}{(character vector) The command to be run on the master to launch a process on another host.} #' #' \item{\option{parallelly.makeNodePSOCK.rshopts}:}{(character vector) Additional command-line options appended to `rshcmd`. These arguments are only applied when connecting to non-localhost machines.} #' #' \item{\option{parallelly.makeNodePSOCK.tries}:}{(integer) The maximum number of attempts done to launch each node. Only used when setting up cluster nodes using the sequential strategy.} #' #' \item{\option{parallelly.makeNodePSOCK.tries.delay}:}{(numeric) The number of seconds to wait before trying to launch a cluster node that failed to launch previously. 
Only used when setting up cluster nodes using the sequential strategy.} #' } #' #' #' @section Options for debugging: #' #' \describe{ #' \item{\option{parallelly.debug}:}{(logical) If `TRUE`, extensive debug messages are generated. (Default: `FALSE`)} #' } #' #' #' @section Environment variables that set R options: #' All of the above \R \option{parallelly.*} options can be set by #' corresponding environment variables \env{R_PARALLELLY_*} _when the #' \pkg{parallelly} package is loaded_. #' For example, if `R_PARALLELLY_MAKENODEPSOCK_SETUP_STRATEGY = "sequential"`, #' then option \option{parallelly.makeNodePSOCK.setup_strategy} is set to #' `"sequential"` (character). #' Similarly, if `R_PARALLELLY_AVAILABLECORES_FALLBACK = "1"`, then option #' \option{parallelly.availableCores.fallback} is set to `1` (integer). #' #' #' @examples #' # Set an R option: #' options(parallelly.availableCores.fallback = 1L) #' #' #' @seealso #' To set \R options when \R starts (even before the \pkg{parallelly} package is loaded), see the \link[base]{Startup} help page. The \href{https://cran.r-project.org/package=startup}{\pkg{startup}} package provides a friendly mechanism for configuring \R's startup process. 
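#'
#' For example, a sketch of the name mapping described above (the shown value
#' assumes the environment variable was set before the package was loaded; it
#' is not a package default):
#'
#' ```r
#' ## R was launched with environment variable:
#' ##   R_PARALLELLY_AVAILABLECORES_FALLBACK=1
#' ## After the package has been loaded, the corresponding option is set:
#' getOption("parallelly.availableCores.fallback")  ## 1L
#' ```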
#' #' @aliases #' parallelly.debug #' #' parallelly.availableCores.custom #' parallelly.availableCores.methods #' parallelly.availableCores.fallback #' parallelly.availableCores.omit #' parallelly.availableCores.system #' parallelly.availableWorkers.methods #' parallelly.availableWorkers.custom #' parallelly.fork.enable #' parallelly.supportsMulticore.unstable #' R_PARALLELLY_AVAILABLECORES_FALLBACK #' R_PARALLELLY_AVAILABLECORES_OMIT #' R_PARALLELLY_AVAILABLECORES_SYSTEM #' R_PARALLELLY_FORK_ENABLE #' R_PARALLELLY_SUPPORTSMULTICORE_UNSTABLE #' #' future.availableCores.custom #' future.availableCores.methods #' future.availableCores.fallback #' future.availableCores.system #' future.availableWorkers.methods #' future.availableWorkers.custom #' future.fork.enable #' future.supportsMulticore.unstable #' R_FUTURE_AVAILABLECORES_FALLBACK #' R_FUTURE_AVAILABLECORES_SYSTEM #' R_FUTURE_FORK_ENABLE #' R_FUTURE_SUPPORTSMULTICORE_UNSTABLE #' #' parallelly.makeNodePSOCK.setup_strategy #' parallelly.makeNodePSOCK.validate #' parallelly.makeNodePSOCK.connectTimeout #' parallelly.makeNodePSOCK.timeout #' parallelly.makeNodePSOCK.useXDR #' parallelly.makeNodePSOCK.socketOptions #' parallelly.makeNodePSOCK.rshcmd #' parallelly.makeNodePSOCK.rshopts #' parallelly.makeNodePSOCK.tries #' parallelly.makeNodePSOCK.tries.delay #' R_PARALLELLY_MAKENODEPSOCK_SETUP_STRATEGY #' R_PARALLELLY_MAKENODEPSOCK_VALIDATE #' R_PARALLELLY_MAKENODEPSOCK_CONNECTTIMEOUT #' R_PARALLELLY_MAKENODEPSOCK_TIMEOUT #' R_PARALLELLY_MAKENODEPSOCK_USEXDR #' R_PARALLELLY_MAKENODEPSOCK_SOCKETOPTIONS #' R_PARALLELLY_MAKENODEPSOCK_RSHCMD #' R_PARALLELLY_MAKENODEPSOCK_RSHOPTS #' R_PARALLELLY_MAKENODEPSOCK_TRIES #' R_PARALLELLY_MAKENODEPSOCK_TRIES_DELAY #' ## Internal options and environment variables _not_ documented here: ## parallelly.localhost.hostname ## parallelly.makeNodePSOCK.master.localhost.hostname ## parallelly.makeNodePSOCK.rscript_label ## parallelly.makeNodePSOCK.sessionInfo.pkgs ## 
parallelly.makeNodePSOCK.autoKill ## parallelly.makeNodePSOCK.port.increment ## parallelly.makeNodePSOCK.tries.port ## R_PARALLELLY_LOCALHOST_HOSTNAME ## R_PARALLELLY_MAKENODEPSOCK_MASTER_LOCALHOST_HOSTNAME ## R_PARALLELLY_MAKENODEPSOCK_RSCRIPT_LABEL ## R_PARALLELLY_MAKENODEPSOCK_SESSIONINFO_PKGS ## R_PARALLELLY_MAKENODEPSOCK_AUTOKILL ## R_PARALLELLY_MAKENODEPSOCK_PORT_INCREMENT ## R_PARALLELLY_MAKENODEPSOCK_TRIES_PORT #' #' @name parallelly.options NULL get_package_option <- function(name, default = NULL, package = .packageName) { if (!is.null(package)) { name <- paste(package, name, sep = ".") } getOption2(name, default = default) } # Set an R option from an environment variable update_package_option <- function(name, mode = "character", default = NULL, package = .packageName, split = NULL, trim = TRUE, disallow = c("NA"), force = FALSE, debug = FALSE) { if (!is.null(package)) { name <- paste(package, name, sep = ".") } if (debug) mdebugf("Set package option %s", sQuote(name)) ## Already set? Nothing to do? value <- getOption2(name, NULL) if (!force && !is.null(value)) { if (debug) mdebugf("Already set: %s", sQuote(value)) return(value) } ## name="Pkg.foo.Bar" => env="R_PKG_FOO_BAR" env <- gsub(".", "_", toupper(name), fixed = TRUE) env <- paste("R_", env, sep = "") env_value <- value <- getEnvVar2(env, default = NA_character_) if (is.na(value)) { if (debug) mdebugf("Environment variable %s not set", sQuote(env)) ## Nothing more to do? if (is.null(default)) return(getOption2(name)) if (debug) mdebugf("Use argument 'default': %s", sQuote(default)) value <- default } if (debug) mdebugf("%s=%s", env, sQuote(value)) ## Trim? if (trim) value <- trim(value) ## Nothing to do? if (!nzchar(value)) return(getOption2(name, default = default)) ## Split? if (!is.null(split)) { value <- strsplit(value, split = split, fixed = TRUE) value <- unlist(value, use.names = FALSE) if (trim) value <- trim(value) } ## Coerce? 
mode0 <- storage.mode(value) if (mode0 != mode) { suppressWarnings({ storage.mode(value) <- mode }) if (debug) { mdebugf("Coercing from %s to %s: %s", mode0, mode, commaq(value)) } } if (length(disallow) > 0) { if ("NA" %in% disallow) { if (any(is.na(value))) { stopf("Coercing environment variable %s=%s to %s would result in missing values for option %s: %s", sQuote(env), sQuote(env_value), sQuote(mode), sQuote(name), commaq(value)) } } if (is.numeric(value)) { if ("non-positive" %in% disallow) { if (any(value <= 0, na.rm = TRUE)) { stopf("Environment variable %s=%s specifies a non-positive value for option %s: %s", sQuote(env), sQuote(env_value), sQuote(name), commaq(value)) } } if ("negative" %in% disallow) { if (any(value < 0, na.rm = TRUE)) { stopf("Environment variable %s=%s specifies a negative value for option %s: %s", sQuote(env), sQuote(env_value), sQuote(name), commaq(value)) } } } } if (debug) { mdebugf("=> options(%s = %s) [n=%d, mode=%s]", dQuote(name), commaq(value), length(value), storage.mode(value)) } do.call(options, args = structure(list(value), names = name)) getOption2(name, default = default) } ## Set package options based on environment variables update_package_options <- function(debug = FALSE) { update_package_option("availableCores.methods", mode = "character", split = ",", debug = debug) update_package_option("availableCores.fallback", mode = "integer", disallow = NULL, debug = debug) update_package_option("availableCores.system", mode = "integer", disallow = NULL, debug = debug) update_package_option("availableCores.logical", mode = "logical", debug = debug) update_package_option("availableCores.omit", mode = "integer", debug = debug) update_package_option("availableWorkers.methods", mode = "character", split = ",", debug = debug) update_package_option("fork.enable", mode = "logical", debug = debug) update_package_option("supportsMulticore.unstable", mode = "character", debug = debug) update_package_option("makeNodePSOCK.setup_strategy", 
mode = "character", debug = debug) update_package_option("makeNodePSOCK.validate", mode = "logical", debug = debug) update_package_option("makeNodePSOCK.connectTimeout", mode = "numeric", debug = debug) update_package_option("makeNodePSOCK.timeout", mode = "numeric", debug = debug) update_package_option("makeNodePSOCK.useXDR", mode = "logical", debug = debug) update_package_option("makeNodePSOCK.socketOptions", mode = "character", debug = debug) update_package_option("makeNodePSOCK.rshcmd", mode = "character", split = ",", debug = debug) update_package_option("makeNodePSOCK.rshopts", mode = "character", split = ",", debug = debug) update_package_option("makeNodePSOCK.tries", mode = "integer", debug = debug) update_package_option("makeNodePSOCK.tries.delay", mode = "numeric", debug = debug) update_package_option("makeNodePSOCK.rscript_label", mode = "character", debug = debug) update_package_option("makeNodePSOCK.sessionInfo.pkgs", mode = "character", split = ",", debug = debug) update_package_option("makeNodePSOCK.autoKill", mode = "logical", debug = debug) update_package_option("makeNodePSOCK.master.localhost.hostname", mode = "character", debug = debug) update_package_option("makeNodePSOCK.port.increment", mode = "logical", debug = debug) } parallelly/R/autoStopCluster.R0000644000175000017500000000236114064246641016262 0ustar nileshnilesh#' Automatically Stop a Cluster when Garbage Collected #' #' Registers a finalizer to a cluster such that the cluster will #' be stopped when garbage collected #' #' @param cl A cluster object created by for instance [makeClusterPSOCK()] #' or [parallel::makeCluster()]. #' #' @param debug If TRUE, then debug messages are produced when #' the cluster is garbage collected. #' #' @return The cluster object with attribute `gcMe` set. #' #' @example incl/autoStopCluster.R #' #' @seealso #' The cluster is stopped using #' \code{\link[parallel:makeCluster]{stopCluster}(cl)}. 
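#'
#' A minimal usage sketch (assuming a small local PSOCK cluster can be
#' created on this machine):
#'
#' ```r
#' cl <- autoStopCluster(makeClusterPSOCK(2))
#' ## ... use the cluster ...
#' rm(cl)
#' gc()  ## runs the registered finalizer, which calls stopCluster()
#' ```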
#' #' @importFrom parallel stopCluster #' @importFrom utils capture.output #' @export autoStopCluster <- function(cl, debug = FALSE) { stop_if_not(inherits(cl, "cluster")) ## Already got a finalizer? if (inherits(attr(cl, "gcMe"), "environment")) return(cl) env <- new.env() env$cluster <- cl attr(cl, "gcMe") <- env if (debug) { reg.finalizer(env, function(e) { message("Finalizing cluster ...") message(capture.output(print(e$cluster))) try(stopCluster(e$cluster), silent = FALSE) message("Finalizing cluster ... done") }) } else { reg.finalizer(env, function(e) { try(stopCluster(e$cluster), silent = TRUE) }) } cl } parallelly/R/detectCores.R0000644000175000017500000000173614025754625015356 0ustar nileshnileshdetectCores <- local({ cache <- list() function(logical = TRUE) { stop_if_not(is.logical(logical), length(logical) == 1L, !is.na(logical)) key <- paste("logical=", logical, sep = "") value <- cache[[key]] if (is.null(value)) { ## Get number of system cores from option, system environment, ## and finally detectCores(). This is also designed such that ## it is indeed possible to return NA_integer_. value <- getOption2("parallelly.availableCores.system", NULL) if (!is.null(value)) { value <- as.integer(value) return(value) } value <- parallel::detectCores(logical = logical) ## If unknown, set default to 1L if (is.na(value)) value <- 1L value <- as.integer(value) ## Assert positive integer stop_if_not(length(value) == 1L, is.numeric(value), is.finite(value), value >= 1L) cache[[key]] <<- value } value } }) parallelly/R/availableConnections.R0000644000175000017500000000565514060224156017227 0ustar nileshnilesh#' Number of Available and Free Connections #' #' The number of [connections] that can be open at the same time in \R is #' _typically_ 128, where the first three are occupied by the always open #' [stdin()], [stdout()], and [stderr()] connections, which leaves 125 slots #' available for other types of connections. Connections are used in many #' places, e.g. 
reading and writing to file, downloading URLs, communicating #' with parallel \R processes over socket connections, and capturing standard #' output via text connections. #' #' @return #' A non-negative integer, or `+Inf` if the available number of connections #' is greater than 16384, which is a limit set via option #' \option{parallelly.availableConnections.tries}. #' #' @section How to increase the limit: #' This limit of 128 connections can only be changed by rebuilding \R from #' source. The limit is hardcoded as #' #' ```c #' #define NCONNECTIONS 128 #' ``` #' #' in \file{src/main/connections.c}. #' #' @section How the limit is identified: #' Since the limit _might_ change, for instance in custom \R builds or in #' future releases of \R, we do not want to assume that the limit is 128 for #' all \R installations. Unfortunately, it is not possible to query \R for what #' the limit is. #' Instead, `availableConnections()` infers it by trial and error. #' Specifically, it attempts to open as many concurrent connections as possible #' until it fails. For efficiency, the result is memoized throughout the #' current \R session. #' #' @examples #' total <- availableConnections() #' message("You can have ", total, " connections open in this R installation") #' free <- freeConnections() #' message("There are ", free, " connections remaining") #' #' @seealso #' [base::showConnections()]. #' #' @references #' 1. 'WISH: Increase limit of maximum number of open connections (currently 125+3)', 2016-07-09, #' \url{https://github.com/HenrikBengtsson/Wishlist-for-R/issues/28} #' @export availableConnections <- local({ max <- NULL function() { ## Overridden by R options? 
value <- getOption2("parallelly.availableConnections", NULL) if (!is.null(value)) { stop_if_not(length(value) == 1L, is.numeric(value), !is.na(value), value >= 3L) return(value) } if (is.null(max)) { tries <- getOption2("parallelly.availableConnections.tries", 16384L) stop_if_not(length(tries) == 1L, is.numeric(tries), !is.na(tries), tries >= 0L) cons <- list() on.exit({ lapply(cons, FUN = function(con) try(close(con), silent = TRUE)) }) max <<- tryCatch({ for (kk in seq_len(tries)) cons[[kk]] <- rawConnection(raw(0L)) +Inf }, error = function(ex) { length(getAllConnections()) }) } max } }) #' @rdname availableConnections #' @export freeConnections <- function() { availableConnections() - length(getAllConnections()) } parallelly/R/isConnectionValid.R0000644000175000017500000001625314025754625016515 0ustar nileshnilesh#' Checks if a Connection is Valid #' #' Get a unique identifier for an R \link[base:connections]{connection} #' and check whether or not the connection is still valid. #' #' @param con A \link[base:connections]{connection}. #' #' @return #' `isConnectionValid()` returns TRUE if the connection is still valid, #' otherwise FALSE. If FALSE, then character attribute `reason` provides #' an explanation why the connection is not valid. #' #' @return #' `connectionId()` returns a non-negative integer, -1, or `NA_integer_`. #' For connections stdin, stdout, and stderr, 0, 1, and 2, are returned, #' respectively. For all other connections, an integer greater than or equal to #' 3 based on the connection's internal pointer is returned. #' A connection that has been serialized, which is no longer valid, has #' identifier -1. #' Attribute `raw_id` returns the pointer string from which the above is #' inferred. #' #' @section Connection Index versus Connection Identifier: #' R represents \link[base:connections]{connections} as indices using plain #' integers, e.g. 
#' The three connections standard input ("stdin"), standard output ("stdout"), #' and standard error ("stderr") always exist and have indices 0, 1, and 2. #' Any connection opened beyond these will get index three or greater, #' depending on availability as given by [base::showConnections()]. #' To get the connection with a given index, use [base::getConnection()]. #' **Unfortunately, this index representation of connections is non-robust**, #' e.g. there are cases where two or more 'connection' objects can end up with #' the same index and if used, the written output may end up at the wrong #' destination and files and databases might get corrupted. This can for #' instance happen if [base::closeAllConnections()] is used (*). #' **In contrast, `id <- connectionId(con)` gives an identifier that is unique #' to that 'connection' object.** This identifier is based on the internal #' pointer address of the object. The risk for two connections in the same #' \R session to end up with the same pointer address is very small. #' Thus, in case we ended up in a situation where two connections `con1` and #' `con2` share the same index - `as.integer(con1) == as.integer(con2)` - #' they will never share the same identifier - #' `connectionId(con1) != connectionId(con2)`. #' Here, `isConnectionValid()` can be used to check which one of these #' connections, if any, is valid. #' #' (*) Note that there is no good reason for calling `closeAllConnections()`. #' If called, there is a great risk that the files get corrupted etc. #' See (1) for examples and details on this problem. #' If you think there is a need to use it, it is much safer to restart \R #' because that is guaranteed to give you a working \R session with #' non-clashing connections. #' It might also be that `closeAllConnections()` is used because #' [base::sys.save.image()] is called, which might happen if \R is being #' forced to terminate. 
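#'
#' A short sketch of the distinction (identifier values vary between \R
#' sessions, because they are derived from pointer addresses):
#'
#' ```r
#' con <- file(tempfile(), open = "w")
#' as.integer(con)    ## connection index, e.g. 3L; indices may be reused
#' connectionId(con)  ## pointer-based identifier, unique to this object
#' close(con)
#' ```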
#' #' @section Connections Cannot be Serialized Or Saved: #' A 'connection' cannot be serialized, e.g. it cannot be saved to file to #' be read and used in another \R session. If attempted, the connection will #' not be valid. This is a problem that may occur in parallel processing #' when passing an \R object to parallel worker for further processing, e.g. #' the exported object may hold an internal database connection which will #' no longer be valid on the worker. #' When a connection is serialized, its internal pointer address will be #' invalidated (set to nil). In such cases, `connectionId(con)` returns -1 #' and `isConnectionValid(con)` returns FALSE. #' #' @examples #' ## R represents connections as plain indices #' as.integer(stdin()) ## int 0 #' as.integer(stdout()) ## int 1 #' as.integer(stderr()) ## int 2 #' #' ## The first three connections always exist and are always valid #' isConnectionValid(stdin()) ## TRUE #' connectionId(stdin()) ## 0L #' isConnectionValid(stdout()) ## TRUE #' connectionId(stdout()) ## 1L #' isConnectionValid(stderr()) ## TRUE #' connectionId(stderr()) ## 2L #' #' ## Connections cannot be serialized #' con <- file(tempfile(), open = "w") #' x <- list(value = 42, stderr = stderr(), con = con) #' y <- unserialize(serialize(x, connection = NULL)) #' isConnectionValid(y$stderr) ## TRUE #' connectionId(y$stderr) ## 2L #' isConnectionValid(y$con) ## FALSE with attribute 'reason' #' connectionId(y$con) ## -1L #' close(con) #' #' @references #' 1. ['BUG: A `connection` object may become corrupt and re-referenced to another connection (PATCH)'](https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81), 2018-10-30. #' 2. R-devel thread [PATCH: Asserting that 'connection' used has not changed + R_GetConnection2()](https://stat.ethz.ch/pipermail/r-devel/2018-October/077004.html), 2018-10-31. #' #' @seealso #' See [base::showConnections()] for currently open connections and their #' indices. 
To get a connection by its index, use [base::getConnection()]. #' #' @importFrom utils capture.output #' @export isConnectionValid <- function(con) { stop_if_not(inherits(con, "connection")) index <- as.integer(con) ## stdin, stdout, or stderr? if (index <= 2) return(TRUE) ## No such connection index? if (!is.element(index, getAllConnections())) { res <- FALSE attr(res, "reason") <- sprintf("Connection (%s) is no longer valid. There is currently no registered R connection with that index %d", connectionInfo(con), index) return(res) } ## That connection is no longer registered? current_con <- getConnection(index) res <- identical(attr(con, "conn_id"), attr(current_con, "conn_id")) if (!isTRUE(res)) { attr(res, "reason") <- sprintf("Connection (%s) is no longer valid. It differs from the currently registered R connection with the same index %d (%s)", connectionInfo(con), index, connectionInfo(current_con)) return(res) } ## A valid connection TRUE } #' @rdname isConnectionValid #' @export connectionId <- function(con) { stop_if_not(inherits(con, "connection")) ## stdin, stdout, or stderr? index <- as.integer(con) if (index <= 2) return(index) id <- attr(con, "conn_id") if (is.null(id)) return(NA_integer_) id <- raw_id <- capture.output(print(id)) id <- gsub("(<pointer: |>)", "", id) ## Has the connection been serialized? 
## 0: observed on Solaris if (id == "(nil)" || id == "0x0" || id == "0") return(-1L) id <- strtoi(id, base = 16L) attr(id, "raw_id") <- raw_id id } connectionInfo <- function(con) { index <- as.integer(con) if (is.element(index, getAllConnections())) { details <- summary(con) } else { details <- as.list(rep(NA_character_, times = 7L)) names(details) <- c("description", "class", "mode", "text", "opened", "can read", "can write") } id <- connectionId(con) details$id <- id details$raw_id <- attr(id, "raw_id") info <- unlist(lapply(details, FUN = function(x) { if (is.character(x)) paste0('"', x, '"') else x }), use.names = FALSE) info <- sprintf("%s=%s", names(details), info) info <- paste(info, collapse = ", ") info <- sprintf("connection: index=%d, %s", index, info) info } parallelly/R/zzz.R0000644000175000017500000000505014147254531013734 0ustar nileshnilesh## covr: skip=all .onLoad <- function(libname, pkgname) { debug <- isTRUE(as.logical(getEnvVar2("R_PARALLELLY_DEBUG", "FALSE"))) if (debug) options(parallelly.debug = TRUE) debug <- getOption2("parallelly.debug", debug) ## Automatically play nice when 'R CMD check' runs? if (isTRUE(as.logical(getEnvVar2("R_PARALLELLY_R_CMD_CHECK_NICE", "TRUE"))) && inRCmdCheck()) { if (debug) mdebug("Detected 'R CMD check':\n - adjusting defaults to be a good citizen") ## To be nicer to test environments (e.g. CRAN, Travis CI and AppVeyor CI), ## timeout much earlier than the default 30 days. This will also give a more ## informative error message produced by R itself, rather than whatever the ## test environment produces. ## NOTE: By using environment variables, instead of R options, we can make ## sure these settings are also passed down to child processes, including ## nested ones. Sys.setenv(R_PARALLELLY_MAKENODEPSOCK_CONNECTTIMEOUT = 2 * 60) Sys.setenv(R_PARALLELLY_MAKENODEPSOCK_TIMEOUT = 2 * 60) ## Collect more session details from workers to help troubleshooting on ## remote servers, e.g. 
CRAN servers
    Sys.setenv(R_PARALLELLY_MAKENODEPSOCK_SESSIONINFO_PKGS = TRUE)

    ## Automatically kill stray cluster workers, if possible
    Sys.setenv(R_PARALLELLY_MAKENODEPSOCK_AUTOKILL = TRUE)

    ## Label cluster workers, if possible
    Sys.setenv(R_PARALLELLY_MAKENODEPSOCK_RSCRIPT_LABEL = TRUE)

    ## Lower the risk for port clashes by using a large pool of random ports.
    ## The default 11000:11999 tend to fail occasionally on CRAN but also
    ## locally.
    Sys.setenv(R_PARALLELLY_RANDOM_PORTS = "20000:39999")

    ## Not all CRAN servers have _R_CHECK_LIMIT_CORES_ set [REF?]. Setting it
    ## to 'TRUE' when unset, will better emulate CRAN submission checks.
    if (!nzchar(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))) {
      ## Possible values: 'TRUE', 'false', 'warn', 'error'
      Sys.setenv("_R_CHECK_LIMIT_CORES_" = "TRUE")
    }
  }

  ## Set package options based on environment variables
  update_package_options(debug = debug)

  ## If needed, work around bug in R preventing us from using the 'parallel'
  ## setup strategy of PSOCK cluster nodes
  parallelly_disable_parallel_setup_if_needed()

  if (debug) {
    envs <- Sys.getenv()
    pattern <- sprintf("^R_(FUTURE|%s)_", toupper(.packageName))
    envs <- envs[grep(pattern, names(envs))]
    envs <- sprintf("- %s=%s", names(envs), sQuote(envs))
    mdebug(paste(c("parallelly-specific environment variables:", envs), collapse = "\n"))
  }
}

parallelly/R/freeCores.R

#' Get the Average Number of Free CPU Cores
#'
#' @param memory (character) The time period used to infer the system load,
#' with alternatives being 5 minutes (default), 15 minutes, or 1 minute.
#'
#' @param fraction (non-negative numeric) A scale factor.
#'
#' @param logical Passed as-is to [availableCores()].
#'
#' @param default (integer) The value to be returned if the system load is
#' unknown, i.e. [cpuLoad()] returns missing values.
#'
#' @return A positive integer with attributes `loadavg` (named numeric),
#' `maxCores` (named integer), argument `memory` (character), and
#' argument `fraction` (numeric).
#'
#' @example incl/freeCores.R
#'
#' @keywords internal
#' @export
freeCores <- function(memory = c("5min", "15min", "1min"), fraction = 0.90,
                      logical = getOption2("parallelly.availableCores.logical", TRUE),
                      default = parallelly::availableCores()) {
  memory <- match.arg(memory, choices = c("5min", "15min", "1min"))
  stop_if_not(!is.na(fraction), fraction > 0, fraction <= 1)

  ## Number of cores on the current system
  ncores <- availableCores(methods = c("system", "fallback"), logical = logical)
  if (ncores == 1L) return(1L)

  loadavg <- cpuLoad()

  ## Failed to infer the CPU load?
  if (is.na(loadavg[memory])) {
    oopts <- options(parallelly.availableCores.custom = NULL)
    on.exit(options(oopts))
    default <- as.integer(default)
    stop_if_not(length(default) == 1L, is.integer(default), !is.na(default), default >= 1L)
    free <- default
  } else {
    free <- max(1L, as.integer(floor(fraction * (ncores - loadavg[memory]))))
  }

  stop_if_not(length(free) == 1L, is.integer(free), !is.na(free), free >= 1L)
  attr(free, "loadavg") <- loadavg
  attr(free, "maxCores") <- ncores
  attr(free, "memory") <- memory
  attr(free, "fraction") <- fraction

  free
}

parallelly/R/availableWorkers.R

#' Get Set of Available Workers
#'
#' @param methods A character vector specifying how to infer the number
#' of available cores.
#'
#' @param na.rm If TRUE, only non-missing settings are considered/returned.
#'
#' @param logical Passed as-is to [availableCores()].
#'
#' @param default The default set of workers.
#'
#' @param which A character specifying which set / sets to return.
#' If `"auto"` (default), the first non-empty set found.
#' If `"min"`, the minimum value is returned.
#' If `"max"`, the maximum value is returned (be careful!)
#' If `"all"`, all values are returned.
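The arithmetic at the heart of freeCores() is `floor(fraction * (ncores - loadavg[memory]))`, floored at one core. A worked sketch with made-up numbers (all values below are hypothetical, not taken from a real system):

```r
## Hypothetical example of the freeCores() arithmetic:
## an 8-core machine with a 5-minute load average of 2.5 and fraction = 0.90.
ncores   <- 8L
loadavg  <- 2.5
fraction <- 0.90
free <- max(1L, as.integer(floor(fraction * (ncores - loadavg))))
free  ## 4
```

With 2.5 cores busy, 5.5 cores are nominally idle; scaling by 0.90 and flooring gives 4 "free" cores.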
#'
#' @return Return a character vector of workers, which typically consists
#' of names of machines / compute nodes, but may also be IP numbers.
#'
#' @details
#' The default set of workers for each method is
#' `rep("localhost", times = availableCores(methods = method, logical = logical))`,
#' which means that each will at least use as many parallel workers on the
#' current machine that [availableCores()] allows for that method.
#'
#' In addition, the following settings ("methods") are also acknowledged:
#' \itemize{
#'  \item `"PBS"` -
#'    Query TORQUE/PBS environment variable \env{PBS_NODEFILE}.
#'    If this is set and specifies an existing file, then the set
#'    of workers is read from that file, where one worker (node)
#'    is given per line.
#'    An example of a job submission that results in this is
#'    `qsub -l nodes=4:ppn=2`, which requests four nodes each
#'    with two cores.
#'
#'  \item `"SGE"` -
#'    Query Sun/Oracle Grid Engine (SGE) environment variable
#'    \env{PE_HOSTFILE}.
#'    An example of a job submission that results in this is
#'    `qsub -pe mpi 8` (or `qsub -pe ompi 8`), which
#'    requests eight cores on any number of machines.
#'
#'  \item `"LSF"` -
#'    Query LSF/OpenLava environment variable \env{LSB_HOSTS}.
#'
#'  \item `"Slurm"` -
#'    Query Slurm environment variable \env{SLURM_JOB_NODELIST} (fallback
#'    to legacy \env{SLURM_NODELIST}) and parse set of nodes.
#'    Then query Slurm environment variable \env{SLURM_JOB_CPUS_PER_NODE}
#'    (fallback \env{SLURM_TASKS_PER_NODE}) to infer how many CPU cores
#'    Slurm has allotted to each of the nodes. If \env{SLURM_CPUS_PER_TASK}
#'    is set, which is always a scalar, then that is respected too, i.e.
#'    if it is smaller, then that is used for all nodes.
#'    For example, if `SLURM_NODELIST="n1,n[03-05]"` (expands to
#'    `c("n1", "n03", "n04", "n05")`) and `SLURM_JOB_CPUS_PER_NODE="2(x2),3,2"`
#'    (expands to `c(2, 2, 3, 2)`), then
#'    `c("n1", "n1", "n03", "n03", "n04", "n04", "n04", "n05", "n05")` is
#'    returned.
If in addition, `SLURM_CPUS_PER_TASK=1`, which can happen
#'    depending on hyperthreading configurations on the Slurm cluster, then
#'    `c("n1", "n03", "n04", "n05")` is returned.
#'
#'  \item `"custom"` -
#'    If option \option{parallelly.availableWorkers.custom} is set and a function,
#'    then this function will be called (without arguments) and its value
#'    will be coerced to a character vector, which will be interpreted as
#'    hostnames of available workers.
#' }
#'
#' @section Known limitations:
#' `availableWorkers(methods = "Slurm")` will expand \env{SLURM_JOB_NODELIST}
#' using \command{scontrol show hostnames "$SLURM_JOB_NODELIST"}, if available.
#' If not available, then it attempts to parse the compressed nodelist based
#' on a best-guess understanding of what the possible syntax may be.
#' One known limitation is that "multi-dimensional" ranges are not supported,
#' e.g. `"a[1-2]b[3-4]"` is expanded by \command{scontrol} to
#' `c("a1b3", "a1b4", "a2b3", "a2b4")`. If \command{scontrol} is not
#' available, then any components that failed to be parsed are dropped with
#' an informative warning message. If no components could be parsed, then
#' the result of `methods = "Slurm"` will be empty.
#'
#' @examples
#' message(paste("Available workers:",
#'         paste(sQuote(availableWorkers()), collapse = ", ")))
#'
#' \dontrun{
#' options(mc.cores = 2L)
#' message(paste("Available workers:",
#'         paste(sQuote(availableWorkers()), collapse = ", ")))
#' }
#'
#' \dontrun{
#' ## Always use two workers on host 'n1' and one on host 'n2'
#' options(parallelly.availableWorkers.custom = function() {
#'   c("n1", "n1", "n2")
#' })
#' message(paste("Available workers:",
#'         paste(sQuote(availableWorkers()), collapse = ", ")))
#' }
#'
#' @seealso
#' To get the number of available workers on the current machine,
#' see [availableCores()].
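The Slurm expansion documented above can be illustrated in plain R. The node names and counts below are the ones from the documentation example; once both environment variables have been parsed, `rep()` is all that is needed:

```r
## From SLURM_JOB_NODELIST="n1,n[03-05]" (after expansion) and
## SLURM_JOB_CPUS_PER_NODE="2(x2),3,2" (after expansion):
nodes  <- c("n1", "n03", "n04", "n05")
counts <- c(2L, 2L, 3L, 2L)
workers <- rep(nodes, times = counts)
## c("n1","n1","n03","n03","n04","n04","n04","n05","n05")

## If SLURM_CPUS_PER_TASK=1 is also set, it caps the per-node counts:
workers1 <- rep(nodes, times = pmin(counts, 1L))
## c("n1","n03","n04","n05")
```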
#' #' @importFrom utils file_test #' @export availableWorkers <- function(methods = getOption2("parallelly.availableWorkers.methods", c("mc.cores", "BiocParallel", "_R_CHECK_LIMIT_CORES_", "PBS", "SGE", "Slurm", "LSF", "custom", "system", "fallback")), na.rm = TRUE, logical = getOption2("parallelly.availableCores.logical", TRUE), default = getOption2("parallelly.localhost.hostname", "localhost"), which = c("auto", "min", "max", "all")) { ## Local functions getenv <- function(name) { as.character(trim(getEnvVar2(name, default = NA_character_))) } getopt <- function(name) { as.character(getOption2(name, default = NA_character_)) } split <- function(s) { x <- unlist(strsplit(s, split = "[, ]", fixed = FALSE), use.names = FALSE) x <- trim(x) x <- x[nzchar(x)] x } which <- match.arg(which, choices = c("auto", "min", "max", "all")) stop_if_not(is.character(default), length(default) >= 1, !anyNA(default)) ## Default is to use the current machine ncores <- availableCores(methods = methods, na.rm = FALSE, logical = logical, which = "all") localhost_hostname <- getOption2("parallelly.localhost.hostname", "localhost") workers <- lapply(ncores, FUN = function(n) { if (length(n) == 0 || is.na(n)) n <- 0L rep(localhost_hostname, times = n) }) ## Acknowledge known HPC settings (skip others) methods_localhost <- c("BiocParallel", "_R_CHECK_LIMIT_CORES_", "mc.cores", "mc.cores+1", "system") methodsT <- setdiff(methods, methods_localhost) for (method in methodsT) { if (method == "PBS") { pathname <- getenv("PBS_NODEFILE") if (is.na(pathname)) next if (!file_test("-f", pathname)) { warnf("Environment variable %s was set but no such file %s exists", sQuote("PBS_NODEFILE"), sQuote(pathname)) next } data <- read_pbs_nodefile(pathname) w <- data$node ## Sanity checks pbs_np <- as.integer(getenv("PBS_NP")) if (!identical(pbs_np, length(w))) { warnf("Identified %d workers from the %s file (%s), which does not match environment variable %s = %d", length(w), sQuote("PBS_NODEFILE"), 
sQuote(pathname), sQuote("PBS_NP"), pbs_np)
      }

      pbs_nodes <- as.integer(getenv("PBS_NUM_NODES"))
      pbs_ppn <- as.integer(getenv("PBS_NUM_PPN"))
      pbs_np <- pbs_nodes * pbs_ppn
      if (!identical(pbs_np, length(w))) {
        warnf("Identified %d workers from the %s file (%s), which does not match environment variables %s * %s = %d * %d = %d", length(w), sQuote("PBS_NODEFILE"), sQuote(pathname), sQuote("PBS_NUM_NODES"), sQuote("PBS_NUM_PPN"), pbs_nodes, pbs_ppn, pbs_np)
      }

      ## TO DO: Add validation of 'w' (from PBS_NODEFILE) toward
      ## counts in PBS_NP and / or PBS_NUM_NODES * PBS_NUM_PPN.
    } else if (method == "SGE") {
      pathname <- getenv("PE_HOSTFILE")
      if (is.na(pathname)) next
      if (!file_test("-f", pathname)) {
        warnf("Environment variable %s was set but no such file %s exists", sQuote("PE_HOSTFILE"), sQuote(pathname))
        next
      }
      w <- read_pe_hostfile(pathname, expand = TRUE)

      ## Sanity checks
      nslots <- as.integer(getenv("NSLOTS"))
      if (!identical(nslots, length(w))) {
        warnf("Identified %d workers from the %s file (%s), which does not match environment variable %s = %d", length(w), sQuote("PE_HOSTFILE"), sQuote(pathname), sQuote("NSLOTS"), nslots)
      }
    } else if (method == "Slurm") {
      ## From 'man sbatch':
      ## SLURM_JOB_NODELIST (and SLURM_NODELIST for backwards compatibility)
      ## List of nodes allocated to the job.
      ## Example:
      ## SLURM_JOB_NODELIST=n1,n[3-8],n[23-25]
      nodelist <- getenv("SLURM_JOB_NODELIST")
      if (is.na(nodelist)) nodelist <- getenv("SLURM_NODELIST")
      if (is.na(nodelist)) next

      ## Parse and expand nodelist
      w <- slurm_expand_nodelist(nodelist)

      ## Failed to parse?
      if (length(w) == 0) next

      ## SLURM_JOB_CPUS_PER_NODE=64,12,...
      nodecounts <- getenv("SLURM_JOB_CPUS_PER_NODE")
      if (is.na(nodecounts)) nodecounts <- getenv("SLURM_TASKS_PER_NODE")
      if (is.na(nodecounts)) {
        warning("Expected either environment variable 'SLURM_JOB_CPUS_PER_NODE' or 'SLURM_TASKS_PER_NODE' to be set.
Will assume one core per node.")
      } else {
        ## Parse counts
        c <- slurm_expand_nodecounts(nodecounts)
        if (any(is.na(c))) {
          warnf("Failed to parse 'SLURM_JOB_CPUS_PER_NODE' or 'SLURM_TASKS_PER_NODE': %s", sQuote(nodecounts))
          next
        }

        if (length(c) != length(w)) {
          warnf("Skipping Slurm settings because the number of elements in 'SLURM_JOB_CPUS_PER_NODE'/'SLURM_TASKS_PER_NODE' (%s) does not match parsed 'SLURM_JOB_NODELIST'/'SLURM_NODELIST' (%s): %d != %d", nodelist, nodecounts, length(c), length(w))
          next
        }

        ## Always respect 'SLURM_CPUS_PER_TASK' (always a scalar), if that exists
        n <- getenv("SLURM_CPUS_PER_TASK")
        if (!is.na(n)) {
          c0 <- c
          c <- rep(n, times = length(w))
          ## Is our assumption that SLURM_CPUS_PER_TASK <= SLURM_JOB_CPUS_PER_NODE, correct?
          if (any(c0 < n)) {
            c <- pmin(c0, n)
            warnf("Unexpected values of Slurm environment variable. 'SLURM_JOB_CPUS_PER_NODE' specifies CPU counts on one or more nodes that is strictly less than what 'SLURM_CPUS_PER_TASK' specifies. Will use the minimum of the two for each node: %s < %s", sQuote(nodecounts), n)
          }
        }

        ## Expand workers list
        w <- as.list(w)
        for (kk in seq_along(w)) {
          w[[kk]] <- rep(w[[kk]], times = c[kk])
        }
        w <- unlist(w, use.names = FALSE)
      }
    } else if (method == "LSF") {
      data <- getenv("LSB_HOSTS")
      if (is.na(data)) next
      w <- split(data)
    } else if (method == "custom") {
      fcn <- getOption2("parallelly.availableWorkers.custom", NULL)
      if (!is.function(fcn)) next
      w <- local({
        ## Avoid calling the custom function recursively
        oopts <- options(parallelly.availableWorkers.custom = NULL)
        on.exit(options(oopts))
        fcn()
      })
      w <- as.character(w)
    } else {
      ## Fall back to querying option and system environment variable
      ## with the given name
      w <- getopt(method)
      if (is.na(w)) w <- getenv(method)
      if (is.na(w)) next
      w <- split(w)
    }

    ## Drop missing values?
if (na.rm) w <- w[!is.na(w)] workers[[method]] <- w } nnodes <- unlist(lapply(workers, FUN = length), use.names = TRUE) if (which == "auto") { ## For default localhost sets, use the minimum allowed number of ## workers **according to availableCores()**. methodsT <- intersect(names(workers), methods_localhost) methodsT <- methodsT[is.finite(ncores[methodsT]) & ncores[methodsT] > 0] if (length(methodsT) > 1L) { min <- min(ncores[methodsT], na.rm = TRUE) if (is.finite(min)) { nnodes[methodsT] <- min workers[methodsT] <- list(rep(localhost_hostname, times = min)) } } workers <- apply_fallback(workers) nnodes <- unlist(lapply(workers, FUN = length), use.names = TRUE) ## Now, pick the first positive and finite value idx <- which(nnodes > 0L, useNames = FALSE)[1] workers <- if (is.na(idx)) character(0L) else workers[[idx]] } else if (which == "min") { workers <- apply_fallback(workers) nnodes <- unlist(lapply(workers, FUN = length), use.names = TRUE) idx <- which.min(nnodes) workers <- workers[[idx]] } else if (which == "max") { workers <- apply_fallback(workers) nnodes <- unlist(lapply(workers, FUN = length), use.names = TRUE) idx <- which.max(nnodes) workers <- workers[[idx]] } ## Fall back to default? 
if (is.character(workers) && length(workers) == 0) workers <- default ## Sanity checks min_count <- as.integer(na.rm) if (is.list(workers)) { lapply(workers, FUN = function(w) { stop_if_not(is.character(w), length(w) >= 0L, all(nchar(w) > 0)) }) } else { stop_if_not(is.character(workers), length(workers) >= min_count, all(nchar(workers) > 0)) } workers } # availableWorkers() #' @importFrom utils read.table read_pbs_nodefile <- function(pathname, sort = TRUE) { ## One (node) per line lines <- readLines(pathname, warn = FALSE) lines <- trim(lines) ## Sanity checks stop_if_not( all(nzchar(lines)), !anyNA(lines), !any(grepl("[[:space:]]", lines)) ) data <- data.frame(node = lines, stringsAsFactors = FALSE) if (sort) { data <- data[order(data$node), , drop = FALSE] } data } #' @importFrom utils read.table read_pe_hostfile <- function(pathname, sort = TRUE, expand = FALSE) { ## One (node, ncores, queue, comment) per line, e.g. ## opt88 3 short.q@opt88 UNDEFINED ## iq242 2 short.q@iq242 UNDEFINED ## opt116 1 short.q@opt116 UNDEFINED data <- read.table(pathname, header = FALSE, sep = " ", stringsAsFactors = FALSE) ## Sanity checks stop_if_not(ncol(data) >= 2) colnames(data)[1:2] <- c("node", "count") if (ncol(data) >= 3) colnames(data)[3] <- "via" if (ncol(data) >= 4) colnames(data)[4] <- "notes" stop_if_not( is.character(data$node), !anyNA(data$node), !any(grepl("[[:space:]]", data$node)), is.integer(data$count), !anyNA(data$count), all(is.finite(data$count)), all(data$count > 0) ) if (sort) { data <- data[order(data$node, data$count), , drop = FALSE] } if (expand) { data <- sge_expand_node_count_pairs(data) } data } ## Used after read_pe_hostfile() sge_expand_node_count_pairs <- function(data) { nodes <- mapply(data$node, data$count, FUN = function(node, count) { rep(node, times = count) }, SIMPLIFY = FALSE, USE.NAMES = FALSE) unlist(nodes, recursive = FALSE, use.names = FALSE) } #' @importFrom utils file_test call_slurm_show_hostname <- function(nodelist, bin = 
Sys.which("scontrol")) {
  stop_if_not(file_test("-x", bin))
  args <- c("show", "hostname", shQuote(nodelist))
  res <- system2(bin, args = args, stdout = TRUE)
  status <- attr(res, "status")
  if (!is.null(status)) {
    call <- sprintf("%s %s", shQuote(bin), paste(args, collapse = " "))
    msg <- sprintf("%s failed with exit code %s", call, status)
    stop(msg)
  }
  res
}

supports_scontrol_show_hostname <- local({
  res <- NA

  function() {
    if (!is.na(res)) return(res)

    ## Look for 'scontrol'
    bin <- Sys.which("scontrol")
    if (!nzchar(bin)) {
      res <<- FALSE
      return(res)
    }

    ## Try a conversion
    truth <- c("a1", "b02", "b03", "b04", "b6", "b7")
    nodelist <- "a1,b[02-04,6-7]"
    hosts <- tryCatch({
      call_slurm_show_hostname(nodelist, bin = bin)
    }, error = identity)
    if (inherits(hosts, "error")) {
      res <<- FALSE
      return(res)
    }

    ## Sanity check
    if (!isTRUE(all.equal(sort(hosts), sort(truth)))) {
      warnf("Internal availableWorkers() validation failed: 'scontrol show hostnames %s' did not return the expected results. Expected c(%s) but got c(%s). Will still use this method, but please report this to the maintainer of the 'parallelly' package", shQuote(nodelist), commaq(truth), commaq(hosts), immediate. = TRUE)
    }

    value <- TRUE
    attr(value, "scontrol") <- bin
    res <<- value
    res
  }
})

## SLURM_JOB_NODELIST="a1,b[02-04,6-7]"
slurm_expand_nodelist <- function(nodelist, manual = getOption2("parallelly.slurm_expand_nodelist.manual", FALSE)) {
  ## Alt 1. Is 'scontrol show hostnames' supported?
  if (!manual && supports_scontrol_show_hostname()) {
    hosts <- call_slurm_show_hostname(nodelist)
    return(hosts)
  }

  ## Alt 2.
Manually parse the nodelist specification data <- nodelist ## Replace whitespace *within* square brackets with zeros ## Source: scontrol show hostnames treats "n[1, 3-4]" == "n[1,003-4]" pattern <- "\\[([[:digit:],-]*)[[:space:]]([[:digit:][:space:],-]*)" while (grepl(pattern, data)) { data <- gsub(pattern, "[\\10\\2", data) } ## Replace any commas *within* square brackets with semicolons pattern <- "\\[([[:digit:][:space:];-]*),([[:digit:][:space:];-]*)" while (grepl(pattern, data)) { data <- gsub(pattern, "[\\1;\\2", data) } data <- strsplit(data, split = "[,[:space:]]", fixed = FALSE) data <- as.list(unlist(data, use.names = FALSE)) ## Keep only non-empty entries, which may happen due to whitespace or ## extra commas. This should not happen but 'scontrol show hostnames' ## handles those cases too. data <- data[nzchar(data)] for (ii in seq_along(data)) { spec <- data[[ii]] ## Already expanded? if (length(spec) > 1L) next ## 1. Expand square-bracket specifications ## e.g. "a1,b[02-04,6-7]" => c("a1", "b02", "b03", "b04", "b6", "b7") pattern <- "^(.*)\\[([[:digit:];-]+)\\]$" if (grepl(pattern, spec)) { prefix <- gsub(pattern, "\\1", spec) set <- gsub(pattern, "\\2", spec) sets <- strsplit(set, split = ";", fixed = TRUE) sets <- unlist(sets, use.names = FALSE) sets <- as.list(sets) for (jj in seq_along(sets)) { set <- sets[[jj]] ## Expand by evaluating them as R expressions idxs <- tryCatch({ expr <- parse(text = gsub("-", ":", set, fixed = TRUE)) eval(expr, envir = baseenv()) }, error = function(e) NA_integer_) idxs <- as.character(idxs) ## Pad with zeros? pattern <- "^([0]*)[[:digit:]]+.*" if (grepl(pattern, set)) { pad <- gsub(pattern, "\\1", set) idxs <- paste(pad, idxs, sep = "") } set <- paste(prefix, idxs, sep = "") sets[[jj]] <- set } ## for (jj ...) sets <- unlist(sets, use.names = FALSE) data[[ii]] <- sets } } ## for (ii in ...) 
hosts <- unlist(data, recursive = FALSE, use.names = FALSE)

  ## Sanity check
  if (any(!nzchar(hosts))) {
    warnf("Unexpected result from parallelly:::slurm_expand_nodelist(..., manual = TRUE), which resulted in %d empty hostnames based on nodelist specification %s", sum(!nzchar(hosts)), sQuote(nodelist))
    hosts <- hosts[nzchar(hosts)]
  }

  ## Failed to expand all compressed ranges? This may happen because
  ## "multi-dimensional" ranges are given, e.g. "a[1-2]b[3-4]". This is
  ## currently not supported by the above manual parser. /HB 2021-03-05
  invalid <- grep("(\\[|\\]|,|;|[[:space:]])", hosts, value = TRUE)
  if (length(invalid) > 0) {
    warnf("Failed to parse the compressed Slurm nodelist %s. Detected invalid node names, which are dropped: %s", sQuote(nodelist), commaq(invalid))
    hosts <- setdiff(hosts, invalid)
  }

  hosts
}

## SLURM_TASKS_PER_NODE="2(x2),1(x3)"  # Source: 'man sbatch'
slurm_expand_nodecounts <- function(nodecounts) {
  counts <- strsplit(nodecounts, split = ",", fixed = TRUE)
  counts <- unlist(counts, use.names = TRUE)
  counts <- counts[nzchar(counts)]
  counts <- as.list(counts)
  counts <- lapply(counts, FUN = function(count) {
    ## Drop whitespace
    count <- gsub("[[:space:]]", "", count)
    pattern <- "^([[:digit:]]+)[(]x([[:digit:]]+)[)]$"
    if (grepl(pattern, count)) {
      times <- gsub(pattern, "\\2", count)
      times <- as.integer(times)
      if (is.na(times)) return(NA_integer_)
      count <- gsub(pattern, "\\1", count)
      count <- as.integer(count)
      if (is.na(count)) return(NA_integer_)
      count <- rep(count, times = times)
    } else {
      count <- as.integer(count)
    }
  })
  counts <- unlist(counts, use.names = TRUE)

  if (any(is.na(counts))) {
    warnf("Failed to parse Slurm node counts specification: %s", nodecounts)
  }

  counts
}

## Used by availableWorkers()
apply_fallback <- function(workers) {
  ## No 'fallback'?
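For illustration, a hypothetical standalone re-implementation of the count expansion that slurm_expand_nodecounts() performs; this sketch is simplified and skips the whitespace stripping and NA propagation of the real function:

```r
## Sketch: expand a Slurm count spec such as "2(x2),1(x3)" to c(2, 2, 1, 1, 1).
## 'expand_counts' is a made-up name, not part of the parallelly API.
expand_counts <- function(spec) {
  parts <- strsplit(spec, split = ",", fixed = TRUE)[[1]]
  unlist(lapply(parts, function(p) {
    ## Match "count(xtimes)", e.g. "2(x2)" -> count = 2, times = 2
    m <- regmatches(p, regexec("^([0-9]+)\\(x([0-9]+)\\)$", p))[[1]]
    if (length(m) == 3L) rep(as.integer(m[2L]), times = as.integer(m[3L]))
    else as.integer(p)  ## a plain count, e.g. "3"
  }), use.names = FALSE)
}
expand_counts("2(x2),1(x3)")  ## c(2, 2, 1, 1, 1)
expand_counts("2(x2),3,2")    ## c(2, 2, 3, 2)
```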
idx_fallback <- which(names(workers) == "fallback")
  if (length(idx_fallback) == 0) return(workers)

  ## Number of workers per set
  nnodes <- unlist(lapply(workers, FUN = length), use.names = TRUE)

  ## No 'fallback' workers?
  if (nnodes[idx_fallback] == 0) return(workers)

  ## Only consider non-empty sets
  nonempty <- which(nnodes > 0)
  workers_nonempty <- workers[nonempty]

  ## Nothing to do?
  n_nonempty <- length(workers_nonempty)
  if (n_nonempty <= 1) return(workers)

  ## Drop 'fallback'?
  if (n_nonempty > 2) {
    workers <- workers[-idx_fallback]
    return(workers)
  }

  ## No 'system' to override?
  idx_system <- which(names(workers) == "system")
  if (length(idx_system) == 0) return(workers)

  ## Drop 'system' in favor of 'fallback'
  workers <- workers[-idx_system]

  workers
} ## apply_fallback()

parallelly/R/cpuLoad.R

#' Get the Recent CPU Load
#'
#' @return A named numeric vector with three elements `1min`, `5min`, and
#' `15min` with non-negative values.
#' These values represent estimates of the CPU load during the last minute,
#' the last five minutes, and the last fifteen minutes \[1\].
#' An idle system has values close to zero, and a heavily loaded system
#' has values near `parallel::detectCores()`.
#' If they are unknown, missing values are returned.
#'
#' @details
#' This function works only on Unix-like systems with \file{/proc/loadavg}.
#'
#' @example incl/cpuLoad.R
#'
#' @references
#' 1.
Linux Load Averages: Solving the Mystery,
#' Brendan Gregg's Blog, 2017-08-08,
#' \url{http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html}
#'
#' @keywords internal
#' @export
cpuLoad <- function() {
  if (file.exists("/proc/loadavg")) {
    res <- readLines("/proc/loadavg", n = 1L)
    res <- strsplit(res, split = " ", fixed = TRUE)[[1]]
    res <- as.numeric(res[1:3])
  } else {
    res <- rep(NA_real_, times = 3L)
  }
  names(res) <- c("1min", "5min", "15min")
  res
}

parallelly/R/availableCores.R

#' Get Number of Available Cores on The Current Machine
#'
#' The current/main \R session counts as one, meaning the minimum
#' number of cores available is always at least one.
#'
#' @param constraints An optional character specifying under what
#' constraints ("purposes") we are requesting the values.
#' For instance, on systems where multicore processing is not supported
#' (i.e. Windows), using `constraints = "multicore"` will force a
#' single core to be reported.
#'
#' @param methods A character vector specifying how to infer the number
#' of available cores.
#'
#' @param na.rm If TRUE, only non-missing settings are considered/returned.
#'
#' @param logical Passed to
#' \code{\link[parallel]{detectCores}(logical = logical)}, which, if supported,
#' returns the number of logical CPUs (TRUE) or physical CPUs/cores (FALSE).
#' This argument is only used if argument `methods` includes `"system"`.
#'
#' @param default The default number of cores to return if no non-missing
#' settings are available.
#'
#' @param which A character specifying which settings to return.
#' If `"min"` (default), the minimum value is returned.
#' If `"max"`, the maximum value is returned (be careful!)
#' If `"all"`, all values are returned.
#'
#' @param omit (integer; non-negative) Number of cores to not include.
#'
#' @return Return a positive (>= 1) integer.
#' If `which = "all"`, then more than one value may be returned.
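The /proc/loadavg line that cpuLoad() parses has five fields, of which only the first three (the 1-, 5-, and 15-minute load averages) are used. A minimal sketch, using a made-up line rather than reading the real file:

```r
## Hypothetical first line of /proc/loadavg; only the first three fields
## are the load averages, the remaining two are process counts and last PID.
line <- "0.42 0.35 0.30 1/234 5678"
res <- as.numeric(strsplit(line, split = " ", fixed = TRUE)[[1]][1:3])
names(res) <- c("1min", "5min", "15min")
res  ## 1min = 0.42, 5min = 0.35, 15min = 0.30
```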
#' Together with `na.rm = FALSE` missing values may also be returned.
#'
#' @details
#' The following settings ("methods") for inferring the number of cores
#' are supported:
#' \itemize{
#'  \item `"system"` -
#'    Query \code{\link[parallel]{detectCores}(logical = logical)}.
#'  \item `"nproc"` -
#'    On Unix, query system command \code{nproc}.
#'  \item `"mc.cores"` -
#'    If available, returns the value of option
#'    \code{\link[base:options]{mc.cores}}.
#'    Note that \option{mc.cores} is defined as the number of
#'    _additional_ \R processes that can be used in addition to the
#'    main \R process. This means that with `mc.cores = 0` all
#'    calculations should be done in the main \R process, i.e. we have
#'    exactly one core available for our calculations.
#'    The \option{mc.cores} option defaults to environment variable
#'    \env{MC_CORES} (and is set accordingly when the \pkg{parallel}
#'    package is loaded). The \option{mc.cores} option is used by, for
#'    instance, \code{\link[=mclapply]{mclapply}()} of the \pkg{parallel}
#'    package.
#'  \item `"BiocParallel"` -
#'    Query environment variables \env{BIOCPARALLEL_WORKER_NUMBER} (integer),
#'    which is defined by **BiocParallel** (>= 1.27.2), and \env{BBS_HOME}
#'    (logical). If the former is set, this is the number of cores considered.
#'    If the latter is set, then a maximum of 4 cores is considered.
#'  \item `"PBS"` -
#'    Query TORQUE/PBS environment variables \env{PBS_NUM_PPN} and \env{NCPUS}.
#'    Depending on PBS system configuration, these _resource_
#'    parameters may or may not default to one.
#'    An example of a job submission that results in this is
#'    `qsub -l nodes=1:ppn=2`, which requests one node with two cores.
#'  \item `"SGE"` -
#'    Query Sun/Oracle Grid Engine (SGE) environment variable
#'    \env{NSLOTS}.
#'    An example of a job submission that results in this is
#'    `qsub -pe smp 2` (or `qsub -pe by_node 2`), which
#'    requests two cores on a single machine.
#'  \item `"Slurm"` -
#'    Query Simple Linux Utility for Resource Management (Slurm)
#'    environment variable \env{SLURM_CPUS_PER_TASK}.
#'    This may or may not be set. It can be set when submitting a job,
#'    e.g. `sbatch --cpus-per-task=2 hello.sh` or by adding
#'    `#SBATCH --cpus-per-task=2` to the \file{hello.sh} script.
#'    If \env{SLURM_CPUS_PER_TASK} is not set, then it will fall back to
#'    use \env{SLURM_CPUS_ON_NODE} if the job is a single-node job
#'    (\env{SLURM_JOB_NUM_NODES} is 1), e.g. `sbatch --ntasks=2 hello.sh`.
#'  \item `"LSF"` -
#'    Query Platform Load Sharing Facility (LSF) environment variable
#'    \env{LSB_DJOB_NUMPROC}.
#'    Jobs with multiple (CPU) slots can be submitted on LSF using
#'    `bsub -n 2 -R "span[hosts=1]" < hello.sh`.
#'  \item `"custom"` -
#'    If option \option{parallelly.availableCores.custom} is set and a function,
#'    then this function will be called (without arguments) and its value
#'    will be coerced to an integer, which will be interpreted as a number
#'    of available cores. If the value is NA, then it will be ignored.
#' }
#' For any other value of a `methods` element, the \R option with the
#' same name is queried. If that is not set, the system environment
#' variable is queried. If neither is set, a missing value is returned.
#'
#' @section Avoid ending up with zero cores:
#' Note that some machines might have a limited number of cores, or the R
#' process runs in a container or a cgroup that only provides a small number
#' of cores. In such cases:
#'
#' ```r
#' ncores <- availableCores() - 1
#' ```
#'
#' may return zero, which is often not intended and is likely to give an
#' error downstream. Instead, use:
#'
#' ```r
#' ncores <- availableCores(omit = 1)
#' ```
#'
#' to put aside one of the cores from being used. Regardless how many cores
#' you put aside, this function is guaranteed to return at least one core.
#' #' @section Advanced usage: #' It is possible to override the maximum number of cores on the machine #' as reported by `availableCores(methods = "system")`. This can be #' done by first specifying #' `options(parallelly.availableCores.methods = "mc.cores")` and #' then the number of cores to use, e.g. `options(mc.cores = 8)`. #' #' @examples #' message(paste("Number of cores available:", availableCores())) #' #' \dontrun{ #' options(mc.cores = 2L) #' message(paste("Number of cores available:", availableCores())) #' } #' #' \dontrun{ #' ## IMPORTANT: availableCores() may return 1L #' options(mc.cores = 1L) #' ncores <- availableCores() - 1 ## ncores = 0 #' ncores <- availableCores(omit = 1) ## ncores = 1 #' message(paste("Number of cores to use:", ncores)) #' } #' #' \dontrun{ #' ## Use 75% of the cores on the system but never more than four #' options(parallelly.availableCores.custom = function() { #' ncores <- max(parallel::detectCores(), 1L, na.rm = TRUE) #' ncores <- min(as.integer(0.75 * ncores), 4L) #' max(1L, ncores) #' }) #' message(paste("Number of cores available:", availableCores())) #' #' ## What is available minus one core but at least one #' options(parallelly.availableCores.custom = function() { #' max(1L, parallelly::availableCores() - 1L) #' }) #' message(paste("Number of cores available:", availableCores())) #' } #' #' @seealso #' To get the set of available workers regardless of machine, #' see [availableWorkers()]. 
#' #' @importFrom parallel detectCores #' @export availableCores <- function(constraints = NULL, methods = getOption2("parallelly.availableCores.methods", c("system", "nproc", "mc.cores", "BiocParallel", "_R_CHECK_LIMIT_CORES_", "PBS", "SGE", "Slurm", "LSF", "fallback", "custom")), na.rm = TRUE, logical = getOption2("parallelly.availableCores.logical", TRUE), default = c(current = 1L), which = c("min", "max", "all"), omit = getOption2("parallelly.availableCores.omit", 0L)) { ## Local functions getenv <- function(name, mode = "integer") { value <- trim(getEnvVar2(name, default = NA_character_)) storage.mode(value) <- mode value } # getenv() getopt <- function(name, mode = "integer") { value <- getOption2(name, default = NA_integer_) storage.mode(value) <- mode value } # getopt() which <- match.arg(which, choices = c("min", "max", "all")) stop_if_not(length(default) == 1, is.finite(default), default >= 1L) stop_if_not(length(omit) == 1L, is.numeric(omit), is.finite(omit), omit >= 0L) omit <- as.integer(omit) ncores <- rep(NA_integer_, times = length(methods)) names(ncores) <- methods for (kk in seq_along(methods)) { method <- methods[kk] if (method == "Slurm") { ## Number of cores assigned by Slurm ## The assumption is that the following works regardless of ## number of nodes requested /HB 2020-09-18 ## Example: --cpus-per-task={n} n <- getenv("SLURM_CPUS_PER_TASK") if (is.na(n)) { ## Example: --nodes={nnodes} (defaults to 1, short: -N {nnodes}) ## From 'man sbatch': ## SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compatibility) ## Total number of nodes in the job's resource allocation. nnodes <- getenv("SLURM_JOB_NUM_NODES") if (is.na(nnodes)) nnodes <- getenv("SLURM_NNODES") if (is.na(nnodes)) nnodes <- 1L ## Can this happen? /HB 2020-09-18 if (nnodes == 1L) { ## Example: --nodes=1 --ntasks={n} (short: -n {n}) ## IMPORTANT: 'SLURM_CPUS_ON_NODE' appears to be rounded up when nodes > 1. 
## Example 1: With --nodes=2 --cpus-per-task=3 we see SLURM_CPUS_ON_NODE=4 ## although SLURM_CPUS_PER_TASK=3. ## Example 2: With --nodes=2 --ntasks=7, we see SLURM_CPUS_ON_NODE=6, ## SLURM_JOB_CPUS_PER_NODE=6,2, no SLURM_CPUS_PER_TASK, and ## SLURM_TASKS_PER_NODE=5,2. ## Conclusions: We can only use 'SLURM_CPUS_ON_NODE' for nnodes = 1. n <- getenv("SLURM_CPUS_ON_NODE") } else { ## Parse `SLURM_TASKS_PER_NODE` nodecounts <- getenv("SLURM_TASKS_PER_NODE", mode = "character") if (!is.na(nodecounts)) { ## Examples: ## SLURM_TASKS_PER_NODE=5,2 ## SLURM_TASKS_PER_NODE=2(x2),1(x3) # Source: 'man sbatch' n <- slurm_expand_nodecounts(nodecounts) if (any(is.na(n))) next ## ASSUMPTION: We assume that it is the first component on the list that ## corresponds to the current machine. /HB 2021-03-05 n <- n[1] } } } ## TODO?: Can we validate above assumptions/results? /HB 2020-09-18 if (FALSE && !is.na(n)) { ## Is any of the following useful? ## Example: --ntasks={ntasks} (no default, short: -n {ntasks}) ## From 'man sbatch': ## SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility) ## Same as -n, --ntasks ntasks <- getenv("SLURM_NTASKS") if (is.na(ntasks)) ntasks <- getenv("SLURM_NPROCS") } } else if (method == "PBS") { ## Number of cores assigned by TORQUE/PBS n <- getenv("PBS_NUM_PPN") if (is.na(n)) { ## PBSPro sets 'NCPUS' but not 'PBS_NUM_PPN' n <- getenv("NCPUS") } } else if (method == "SGE") { ## Number of cores assigned by Sun/Oracle Grid Engine (SGE) n <- getenv("NSLOTS") } else if (method == "LSF") { ## Number of slots assigned by LSF n <- getenv("LSB_DJOB_NUMPROC") } else if (method == "mc.cores") { ## Number of cores by option defined by 'parallel' package n <- getopt("mc.cores") if (!is.na(n) && n == 0) n <- 1L ## Because options(mc.cores = 0) may be set } else if (method == "mc.cores+1") { ## Number of cores by option defined by 'parallel' package n <- getopt("mc.cores") + 1L } else if (method == "BiocParallel") { n <- getenv("BIOCPARALLEL_WORKER_NUMBER") if 
(nzchar(Sys.getenv("BBS_HOME"))) n <- min(n, 4L, na.rm = TRUE) } else if (method == "_R_CHECK_LIMIT_CORES_") { ## A flag set by R CMD check for constraining number of ## cores allowed to be used in package tests. Here we ## acknowledge this and set the number of cores to the ## maximum of two allowed. This way we don't have to explicitly ## use options(mc.cores = 2L) in example code, which may be ## misleading to the reader. chk <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", "")) chk <- (nzchar(chk) && (chk != "false")) n <- if (chk) 2L else NA_integer_ } else if (method == "system") { ## Number of cores available according to parallel::detectCores() n <- detectCores(logical = logical) } else if (method == "nproc") { ## Number of cores according to the Unix 'nproc' command n <- getNproc() } else if (method == "fallback") { ## Number of cores available according to parallelly.availableCores.fallback n <- getOption2("parallelly.availableCores.fallback", NA_integer_) n <- as.integer(n) } else if (method == "custom") { fcn <- getOption2("parallelly.availableCores.custom", NULL) if (!is.function(fcn)) next n <- local({ ## Avoid calling the custom function recursively oopts <- options(parallelly.availableCores.custom = NULL) on.exit(options(oopts)) fcn() }) n <- as.integer(n) if (length(n) != 1L) { stop("Function specified by option 'parallelly.availableCores.custom' does not return a single value") } } else { ## covr: skip=3 ## Fall back to querying option and system environment variable ## with the given name n <- getopt(method) if (is.na(n)) n <- getenv(method) } ncores[kk] <- n } ## Validate settings ncoresT <- ncores[!is.na(ncores)] ncoresT <- ncoresT[ncoresT <= 0] if (length(ncoresT) > 0) { msg <- sprintf("Detected invalid (zero or less) core settings: %s", paste(paste0(sQuote(names(ncoresT)), " = ", ncoresT), collapse = ", ")) mdebug(msg) stop(msg) } ## Drop missing values? if (na.rm) { ncores <- ncores[!is.na(ncores)] } ## Fall back to the default? 
if (length(ncores) == 0) ncores <- default ## Keep only one if (length(ncores) >= 2 && (which %in% c("min", "max"))) { ## SPECIAL: The 'fallback' should only be used as a fallback if no other ## options are explicitly set / available. idx_fallback <- which(names(ncores) == "fallback") if (length(idx_fallback) == 1) { ## Use only if 'system' and 'nproc' are the only other options ignore <- c("system", "nproc") if (length(setdiff(names(ncores), c("fallback", ignore))) == 0) { ncores <- ncores[idx_fallback] } else { ## ... otherwise, ignore 'fallback'. ncores <- ncores[-idx_fallback] } } if (which == "min") { ## which.min() to preserve name ncores <- ncores[which.min(ncores)] } else if (which == "max") { ## which.max() to preserve name ncores <- ncores[which.max(ncores)] } } if (!is.null(constraints)) { if (constraints == "multicore") { ## SPECIAL: On some OSes such as Windows, multicore processing ## is not supported. If so, we should override all values ## to reflect that only a single core is available if (!supportsMulticore()) ncores[] <- 1L } } ## Omit some of the cores? if (omit > 0L) { ncores <- ncores - omit ncores[ncores < 1L] <- 1L } ## Sanity check stop_if_not(all(ncores >= 1L, na.rm = TRUE)) ncores } # availableCores() getNproc <- function(ignore = c("OMP_NUM_THREADS", "OMP_THREAD_LIMIT")) { ## 'nproc' is limited by 'OMP_NUM_THREADS' and 'OMP_THREAD_LIMIT', if set. ## However, that is not what we want for availableCores(). Because of ## this, we unset those while querying 'nproc'. 
if (length(ignore) > 0) { ignore <- intersect(ignore, names(Sys.getenv())) if (length(ignore) > 0) { oignore <- Sys.getenv(ignore, names = TRUE) oignore <- as.list(oignore) on.exit(do.call(Sys.setenv, args = oignore), add = TRUE) Sys.unsetenv(ignore) } } systems <- list(linux = "nproc 2>/dev/null") os <- names(systems) m <- pmatch(os, table = R.version$os, nomatch = NA_integer_) m <- os[!is.na(m)] if (length(m) == 0L) return(NA_integer_) for (cmd in systems[[m]]) { tryCatch({ res <- suppressWarnings(system(cmd, intern = TRUE)) res <- gsub("(^[[:space:]]+|[[:space:]]+$)", "", res[1]) if (grepl("^[1-9][0-9]*$", res)) return(as.integer(res)) }, error = identity) } NA_integer_ } parallelly/R/isForkedNode.R0000644000175000017500000000120614025754625015456 0ustar nileshnilesh#' Checks whether or not a Cluster Node Runs in a Forked Process #' #' @param node A cluster node of class `SOCKnode` or `SOCK0node`. #' #' @param \ldots Not used. #' #' @return (logical) Returns TRUE if the cluster node is running in a #' forked child process and FALSE if it is not. #' If it cannot be inferred, NA is returned. #' #' @export isForkedNode <- function(node, ...) UseMethod("isForkedNode") #' @export isForkedNode.default <- function(node, ...) NA #' @export isForkedNode.forknode <- function(node, ...) { TRUE } #' @export isForkedNode.cluster <- function(node, ...) { vapply(node, FUN = isForkedNode, FUN.VALUE = NA) } parallelly/R/makeClusterMPI.R0000644000175000017500000000746314146362545015732 0ustar nileshnilesh#' Create a Message Passing Interface (MPI) Cluster of R Workers for Parallel Processing #' #' The `makeClusterMPI()` function creates an MPI cluster of \R workers #' for parallel processing. This function utilizes #' `makeCluster(..., type = "MPI")` of the \pkg{parallel} package and #' tweaks the cluster in an attempt to avoid #' \code{\link[parallel:makeCluster]{stopCluster()}} from hanging (1). 
#' _WARNING: This function is very much in a beta version and should #' only be used if `parallel::makeCluster(..., type = "MPI")` fails._ #' #' _Creating MPI clusters requires that the \pkg{Rmpi} and \pkg{snow} #' packages are installed._ #' #' @inheritParams makeClusterPSOCK #' #' @param workers The number of workers (as a positive integer). #' #' @param \dots Optional arguments passed to #' \code{\link[parallel:makeCluster]{makeCluster}(workers, type = "MPI", ...)}. #' #' @return An object of class `c("RichMPIcluster", "MPIcluster", "cluster")` consisting #' of a list of `"MPInode"` workers. #' #' @examples #' \donttest{\dontrun{ #' if (requireNamespace("Rmpi") && requireNamespace("snow")) { #' cl <- makeClusterMPI(2, autoStop = TRUE) #' print(cl) #' y <- parLapply(cl, X = 1:3, fun = sqrt) #' print(y) #' rm(list = "cl") #' } #' }} #' #' @references #' 1. R-sig-hpc thread \href{https://stat.ethz.ch/pipermail/r-sig-hpc/2017-September/002065.html}{Rmpi: mpi.close.Rslaves() 'hangs'} on 2017-09-28. #' #' @seealso #' [makeClusterPSOCK()] and [parallel::makeCluster()]. #' #' @importFrom parallel makeCluster #' @export makeClusterMPI <- function(workers, ..., autoStop = FALSE, verbose = getOption2("parallelly.debug", FALSE)) { if (is.numeric(workers)) { if (length(workers) != 1L) { stopf("When numeric, argument 'workers' must be a single value: %s", length(workers)) } workers <- as.integer(workers) if (is.na(workers) || workers < 1L) { stopf("Number of 'workers' must be one or greater: %s", workers) } } else { stopf("Argument 'workers' must be an integer: %s", mode(workers)) } if (verbose) { message(sprintf("Number of workers: %d", workers)) } ## FIXME: Re-implement locally using the for loop below cl <- makeCluster(workers, type = "MPI", ...) n <- length(cl) for (ii in seq_along(cl)) { if (verbose) message(sprintf("Updating node %d of %d ...", ii, n)) ## Attaching session information for each worker. This is done to assert ## that we have a working cluster already here. 
It will also collect ## useful information otherwise not available, e.g. the PID. if (verbose) message("- collecting session information") cl[ii] <- add_cluster_session_info(cl[ii]) if (verbose) message(sprintf("Updating node %d of %d ... DONE", ii, n)) } ## AD HOC/WORKAROUND: ## Note, stopCluster.spawnedMPIcluster() calls Rmpi::mpi.comm.disconnect() ## which may stall R. Because of this, we drop 'spawnedMPIcluster' from ## the class attribute to avoid calling that method. Similarly, calling ## Rmpi::mpi.finalize() and Rmpi::mpi.exit() may also hang R. ## See also the stopCluster.RichMPIcluster() implementation below. ## REFERENCE: https://stackoverflow.com/a/44317647/1072091 class(cl) <- c("RichMPIcluster", setdiff(class(cl), "spawnedMPIcluster")) if (autoStop) cl <- autoStopCluster(cl) cl } ## makeClusterMPI() #' @export #' @keywords internal stopCluster.RichMPIcluster <- function(cl) { NextMethod() if (!requireNamespace(pkg <- "Rmpi", quietly = TRUE)) return(invisible(cl)) ## https://stat.ethz.ch/pipermail/r-sig-hpc/2017-September/002065.html ns <- getNamespace("Rmpi") if (!exists("mpi.comm.free", mode = "function", envir = ns, inherits = FALSE)) return(invisible(cl)) mpi.comm.free <- get("mpi.comm.free", mode = "function", envir = ns, inherits = FALSE) comm <- 1 mpi.comm.free(comm) invisible(cl) } parallelly/R/utils,pid.R0000644000175000017500000002063714116406533015015 0ustar nileshnilesh#' Check whether a process PID exists or not #' #' @param pid A positive integer. #' #' @return Returns `TRUE` if a process with the given PID exists, #' `FALSE` if a process with the given PID does not exist, and #' `NA` if it is not possible to check PIDs on the current system. #' #' @details #' There is no single go-to function in \R for testing whether a PID exists #' or not. Instead, this function tries to identify a working one among #' multiple possible alternatives. 
A method is considered working if the #' PID of the current process is successfully identified as existing, #' such that `pid_exists(Sys.getpid())` is `TRUE`. If no working #' approach is found, `pid_exists()` will always return `NA` #' regardless of the PID tested. #' On Unix, including macOS, alternatives `tools::pskill(pid, signal = 0L)` #' and `system2("ps", args = pid)` are used. #' On Windows, various alternatives of `system2("tasklist", ...)` are used. #' #' @references #' 1. The Open Group Base Specifications Issue 7, 2018 edition, #' IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008) #' \url{https://pubs.opengroup.org/onlinepubs/9699919799/functions/kill.html} #' #' 2. Microsoft, tasklist, 2018-08-30, #' \url{https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/tasklist} #' #' 3. R-devel thread 'Detecting whether a process exists or not by its PID?', #' 2018-08-30. #' \url{https://stat.ethz.ch/pipermail/r-devel/2018-August/076702.html} #' #' @seealso #' \code{\link[tools]{pskill}()} and \code{\link[base]{system2}()}. #' #' @importFrom tools pskill #' @importFrom utils str #' @keywords internal pid_exists <- local({ os <- .Platform$OS.type ## The value of tools::pskill() is incorrect in R (< 3.5.0). ## This was fixed in R (>= 3.5.0). ## https://github.com/HenrikBengtsson/Wishlist-for-R/issues/62 if (getRversion() >= "3.5.0") { pid_exists_by_pskill <- function(pid, debug = FALSE) { tryCatch({ ## "If sig is 0 (the null signal), error checking is performed but no ## signal is actually sent. The null signal can be used to check the ## validity of pid." 
[1] res <- pskill(pid, signal = 0L) if (debug) { cat(sprintf("Call: tools::pskill(%s, signal = 0L)\n", pid)) print(res) } as.logical(res) }, error = function(ex) NA) } } else { pid_exists_by_pskill <- function(pid, debug = FALSE) NA } pid_exists_by_ps <- function(pid, debug = FALSE) { tryCatch({ ## 'ps <pid>' is likely to be supported by more 'ps' clients than ## 'ps -p <pid>' and 'ps --pid <pid>' out <- suppressWarnings({ system2("ps", args = pid, stdout = TRUE, stderr = FALSE) }) if (debug) { cat(sprintf("Call: ps %s\n", pid)) print(out) str(out) } status <- attr(out, "status") if (is.numeric(status) && status < 0) return(NA) out <- gsub("(^[ ]+|[ ]+$)", "", out) out <- out[nzchar(out)] if (debug) { cat("Trimmed:\n") print(out) str(out) } out <- strsplit(out, split = "[ ]+", fixed = FALSE) out <- lapply(out, FUN = function(x) x[1]) out <- unlist(out, use.names = FALSE) if (debug) { cat("Extracted: ", paste(sQuote(out), collapse = ", "), "\n", sep = "") } out <- suppressWarnings(as.integer(out)) if (debug) { cat("Parsed: ", paste(sQuote(out), collapse = ", "), "\n", sep = "") } any(out == pid) }, error = function(ex) NA) } pid_exists_by_tasklist_filter <- function(pid, debug = FALSE) { ## Example: tasklist /FI "PID eq 12345" /NH [2] ## Try multiple times, because 'tasklist' seems to be unreliable, e.g. ## I've observed on win-builder that two consecutive calls filtering ## on Sys.getpid() found a match the first time but not the second. 
for (kk in 1:5) { res <- tryCatch({ args = c("/FI", shQuote(sprintf("PID eq %g", pid)), "/NH") out <- system2("tasklist", args = args, stdout = TRUE) if (debug) { cat(sprintf("Call: tasklist %s\n", paste(args, collapse = " "))) print(out) str(out) } out <- gsub("(^[ ]+|[ ]+$)", "", out) out <- out[nzchar(out)] if (debug) { cat("Trimmed:\n") print(out) str(out) } out <- grepl(sprintf(" %g ", pid), out) if (debug) { cat("Contains PID: ", paste(out, collapse = ", "), "\n", sep = "") } any(out) }, error = function(ex) NA) if (isTRUE(res)) return(res) Sys.sleep(0.1) } res } pid_exists_by_tasklist <- function(pid, debug = FALSE) { ## Example: tasklist [2] for (kk in 1:5) { res <- tryCatch({ out <- system2("tasklist", stdout = TRUE) if (debug) { cat("Call: tasklist\n") print(out) str(out) } out <- gsub("(^[ ]+|[ ]+$)", "", out) out <- out[nzchar(out)] skip <- grep("^====", out)[1] if (!is.na(skip)) out <- out[seq(from = skip + 1L, to = length(out))] if (debug) { cat("Trimmed:\n") print(out) str(out) } out <- strsplit(out, split = "[ ]+", fixed = FALSE) ## WORKAROUND: The 'Image Name' column may contain spaces, making ## it hard to locate the second column. 
Instead, we will identify ## the most common number of columns (typically six) and then count ## how many columns we should drop at the end in order to find the ## second column as the last one ## n <- lengths(out) n <- sort(n)[round(length(n) / 2)] ## "median" without using 'stats' drop <- n - 2L out <- lapply(out, FUN = function(x) rev(x)[-seq_len(drop)][1]) out <- unlist(out, use.names = FALSE) if (debug) { cat("Extracted: ", paste(sQuote(out), collapse = ", "), "\n", sep = "") } out <- as.integer(out) if (debug) { cat("Parsed: ", paste(sQuote(out), collapse = ", "), "\n", sep = "") } out <- (out == pid) if (debug) { cat("Equals PID: ", paste(out, collapse = ", "), "\n", sep = "") } any(out) }, error = function(ex) NA) if (isTRUE(res)) return(res) Sys.sleep(0.1) } res } cache <- list() function(pid, debug = getOption2("parallelly.debug", FALSE)) { stop_if_not(is.numeric(pid), length(pid) == 1L, is.finite(pid), pid > 0L) pid_check <- cache$pid_check ## Does a working pid_check() exist? if (!is.null(pid_check)) return(pid_check(pid, debug = debug)) if (debug) mdebug("Attempting to find a working pid_exists_*() function ...") ## Try to find a working pid_check() function, i.e. 
one where ## pid_check(Sys.getpid()) == TRUE if (os == "unix") { ## Unix, Linux, and macOS if (isTRUE(pid_exists_by_pskill(Sys.getpid(), debug = debug))) { pid_check <- pid_exists_by_pskill } else if (isTRUE(pid_exists_by_ps(Sys.getpid(), debug = debug))) { pid_check <- pid_exists_by_ps } } else if (os == "windows") { ## Microsoft Windows if (isTRUE(pid_exists_by_tasklist(Sys.getpid(), debug = debug))) { pid_check <- pid_exists_by_tasklist } else if (isTRUE(pid_exists_by_tasklist_filter(Sys.getpid(), debug = debug))) { pid_check <- pid_exists_by_tasklist_filter } } if (is.null(pid_check)) { if (debug) mdebug("- failed; pid_check() will always return NA") ## Default to NA pid_check <- function(pid, ...) NA } else { ## Sanity check stop_if_not(isTRUE(pid_check(Sys.getpid(), debug = debug))) if (debug) mdebug("- success") } ## Record cache$pid_check <- pid_check if (debug) mdebug("Attempting to find a working pid_exists_*() function ... done") pid_check(pid) } }) #' @importFrom tools pskill pid_kill <- function(pid, wait = 0.5, timeout = 30, debug = TRUE) { pid <- as.integer(pid) stop_if_not(length(pid) == 1L, !is.na(pid), pid >= 0L) setTimeLimit(elapsed = timeout) on.exit(setTimeLimit(elapsed = Inf)) tryCatch({ ## Always try to kill, because pid_exists() can be very slow on Windows pskill(pid) ## Wait a bit before checking whether process was successfully ## killed or not Sys.sleep(wait) ## WARNING: pid_exists() can be very slow on Windows !isTRUE(pid_exists(pid)) }, error = function(ex) NA) } parallelly/R/parallelly_disable_parallel_setup_if_needed.R0000644000175000017500000001071614116376243024007 0ustar nileshnilesh## Bug #18119 (https://bugs.r-project.org/show_bug.cgi?id=18119) ## has been fixed in R-devel r80472 (2021-06-10) and in R-4.1-branch in ## r80532 (2021-06-19). It does not apply to R (< 4.0.0). 
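As a simplified, standalone sketch of the version/revision gate implemented below (it omits the R-devel 4.2.0 branch and the `R_PARALLELLY_R_VERSION`/`R_PARALLELLY_R_REVISION` overrides):

```r
## Does a given R version/revision carry bug 18119? (simplified sketch)
has_bug18119 <- function(version, revision = -1L) {
  version <- numeric_version(version)
  if (version < "4.0.0") return(FALSE)              ## bug does not apply
  if (version < "4.1.0") return(TRUE)               ## all R 4.0.x affected
  if (version == "4.1.0") return(revision < 80532)  ## fixed in r80532
  FALSE                                             ## fixed in R (>= 4.1.1)
}

has_bug18119("3.6.3")         ## FALSE
has_bug18119("4.0.5")         ## TRUE
has_bug18119("4.1.0", 80532L) ## FALSE
```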
r_version_has_bug18119 <- local({ res <- NA get_r_info <- function() { ## R version version <- Sys.getenv("R_PARALLELLY_R_VERSION", NA_character_) if (is.na(version)) { version <- getRversion() } else { version <- numeric_version(version) } ## SVN revision revision <- Sys.getenv("R_PARALLELLY_R_REVISION", NA_character_) if (is.na(revision)) { revision <- R.version[["svn rev"]] if (length(revision) != 1) revision <- -1L } revision <- as.integer(revision) if (!is.finite(revision)) revision <- -1L list(version = version, revision = revision) } function(force = FALSE) { if (force) res <<- NA if (!is.na(res)) return(res) r <- get_r_info() ## Too old version of R? if (r$version < "4.0.0") { res <<- FALSE return(FALSE) } ## All R 4.0.* versions have the bug if (r$version < "4.1.0") { res <<- TRUE return(TRUE) } if (r$version == "4.1.0") { if (r$revision >= 80532) { ## Bug has been fixed in R 4.1.0 patched r80532 res <<- FALSE return(FALSE) } } else if (r$version == "4.2.0") { if (r$revision >= 80472) { ## Bug has been fixed in R 4.2.0 devel r80472 res <<- FALSE return(FALSE) } } else if (r$version >= "4.1.1") { ## Bug has been fixed in R 4.1.1 (to be released Aug 2021) res <<- FALSE return(FALSE) } ## In all other cases, we'll assume the running R version has the bug res <<- TRUE TRUE } }) ## Check if the current R session is affected by bug 18119 or not. ## Return NA, if we're not 100% sure affected_by_bug18119 <- local({ res <- NA function(force = FALSE) { if (force) res <<- NA if (!is.na(res)) return(res) ## Nothing to do: Has R bug 18119 been fixed? if (!r_version_has_bug18119(force = force)) { res <<- FALSE return(FALSE) } ## Running RStudio Console? if (Sys.getenv("RSTUDIO") == "1" && !nzchar(Sys.getenv("RSTUDIO_TERM"))) { res <<- TRUE return(TRUE) } ## Is 'tcltk' loaded? 
if ("tcltk" %in% loadedNamespaces()) { res <<- TRUE ## Remember this, in case 'tcltk' is unloaded return(TRUE) } ## Otherwise, we don't know NA } }) ## The RStudio Console does not support setup_strategy = "parallel" ## https://github.com/rstudio/rstudio/issues/6692#issuecomment-785346223 ## Unless our R option is already set explicitly (or via the env var), ## be agile to how RStudio handles it for the 'parallel' package ## This bug (https://bugs.r-project.org/show_bug.cgi?id=18119) ## has been fixed in R-devel r80472 (2021-06-10) and in R-4.1-branch in ## r80532 (2021-06-19). ## ## UPDATE 2021-07-15: It turns out that this bug also affects macOS if ## the 'tcltk' package is loaded, cf. ## https://github.com/rstudio/rstudio/issues/6692#issuecomment-880647623 parallelly_disable_parallel_setup_if_needed <- function(liberal = TRUE) { ## Nothing to do: Has R bug 18119 been fixed? if (!r_version_has_bug18119()) return(FALSE) ## Always respect users settings if (!is.null(getOption("parallelly.makeNodePSOCK.setup_strategy"))) { return(FALSE) } if (liberal) { ## Assume it'll work, unless we know it won't if (is.na(affected_by_bug18119())) return(FALSE) } ## Force 'parallelly' to use the "sequential" setup strategy options(parallelly.makeNodePSOCK.setup_strategy = "sequential") ## Force 'parallel' to use the "sequential" setup strategy parallel_set_setup_strategy("sequential") TRUE } parallel_set_setup_strategy <- function(value) { ns <- getNamespace("parallel") if (!exists("defaultClusterOptions", mode = "environment", envir = ns)) { return() } defaultClusterOptions <- get("defaultClusterOptions", mode = "environment", envir = ns) ## Nothing to do? current <- defaultClusterOptions$setup_strategy if (identical(current, value)) return() ## Cannot set? 
if (!exists("setDefaultClusterOptions", mode = "function", envir = ns)) { return() } setDefaultClusterOptions <- get("setDefaultClusterOptions", mode = "function", envir = ns) setDefaultClusterOptions(setup_strategy = value) } parallelly/R/ports.R0000644000175000017500000001175614146362545014260 0ustar nileshnilesh#' Find a TCP port that can be opened #' #' @param ports (integer vector, or character string) #' Zero or more TCP ports in \[0, 65535\] to scan. #' If `"random"`, then a random set of ports is considered. #' If `"auto"`, then the port given by environment variable #' \env{R_PARALLEL_PORT} is used, which may also specify `random`. #' #' @param default (integer) `NA_integer_` or a port to be returned if #' an available port could not be found. #' If `"first"`, then `ports[1]`. If `"random"`, then a random port #' among `ports` is used. If `length(ports) == 0`, then `NA_integer_`. #' #' @param randomize (logical) If TRUE, `ports` is randomly shuffled #' before being searched. This shuffle does _not_ forward the RNG seed. #' #' @return #' Returns an integer representing the first port among `ports` that #' can be opened. If none can be opened, then `default` is returned. #' If port querying is not supported, as in R (< 4.0.0), then `default` #' is returned. 
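As a standalone illustration of the probe that `freePort()` builds on (see `canPortBeUsed()` further down), here is a base-R sketch using `serverSocket()`, which is available in R (>= 4.0.0); the port range in the usage comment is arbitrary:

```r
## Probe whether a TCP port can be opened (requires R >= 4.0.0)
port_is_free <- function(port) {
  con <- tryCatch(serverSocket(port), error = identity)
  ok <- inherits(con, "connection")
  if (ok) close(con)  ## release the port again
  ok
}

## Scan a range and keep the first free port, or NA if none is free
first_free_port <- function(ports) {
  for (port in ports) {
    if (port_is_free(port)) return(port)
  }
  NA_integer_
}
## e.g. first_free_port(11000:11999)
```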
#' #' @export freePort <- function(ports = 1024:65535, default = "first", randomize = TRUE) { if (is.character(default)) { default <- match.arg(default, choices = c("first", "random")) } else { default <- as.integer(default) stop_if_not(length(default) == 1L) if (!is.na(default)) default <- assertPort(default) } if (is.character(ports)) { how <- match.arg(ports, choices = c("auto", "random")) if (identical(how, "auto")) { ports <- Sys.getenv("R_PARALLEL_PORT", "random") if (identical(ports, "random")) { how <- "random" } else { ports <- suppressWarnings(as.integer(ports)) if (is.na(ports)) { warnf("Will use a random port because environment variable 'R_PARALLEL_PORT' coerced to NA_integer_: %s", sQuote(Sys.getenv("R_PARALLEL_PORT"))) how <- "random" } } } if (identical(how, "random")) { ports <- randomParallelPorts() randomize <- TRUE } } ## Update 'default'? ## Note, this will become NA_integer_ if length(ports) == 0 if (is.character(default)) { default <- switch(default, first = ports[1], random = stealth_sample(ports, size = 1L) ) } stop_if_not(is.logical(randomize), !is.na(randomize)) if (randomize) ports <- stealth_sample(ports) ## Nothing to do? if (length(ports) == 0L) return(default) ## Find first available port for (kk in seq_along(ports)) { port <- ports[kk] free <- canPortBeUsed(port) ## SPECIAL CASE: If it's not possible to query ports, ## then use the first one if (is.na(free)) return(default) ## Available? if (free) return(port) } default } #' Check whether a TCP port can be opened or not #' #' @param port (integer) A TCP port in \[0, 65535\]. #' #' @return #' `canPortBeUsed(port)` returns a logical indicating whether the port can #' be opened or not, or cannot be queried. If the port can be opened, #' then `TRUE` is returned; if it cannot be opened, then `FALSE` is returned, #' which may happen if the port is used by another process. #' If port querying is not supported, as in R (< 4.0.0), then `NA` is #' returned. 
#' #' @keywords internal canPortBeUsed <- function(port) { port <- assertPort(port) ## If not possible to query, return NA ## It works in R (>= 4.0.0) ns <- asNamespace("parallel") if (!exists("serverSocket", envir = ns, mode = "function")) return(NA) serverSocket <- get("serverSocket", envir = ns, mode = "function") con <- tryCatch(serverSocket(port), error = identity) ## Success? free <- inherits(con, "connection") if (free) close(con) free } assertPort <- function(port) { stop_if_not(is.numeric(port), length(port) == 1L) port <- as.integer(port) if (is.na(port) || port < 0L || port > 65535L) { stopf("Invalid port: %s", port) } port } randomParallelPorts <- function(default = 11000:11999) { random <- getEnvVar2("R_PARALLELLY_RANDOM_PORTS", "") if (!nzchar(random)) return(default) pattern <- "^([[:digit:]]+)(|:([[:digit:]]+))$" if (!grepl(pattern, random)) { warnf("Value of environment variable 'R_PARALLELLY_RANDOM_PORTS' does not match regular expression %s: %s", sQuote(pattern), sQuote(random)) return(default) } from <- sub(pattern, "\\1", random) from <- as.integer(from) if (is.na(from)) { warnf("Value of environment variable 'R_PARALLELLY_RANDOM_PORTS' coerced to NA_integer_: %s", sQuote(random)) return(default) } if (from < 0L || from > 65535L) { warnf("Value of environment variable 'R_PARALLELLY_RANDOM_PORTS' does not specify ports in [0,65535]: %s", sQuote(random)) return(default) } to <- sub(pattern, "\\3", random) if (!nzchar(to)) return(from) to <- as.integer(to) if (is.na(to)) { warnf("Value of environment variable 'R_PARALLELLY_RANDOM_PORTS' coerced to NA_integer_: %s", sQuote(random)) return(default) } if (to < 0L || to > 65535L) { warnf("Value of environment variable 'R_PARALLELLY_RANDOM_PORTS' does not specify ports in [0,65535]: %s", sQuote(random)) return(default) } from:to } parallelly/R/stealth_sample.R0000644000175000017500000000202114075254325016100 0ustar nileshnilesh## A version of base::sample() that does not change .Random.seed 
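The save/restore contract of the `stealth_sample()` helper that follows can be demonstrated with a pure base-R sketch; the ad-hoc seed used here is hypothetical and simpler than the time-based one below:

```r
## Draw a sample without forwarding the caller's RNG state
sample_without_forwarding_rng <- function(x) {
  oseed <- .GlobalEnv$.Random.seed
  on.exit({
    if (is.null(oseed)) {
      ## There was no RNG state before; remove the one we created
      rm(list = ".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    } else {
      .GlobalEnv$.Random.seed <- oseed
    }
  })
  set.seed(as.integer(Sys.time()) %% .Machine$integer.max)  ## ad-hoc seed
  sample(x)
}

set.seed(42)
before <- .Random.seed
invisible(sample_without_forwarding_rng(1:10))
identical(before, .Random.seed)  ## TRUE: the caller's RNG state is untouched
```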
stealth_sample <- function(x, size = length(x), replace = FALSE, ...) { ## Nothing to do? if (size == 0L) return(x[integer(0)]) ## Nothing to randomize? if (length(x) == 1L) { return(rep(x, times = size)) } oseed <- .GlobalEnv$.Random.seed on.exit({ if (is.null(oseed)) { rm(list = ".Random.seed", envir = .GlobalEnv, inherits = FALSE) } else { .GlobalEnv$.Random.seed <- oseed } }) ## Generate a pseudo-random seed based on the current ## random state and the current time time_offset <- format(Sys.time(), format = "%H%M%OS6") time_offset <- sub(".", "", time_offset, fixed = TRUE) time_offset <- strsplit(time_offset, split = "", fixed = TRUE)[[1]] time_offset <- sample(time_offset) time_offset <- paste(time_offset, collapse = "") time_offset <- as.numeric(time_offset) time_offset <- time_offset %% .Machine$integer.max set.seed(time_offset) sample(x, size = size, replace = replace, ...) } parallelly/R/isForkedChild.R0000644000175000017500000000140014146362545015610 0ustar nileshnilesh#' Checks whether or not we are running in a forked child process #' #' @return (logical) Returns TRUE if running in a forked child #' process, otherwise FALSE. #' #' @details #' Examples of setups and functions that rely on _forked_ parallelization #' are `parallel::makeCluster(n, type = "FORK")`, `parallel::mclapply()`, #' and `future::plan("multicore")`. #' #' @export isForkedChild <- local({ isChild <- NULL function() { if (is.null(isChild)) { if (supportsMulticore()) { ## Asked for parallel:::isChild() to be exported /HB 2021-11-04 ## https://bugs.r-project.org/show_bug.cgi?id=18230 isChild <- importParallel("isChild") } else { isChild <- function() FALSE } } isChild() } }) parallelly/R/makeClusterPSOCK.R0000644000175000017500000022041414156454104016157 0ustar nileshnilesh#' Create a PSOCK Cluster of R Workers for Parallel Processing #' #' The `makeClusterPSOCK()` function creates a cluster of \R workers #' for parallel processing. 
These \R workers may be background \R sessions #' on the current machine, \R sessions on external machines (local or remote), #' or a mix of such. For external workers, the default is to use SSH to connect #' to those external machines. This function works similarly to #' \code{\link[parallel:makeCluster]{makePSOCKcluster}()} of the #' \pkg{parallel} package, but provides additional, more flexible options #' for controlling the setup of the system calls that launch the background #' \R workers, and how to connect to external machines. #' #' @param workers The hostnames of workers (as a character vector) or the number #' of localhost workers (as a positive integer). #' #' @param makeNode A function that creates a `"SOCKnode"` or #' `"SOCK0node"` object, which represents a connection to a worker. #' #' @param port The port number of the master used for communicating with all #' the workers (via socket connections). If an integer vector of ports, then a #' random one among those is chosen. If `"random"`, then a random port #' is chosen from `11000:11999`, or from the range specified by #' environment variable \env{R_PARALLELLY_RANDOM_PORTS}. #' If `"auto"` (default), then the default (single) port is taken from #' environment variable \env{R_PARALLEL_PORT}, otherwise `"random"` is #' used. #' _Note, do not use this argument to specify the port number used by #' `rshcmd`, which typically is an SSH client. Instead, if the SSH daemon #' runs on a different port than the default 22, specify the SSH port by #' appending it to the hostname, e.g. `"remote.server.org:2200"` or via #' SSH options `-p`, e.g. `rshopts = c("-p", "2200")`._ #' #' @param \dots Optional arguments passed to #' `makeNode(workers[i], ..., rank = i)` where `i = seq_along(workers)`. #' #' @param autoStop If TRUE, the cluster will be automatically stopped #' using \code{\link[parallel:makeCluster]{stopCluster}()} when it is #' garbage collected, unless already stopped. 
See also [autoStopCluster()]. #' #' @param tries,delay Maximum number of attempts to launch each node #' with `makeNode()` and the delay (in seconds) in-between attempts. #' If argument `port` specifies more than one port, e.g. `port = "random"`, #' then a random port will be drawn and validated at most `tries` times. #' Arguments `tries` and `delay` are used only when `setup_strategy == "sequential"`. #' #' @param validate If TRUE (default), after the nodes have been created, they are all #' validated that they work by inquiring about their session information, #' which is saved in attribute `session_info` of each node. #' #' @param verbose If TRUE, informative messages are outputted. #' #' @return An object of class `c("RichSOCKcluster", "SOCKcluster", "cluster")` #' consisting of a list of `"SOCKnode"` or `"SOCK0node"` workers (that also #' inherit from `RichSOCKnode`). #' #' @example incl/makeClusterPSOCK.R #' #' @importFrom parallel stopCluster #' @importFrom utils packageVersion #' @export makeClusterPSOCK <- function(workers, makeNode = makeNodePSOCK, port = c("auto", "random"), ..., autoStop = FALSE, tries = getOption2("parallelly.makeNodePSOCK.tries", 3L), delay = getOption2("parallelly.makeNodePSOCK.tries.delay", 15.0), validate = getOption2("parallelly.makeNodePSOCK.validate", TRUE), verbose = getOption2("parallelly.debug", FALSE)) { localhostHostname <- getOption2("parallelly.localhost.hostname", "localhost") if (is.numeric(workers)) { if (length(workers) != 1L) { stopf("When numeric, argument 'workers' must be a single value: %s", length(workers)) } workers <- as.integer(workers) if (is.na(workers) || workers < 1L) { stopf("Number of 'workers' must be one or greater: %s", workers) } workers <- rep(localhostHostname, times = workers) } tries <- as.integer(tries) stop_if_not(length(tries) == 1L, is.integer(tries), !is.na(tries), tries >= 1L) delay <- as.numeric(delay) stop_if_not(length(delay) == 1L, is.numeric(delay), !is.na(delay), delay >= 0) 
validate <- as.logical(validate) stop_if_not(length(validate) == 1L, is.logical(validate), !is.na(validate)) ## If we are sure that each node requires a connection, then ... if (identical(makeNode, makeNodePSOCK)) { ## ... can we create that many workers? free <- freeConnections() if (validate) free <- free - 1L if (length(workers) > free) { stopf("Cannot create %d parallel PSOCK nodes. Each node needs one connection but there are only %d connections left out of the maximum %d available on this R installation", length(workers), free, availableConnections()) } } verbose_prefix <- "[local output] " if (verbose) { mdebugf("%sWorkers: [n = %d] %s", verbose_prefix, length(workers), hpaste(sQuote(workers))) } if (length(port) == 0L) { stop("Argument 'port' must be of length one or more: 0") } port <- freePort(port) if (verbose) mdebugf("%sBase port: %d", verbose_prefix, port) n <- length(workers) nodeOptions <- vector("list", length = n) if (verbose) mdebugf("%sGetting setup options for %d cluster nodes ...", verbose_prefix, n) for (ii in seq_len(n)) { if (verbose) mdebugf("%s - Node %d of %d ...", verbose_prefix, ii, n) options <- makeNode(workers[[ii]], port = port, ..., rank = ii, action = "options", verbose = verbose) stop_if_not(inherits(options, "makeNodePSOCKOptions")) nodeOptions[[ii]] <- options } if (verbose) mdebugf("%sGetting setup options for %d cluster nodes ... done", verbose_prefix, n) ## Is a 'parallel' setup strategy requested and possible? 
setup_strategy <- lapply(nodeOptions, FUN = function(options) { value <- options$setup_strategy if (is.null(value)) value <- "sequential" stop_if_not(is.character(value), length(value) == 1L) value }) setup_strategy <- unlist(setup_strategy, use.names = FALSE) is_parallel <- (setup_strategy == "parallel") force_sequential <- FALSE if (any(is_parallel)) { if (verbose) mdebugf("%s - Parallel setup requested for some PSOCK nodes", verbose_prefix) if (!all(is_parallel)) { if (verbose) mdebugf("%s - Parallel setup requested only for some PSOCK nodes; will revert to a sequential setup for all", verbose_prefix) force_sequential <- TRUE } else { ## Force setup_strategy = "sequential"? affected <- affected_by_bug18119() if (!is.na(affected) && affected) { if (verbose) mdebugf("%s - Parallel setup requested but not supported on this version of R: %s", verbose_prefix, getRversion()) force_sequential <- TRUE } } } if (force_sequential) { ## Force all nodes to be setup using the 'sequential' setup strategy setup_strategy <- "sequential" for (ii in which(is_parallel)) { if (verbose) mdebugf("%s - Node %d of %d ...", verbose_prefix, ii, n) args <- list(workers[[ii]], port = port, ..., rank = ii, action = "options", verbose = verbose) args$setup_strategy <- "sequential" options <- do.call(makeNode, args = args) stop_if_not(inherits(options, "makeNodePSOCKOptions")) nodeOptions[[ii]] <- options } } ## Sanity check setup_strategy <- lapply(nodeOptions, FUN = function(options) { value <- options$setup_strategy if (is.null(value)) value <- "sequential" stop_if_not(is.character(value), length(value) == 1L) value }) setup_strategy <- unlist(setup_strategy, use.names = FALSE) setup_strategy <- unique(setup_strategy) stop_if_not(length(setup_strategy) == 1L) cl <- vector("list", length = length(nodeOptions)) class(cl) <- c("RichSOCKcluster", "SOCKcluster", "cluster") ## If an error occurred, make sure to clean up before exiting, i.e. 
## stop each node on.exit({ nodes <- vapply(cl, FUN = inherits, c("SOCKnode", "SOCK0node"), FUN.VALUE = FALSE) stopCluster(cl[nodes]) cl <- NULL }) if (setup_strategy == "parallel") { ## To please R CMD check on R (< 4.0.0) if (getRversion() < "4.0.0") { stopf("Parallel setup of PSOCK cluster nodes is not supported in R %s", getRversion()) socketAccept <- serverSocket <- function(...) NULL } sendCall <- importParallel("sendCall") recvResult <- importParallel("recvResult") ## AD HOC: Use (port, timeout, useXDR) from the options of the first node options <- nodeOptions[[1]] if (verbose) { mdebugf("%sSetting up PSOCK nodes in parallel", verbose_prefix) mstr(options) } port <- options[["port"]] connectTimeout <- options[["connectTimeout"]] timeout <- options[["timeout"]] useXDR <- options[["useXDR"]] nodeClass <- c("RichSOCKnode", if(useXDR) "SOCKnode" else "SOCK0node") cmd <- options[["cmd"]] if (verbose) { mdebugf("%sSystem call to launch all workers:", verbose_prefix) mdebugf("%s%s", verbose_prefix, cmd) } ## FIXME: Add argument, option, environment variable for this ## Start listening and start workers. if (verbose) mdebugf("%sStarting PSOCK main server", verbose_prefix) socket <- serverSocket(port = port) on.exit(if (!is.null(socket)) close(socket), add = TRUE) if (.Platform$OS.type == "windows") { for (ii in seq_along(cl)) { ## See parallel::newPSOCKnode() for the input = "" system(cmd, wait = FALSE, input = "") } } else { ## Asynchronous lists are defined by POSIX cmd <- paste(rep(cmd, times = length(cl)), collapse = " & ") system(cmd, wait = FALSE) } if (verbose) mdebugf("%sWorkers launched", verbose_prefix) ## Accept connections and send the first command as initial ## handshake. 
The handshake makes TCP synchronization detect and ## err on half-opened connections, which arise during parallel setup ## of client-server connections (due to internal timeouts, limited ## length of the listen backlog queue, race in timing out on ## creating a connection and probably more). ## ## The handshake looks like a regular server command followed by ## client response, which is compatible with older versions of R. ready <- 0L pending <- list() on.exit({ lapply(pending, FUN = function(x) close(x$con)) cl <- NULL }, add = TRUE) if (verbose) mdebugf("%sWaiting for workers to connect back", verbose_prefix) t0 <- Sys.time() while (ready < length(cl)) { if (verbose) mdebugf("%s%d workers out of %d ready", verbose_prefix, ready, length(cl)) cons <- lapply(pending, FUN = function(x) x$con) if (difftime(Sys.time(), t0, units="secs") > connectTimeout + 5) { ## The workers will give up after connectTimeout, so there is ## no point waiting for them much longer. failed <- length(cl) - ready stop(sprintf(ngettext(failed, "Cluster setup failed. %d worker of %d failed to connect.", "Cluster setup failed. %d of %d workers failed to connect."), failed, length(cl))) } a <- socketSelect(append(list(socket), cons), write = FALSE, timeout = connectTimeout) canAccept <- a[1] canReceive <- seq_along(pending)[a[-1]] if (canAccept) { con <- socketAccept(socket = socket, blocking = TRUE, open = "a+b", timeout = timeout) scon <- structure(list(con = con, host = localhostHostname, rank = ready), class = nodeClass) res <- tryCatch({ sendCall(scon, eval, list(quote(Sys.getpid()))) }, error = identity) pending <- append(pending, list(scon)) } for (scon in pending[canReceive]) { pid <- tryCatch({ recvResult(scon) }, error = identity) if (is.integer(pid)) { ready <- ready + 1L cl[[ready]] <- scon } else { close(scon$con) } } if (length(canReceive) > 0L) pending <- pending[-canReceive] } ## while() } else if (setup_strategy == "sequential") { retryPort <- getOption2("parallelly.makeNodePSOCK.tries.port", "same") for (ii in seq_along(cl)) { if (verbose) { mdebugf("%sCreating node %d of %d ...", verbose_prefix, ii, n) mdebugf("%s- setting up node", verbose_prefix) } options <- nodeOptions[[ii]] for (kk in 1:tries) { if (verbose) { mdebugf("%s- attempt #%d of %d", verbose_prefix, kk, tries) } node <- tryCatch({ makeNode(options, verbose = verbose) }, error = identity) ## Success or an error that is not a connection error? if (!inherits(node, "PSOCKConnectionError")) break if (kk < tries) { if (verbose) { message(conditionMessage(node)) ## Retry with a new random port?
if (retryPort == "next") { options$port <- min(options$port + 1L, 65535L) } else if (retryPort == "available") { options$port <- freePort() } mdebugf("%s- waiting %g seconds before trying again", verbose_prefix, delay) } Sys.sleep(delay) } } if (inherits(node, "error")) { ex <- node if (inherits(node, "PSOCKConnectionError")) { if (verbose) { mdebugf("%s Failed %d attempts with %g seconds delay", verbose_prefix, tries, delay) } ex$message <- sprintf("%s\n * Number of attempts: %d (%gs delay)", conditionMessage(ex), tries, delay) } else { ex$call <- sys.call() } stop(ex) } cl[[ii]] <- node if (verbose) { mdebugf("%sCreating node %d of %d ... done", verbose_prefix, ii, n) } } } ## Cleanup try(close(socket), silent = TRUE) socket <- NULL if (validate) { ## Attaching session information for each worker. This is done to assert ## that we have a working cluster already here. It will also collect ## useful information otherwise not available, e.g. the PID. if (verbose) { mdebugf("%s- collecting session information", verbose_prefix) } for (ii in seq_along(cl)) { cl[ii] <- add_cluster_session_info(cl[ii]) } } if (autoStop) cl <- autoStopCluster(cl) ## Success, remove automatic cleanup of nodes on.exit() cl } ## makeClusterPSOCK() #' @param worker The hostname or IP number of the machine where the worker #' should run. #' #' @param master The hostname or IP number of the master / calling machine, as #' known to the workers. If NULL (default), then the default is #' `Sys.info()[["nodename"]]` unless `worker` is _localhost_ or #' `revtunnel = TRUE` in case it is `"localhost"`. #' #' @param connectTimeout The maximum time (in seconds) allowed for each socket #' connection between the master and a worker to be established (defaults to #' 2 minutes). _See note below on current lack of support on Linux and #' macOS systems._ #' #' @param timeout The maximum time (in seconds) allowed to pass without the #' master and a worker communicating with each other (defaults to 30 days).
#' #' @param rscript,homogeneous The system command for launching \command{Rscript} #' on the worker and whether it is installed in the same path as the calling #' machine or not. For more details, see below. #' #' @param rscript_args Additional arguments to \command{Rscript} (as a character #' vector). This argument can be used to customize the \R environment of the #' workers before they launch. #' For instance, use `rscript_args = c("-e", shQuote('setwd("/path/to")'))` #' to set the working directory to \file{/path/to} on _all_ workers. #' #' @param rscript_envs A named character vector of environment variables to #' set or unset on the workers at startup, e.g. #' `rscript_envs = c(FOO = "3.14", "HOME", "UNKNOWN", UNSETME = NA_character_)`. #' If an element is not named, then the value of that variable will be used as #' the name and the value will be the value of `Sys.getenv()` for that #' variable. Non-existing environment variables will be dropped. #' These variables are set using `Sys.setenv()`. #' A named element with value `NA_character_` will cause that variable to be #' unset, which is done via `Sys.unsetenv()`. #' #' @param rscript_libs A character vector of \R library paths that will be #' used for the library search path of the \R workers. An asterisk #' (`"*"`) will be resolved to the default `.libPaths()` _on the #' worker_. That is, to _prepend_ a folder, instead of replacing the #' existing ones, use `rscript_libs = c("new_folder", "*")`. #' To pass down a non-default library path currently set _on the main \R #' session_ to the workers, use `rscript_libs = .libPaths()`. #' #' @param rscript_startup An \R expression or a character vector of \R code, #' or a list with a mix of these, that will be evaluated on the \R worker #' prior to launching the worker's event loop. #' For instance, use `rscript_startup = 'setwd("/path/to")'` #' to set the working directory to \file{/path/to} on _all_ workers.
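#'
#' As an illustration (a sketch, not run here; the library folder name is a
#' made-up placeholder), several of these `rscript_*` arguments can be
#' combined when setting up a cluster:
#' ```r
#' cl <- makeClusterPSOCK(
#'   2L,
#'   rscript_envs = c(LANGUAGE = "en"),        ## set on each worker
#'   rscript_libs = c("~/R/extra-libs", "*"),  ## prepend to the workers' .libPaths()
#'   rscript_startup = quote(options(digits = 4L))
#' )
#' parallel::stopCluster(cl)
#' ```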
#' #' @param rscript_sh The type of shell used where `rscript` is launched, #' which should be `"sh"` if launched via a POSIX shell and `"cmd"` if #' launched via an MS Windows shell. This controls how shell command-line #' options are quoted, via #' \code{\link[base:shQuote]{shQuote(..., type = rscript_sh)}}. #' If `"auto"` (default), and the cluster node is launched locally, then it #' is set to `"sh"` or `"cmd"` according to the current platform. If launched #' remotely, then it is set to `"sh"` based on the assumption that remote #' machines typically launch commands via SSH in a POSIX shell. #' #' @param default_packages A character vector or NULL that controls which R #' packages are attached on each cluster node during startup. An asterisk #' (`"*"`) resolves to `getOption("defaultPackages")` _on the current machine_. #' If NULL, then the default set of R packages is attached. #' #' @param methods If TRUE (default), then the \pkg{methods} package is also #' loaded. This argument exists for legacy reasons due to how #' \command{Rscript} worked in R (< 3.5.0). #' #' @param useXDR If FALSE (default), the communication between master and #' workers, which is binary, will use little-endian representation (faster); #' otherwise big-endian ("XDR"; slower). #' #' @param socketOptions A character string that sets \R option #' \option{socketOptions} on the worker. #' #' @param outfile Where to direct the \link[base:showConnections]{stdout} and #' \link[base:showConnections]{stderr} connection output from the workers. #' If NULL, then no redirection of output is done, which means that the #' output is relayed in the terminal on the local computer. On Windows, the #' output is only relayed when running \R from a terminal but not from a GUI. #' #' @param renice A numerical 'niceness' (priority) to set for the worker #' processes. #' #' @param rank A unique one-based index for each worker (automatically set).
#' #' @param rshcmd,rshopts The command (character vector) to be run on the master #' to launch a process on another host and any additional arguments (character #' vector). These arguments are only applied if `machine` is not #' _localhost_. For more details, see below. #' #' @param rshlogfile (optional) If a filename, the output produced by the #' `rshcmd` call is logged to this file, or if TRUE, then it is logged #' to a temporary file. The log file name is available as an attribute #' of the returned node object. #' _Warning: This only works with SSH clients that support option #' `-E out.log`_. For example, PuTTY's \command{plink} does _not_ support #' this option, and any attempts to specify `rshlogfile` will cause the SSH #' connection to fail. #' #' @param user (optional) The user name to be used when communicating with #' another host. #' #' @param revtunnel If TRUE, a reverse SSH tunnel is set up for each worker such #' that the worker \R process sets up a socket connection to its local port #' `(port - rank + 1)` which then reaches the master on port `port`. #' If FALSE, then the worker will try to connect directly to port `port` on #' `master`. For more details, see below. #' #' @param manual If TRUE, the workers will need to be run manually. The command #' to run will be displayed. #' #' @param dryrun If TRUE, nothing is set up, but a message suggesting how to #' launch the worker from the terminal is produced. This is useful for #' troubleshooting. #' #' @param quiet If TRUE, then no output will be produced other than that from #' using `verbose = TRUE`. #' #' @param setup_strategy If `"parallel"` (default), the workers are set up #' concurrently. If `"sequential"`, they are set up sequentially, one after #' the other. #' #' @param action This is an internal argument. #' #' @return `makeNodePSOCK()` returns a `"SOCKnode"` or #' `"SOCK0node"` object representing an established connection to a worker.
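#'
#' As an example (a sketch; the hostname, user name, and key path are
#' placeholders), a remote worker can be launched over SSH with a specific
#' user and private key. With `dryrun = TRUE`, nothing is launched; the
#' set-up commands are only displayed:
#' ```r
#' cl <- makeClusterPSOCK(
#'   "remote.server.org",
#'   user = "alice",
#'   rshopts = c("-i", "~/.ssh/my_private_key"),
#'   dryrun = TRUE
#' )
#' ```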
#' #' @section Definition of _localhost_: #' A hostname is considered to be _localhost_ if it equals: #' \itemize{ #' \item `"localhost"`, #' \item `"127.0.0.1"`, or #' \item `Sys.info()[["nodename"]]`. #' } #' It is also considered _localhost_ if it appears on the same line #' as the value of `Sys.info()[["nodename"]]` in file \file{/etc/hosts}. #' #' @section Default SSH client and options (arguments `rshcmd` and `rshopts`): #' Arguments `rshcmd` and `rshopts` are only used when connecting #' to an external host. #' #' The default method for connecting to an external host is via SSH and the #' system executable for this is given by argument `rshcmd`. The default #' is given by option \option{parallelly.makeNodePSOCK.rshcmd}. If that is not #' set, then the default is to use \command{ssh} on Unix-like systems, #' including macOS as well as Windows 10. On older MS Windows versions, which #' do not have a built-in \command{ssh} client, the default is to use #' (i) \command{plink} from the \href{https://www.putty.org/}{\command{PuTTY}} #' project, and then (ii) the \command{ssh} client that is distributed with #' RStudio. #' #' PuTTY puts itself on Windows' system \env{PATH} when installed, meaning this #' function will find PuTTY automatically if installed. If not, to manually #' specify PuTTY as the SSH client, give the absolute pathname of #' \file{plink.exe} in the first element and option \command{-ssh} in the #' second, as in `rshcmd = c("C:/Path/PuTTY/plink.exe", "-ssh")`. #' This is because all elements of `rshcmd` are individually "shell" #' quoted and element `rshcmd[1]` must be on the system \env{PATH}. #' #' Furthermore, when running \R from RStudio on Windows, the \command{ssh} #' client that is distributed with RStudio will also be considered.
#' This client, which is from \href{https://osdn.net/projects/mingw/}{MinGW} #' MSYS, is searched for in the folder given by the \env{RSTUDIO_MSYS_SSH} #' environment variable - a variable that is (only) set when running RStudio. #' To use this SSH client outside of RStudio, set \env{RSTUDIO_MSYS_SSH} #' accordingly. #' #' You can override the default set of SSH clients that are searched for #' by specifying them in argument `rshcmd` or via option #' \option{parallelly.makeNodePSOCK.rshcmd} using the format `<protocol>`, e.g. #' `rshcmd = c("<rstudio-ssh>", "<putty-plink>", "<ssh>")`. See #' below for examples. #' #' If no SSH client is found, an informative error message is produced. #' #' Additional SSH options may be specified via argument `rshopts`, which #' defaults to option \option{parallelly.makeNodePSOCK.rshopts}. For instance, a #' private SSH key can be provided as #' `rshopts = c("-i", "~/.ssh/my_private_key")`. PuTTY users should #' specify a PuTTY PPK file, e.g. #' `rshopts = c("-i", "C:/Users/joe/.ssh/my_keys.ppk")`. #' Contrary to `rshcmd`, elements of `rshopts` are not quoted. #' #' @section Accessing external machines that prompt for a password: #' _IMPORTANT: With one exception, it is not possible for these #' functions to log in and launch \R workers on external machines that require #' a password to be entered manually for authentication._ #' The only known exception is the PuTTY client on Windows, for which one can #' pass the password via command-line option `-pw`, e.g. #' `rshopts = c("-pw", "MySecretPassword")`. #' #' Note, depending on whether you run \R in a terminal or via a GUI, you might #' not even see the password prompt. It is also likely that you cannot enter #' a password, because the connection is set up via a background system call. #' #' The poor man's workaround for setups that require a password is to manually #' log into each of the external machines and launch the \R workers by hand.
#' For this approach, use `manual = TRUE` and follow the instructions #' which include cut'n'pasteable commands on how to launch the worker from the #' external machine. #' #' However, a much more convenient and less tedious method is to set up #' key-based SSH authentication between your local machine and the external #' machine(s), as explained below. #' #' @section Accessing external machines via key-based SSH authentication: #' The best approach to automatically launch \R workers on external machines #' over SSH is to set up key-based SSH authentication. This will allow you #' to log into the external machine without having to enter a password. #' #' Key-based SSH authentication is taken care of by the SSH client and not \R. #' To configure this, see the manuals of your SSH client or search the web #' for "ssh key authentication". #' #' @section Reverse SSH tunneling: #' The default is to use reverse SSH tunneling (`revtunnel = TRUE`) for #' workers running on other machines. This avoids the complication of #' otherwise having to configure port forwarding in firewalls, which often #' requires a static IP address as well as privileges to edit the firewall #' on your outgoing router, something most users don't have. #' It also has the advantage of not having to know the internal and / or the #' public IP address / hostname of the master. #' Yet another advantage is that there will be no need for a DNS lookup by the #' worker machines to the master, which may not be configured or is disabled #' on some systems, e.g. compute clusters. #' #' @section Argument `rscript`: #' If `homogeneous` is FALSE, the `rscript` defaults to `"Rscript"`, i.e. it #' is assumed that the \command{Rscript} executable is available on the #' \env{PATH} of the worker. #' If `homogeneous` is TRUE, the `rscript` defaults to #' `file.path(R.home("bin"), "Rscript")`, i.e. it is basically assumed that #' the worker and the caller share the same file system and \R installation.
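#'
#' As an illustration (a sketch; the hostname is a placeholder), a remote
#' worker that should use the `Rscript` found on its own `PATH` can be
#' requested explicitly; `dryrun = TRUE` only displays the launch command:
#' ```r
#' cl <- makeClusterPSOCK("remote.server.org",
#'                        homogeneous = FALSE,
#'                        dryrun = TRUE)
#' ```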
#' #' When specified, argument `rscript` should be a character vector with one or #' more elements. Any asterisk (`"*"`) will be resolved to the above default #' `homogeneous`-dependent `Rscript` path. #' All elements are automatically shell quoted using [base::shQuote()], except #' those that are of format `<ENVVAR>=<VALUE>`, that is, the ones matching the #' regular expression '\samp{^[[:alpha:]_][[:alnum:]_]*=.*}'. #' Another exception is when `rscript` inherits from 'AsIs'. #' #' @section Default value of argument `homogeneous`: #' The default value of `homogeneous` is TRUE if and only if either #' of the following is fulfilled: #' \itemize{ #' \item `worker` is _localhost_ #' \item `revtunnel` is FALSE and `master` is _localhost_ #' \item `worker` is neither an IP number nor a fully qualified domain #' name (FQDN). A hostname is considered to be a FQDN if it contains #' one or more periods #' } #' In all other cases, `homogeneous` defaults to FALSE. #' #' @section Connection time out: #' Argument `connectTimeout` does _not_ work properly on Unix and #' macOS due to a limitation in \R itself. For more details on this, please see #' R-devel thread 'BUG?: On Linux setTimeLimit() fails to propagate timeout #' error when it occurs (works on Windows)' on 2016-10-26 #' (\url{https://stat.ethz.ch/pipermail/r-devel/2016-October/073309.html}). #' When used, the timeout will eventually trigger an error, but it won't happen #' until the socket connection timeout `timeout` itself happens. #' #' @section Communication time out: #' If there is no communication between the master and a worker within the #' `timeout` limit, then the corresponding socket connection will be #' closed automatically. This will eventually result in an error in code #' trying to access the connection.
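#'
#' As a sketch, both time-outs can be adjusted when setting up a cluster of
#' local workers:
#' ```r
#' cl <- makeClusterPSOCK(
#'   2L,
#'   connectTimeout = 60,          ## one minute to establish each connection
#'   timeout = 7 * 24 * 60 * 60    ## allow up to one week of silence
#' )
#' parallel::stopCluster(cl)
#' ```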
#' #' @section Failing to set up local workers: #' When setting up a cluster of localhost workers, that is, workers running #' on the same machine as the master \R process, occasionally a connection #' to a worker ("cluster node") may fail to be set up. #' When this occurs, an informative error message with troubleshooting #' suggestions will be produced. #' The most common reason for such localhost failures is due to port #' clashes. Retrying will often resolve the problem. #' #' @section Failing to set up remote workers: #' A cluster of remote workers runs \R processes on external machines. These #' external \R processes are launched over, typically, SSH to the remote #' machine. For this to work, each of the remote machines needs to have #' \R installed, which preferably is of the same version as what is on the #' main machine. For this to work, it is required that one can SSH to the #' remote machines. Ideally, the SSH connections use authentication based #' on public-private SSH keys such that the set up of the remote workers can #' be fully automated (see above). If `makeClusterPSOCK()` fails to set #' up one or more remote \R workers, then an informative error message is #' produced. #' There are a few reasons for failing to set up remote workers. If this #' happens, start by asserting that you can SSH to the remote machine and #' launch \file{Rscript} by calling something like: #' \preformatted{ #' {local}$ ssh -l alice remote.server.org #' {remote}$ Rscript --version #' R scripting front-end version 3.6.1 (2019-07-05) #' {remote}$ logout #' {local}$ #' } #' When you have confirmed the above to work, then confirm that you can achieve #' the same in a single command-line call; #' \preformatted{ #' {local}$ ssh -l alice remote.server.org Rscript --version #' R scripting front-end version 3.6.1 (2019-07-05) #' {local}$ #' } #' The latter will assert that you have proper startup configuration also for #' _non-interactive_ shell sessions on the remote machine. 
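#'
#' The same check can also be scripted from within \R using base
#' `system2()` (a sketch; the user name and hostname are placeholders):
#' ```r
#' system2("ssh", args = c("-l", "alice", "remote.server.org",
#'                         "Rscript", "--version"))
#' ```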
#' #' Another reason for failing to setup remote workers could be that they are #' running an \R version that is not compatible with the version that your main #' \R session is running. For instance, if we run R (>= 3.6.0) locally and the #' workers run R (< 3.5.0), we will get: #' `Error in unserialize(node$con) : error reading from connection`. #' This is because R (>= 3.6.0) uses serialization format version 3 by default #' whereas R (< 3.5.0) only supports version 2. We can see the version of the #' \R workers by adding `rscript_args = c("-e", shQuote("getRversion()"))` when #' calling `makeClusterPSOCK()`. #' #' @rdname makeClusterPSOCK #' @importFrom tools pskill #' @importFrom utils flush.console #' @export makeNodePSOCK <- function(worker = getOption2("parallelly.localhost.hostname", "localhost"), master = NULL, port, connectTimeout = getOption2("parallelly.makeNodePSOCK.connectTimeout", 2 * 60), timeout = getOption2("parallelly.makeNodePSOCK.timeout", 30 * 24 * 60 * 60), rscript = NULL, homogeneous = NULL, rscript_args = NULL, rscript_envs = NULL, rscript_libs = NULL, rscript_startup = NULL, rscript_sh = c("auto", "cmd", "sh"), default_packages = c("datasets", "utils", "grDevices", "graphics", "stats", if (methods) "methods"), methods = TRUE, socketOptions = getOption2("parallelly.makeNodePSOCK.socketOptions", "no-delay"), useXDR = getOption2("parallelly.makeNodePSOCK.useXDR", FALSE), outfile = "/dev/null", renice = NA_integer_, rshcmd = getOption2("parallelly.makeNodePSOCK.rshcmd", NULL), user = NULL, revtunnel = TRUE, rshlogfile = NULL, rshopts = getOption2("parallelly.makeNodePSOCK.rshopts", NULL), rank = 1L, manual = FALSE, dryrun = FALSE, quiet = FALSE, setup_strategy = getOption2("parallelly.makeNodePSOCK.setup_strategy", "parallel"), action = c("launch", "options"), verbose = FALSE) { verbose <- as.logical(verbose) stop_if_not(length(verbose) == 1L, !is.na(verbose)) if (inherits(worker, "makeNodePSOCKOptions")) { return(launchNodePSOCK(options = 
worker, verbose = verbose)) } localhostHostname <- getOption2("parallelly.localhost.hostname", "localhost") localMachine <- is.element(worker, c(localhostHostname, "localhost", "127.0.0.1")) ## Could it be that the worker specifies the name of the localhost? ## Note, this approach preserves worker == "127.0.0.1" if that is given. if (!localMachine) { localMachine <- is_localhost(worker) if (localMachine) worker <- getOption2("parallelly.localhost.hostname", "localhost") } attr(worker, "localhost") <- localMachine stop_if_not(is.character(rscript_sh), length(rscript_sh) >= 1L, !anyNA(rscript_sh)) rscript_sh <- rscript_sh[1] if (rscript_sh == "auto") { if (localMachine) { rscript_sh <- if (.Platform$OS.type == "windows") "cmd" else "sh" } else { ## Assume the remote machine uses a POSIX shell rscript_sh <- "sh" } } manual <- as.logical(manual) stop_if_not(length(manual) == 1L, !is.na(manual)) dryrun <- as.logical(dryrun) stop_if_not(length(dryrun) == 1L, !is.na(dryrun)) setup_strategy <- match.arg(setup_strategy, choices = c("sequential", "parallel")) quiet <- as.logical(quiet) stop_if_not(length(quiet) == 1L, !is.na(quiet)) ## Locate a default SSH client?
if (identical(rshcmd, "")) rshcmd <- NULL if (!is.null(rshcmd)) { rshcmd <- as.character(rshcmd) stop_if_not(length(rshcmd) >= 1L) } if (identical(rshopts, "")) rshopts <- NULL rshopts <- as.character(rshopts) user <- as.character(user) stop_if_not(length(user) <= 1L) port <- as.integer(port) assertPort(port) revtunnel <- as.logical(revtunnel) stop_if_not(length(revtunnel) == 1L, !is.na(revtunnel)) if (!is.null(rshlogfile)) { if (is.logical(rshlogfile)) { stop_if_not(!is.na(rshlogfile)) if (rshlogfile) { rshlogfile <- tempfile(pattern = "parallelly_makeClusterPSOCK_", fileext = ".log") } else { rshlogfile <- NULL } } else { rshlogfile <- as.character(rshlogfile) rshlogfile <- normalizePath(rshlogfile, mustWork = FALSE) } } if (is.null(master)) { if (localMachine || revtunnel) { master <- localhostHostname } else { master <- Sys.info()[["nodename"]] } } stop_if_not(!is.null(master)) timeout <- as.numeric(timeout) stop_if_not(length(timeout) == 1L, !is.na(timeout), is.finite(timeout), timeout >= 0) ## FIXME: This is really legacy code there. It stems from R (< 3.5.0), where ## 'Rscript' did *not* attach the 'methods' package by default, whereas 'R' ## did. Since R 3.5.0, 'R' and 'Rscript' attach the same set of packages. 
methods <- as.logical(methods) stop_if_not(length(methods) == 1L, !is.na(methods)) if (!is.null(default_packages)) { default_packages <- as.character(default_packages) stop_if_not(!anyNA(default_packages)) is_asterisk <- (default_packages == "*") if (any(is_asterisk)) { pkgs <- getOption("defaultPackages") if (length(pkgs) == 0) { default_packages <- default_packages[!is_asterisk] } else { pkgs <- paste(pkgs, collapse=",") default_packages[is_asterisk] <- pkgs default_packages <- unlist(strsplit(default_packages, split = ",", fixed = TRUE)) } } default_packages <- unique(default_packages) pattern <- sprintf("^%s$", .standard_regexps()$valid_package_name) invalid <- grep(pattern, default_packages, invert = TRUE, value = TRUE) if (length(invalid) > 0) { stop(sprintf("Argument %s specifies invalid package names: %s", sQuote("default_packages"), paste(sQuote(invalid), collapse = ", "))) } } if (is.null(homogeneous)) { homogeneous <- { localMachine || (!revtunnel && is_localhost(master)) || (!is_ip_number(worker) && !is_fqdn(worker)) } } homogeneous <- as.logical(homogeneous) stop_if_not(length(homogeneous) == 1L, !is.na(homogeneous)) ## Is a parallel setup strategy possible? if (setup_strategy == "parallel") { if (getRversion() < "4.0.0" || manual || dryrun || !homogeneous || !localMachine) { setup_strategy <- "sequential" } } bin <- "Rscript" if (homogeneous) bin <- file.path(R.home("bin"), bin) if (is.null(rscript)) { rscript <- bin } else { if (!is.character(rscript)) rscript <- as.character(rscript) stop_if_not(length(rscript) >= 1L) rscript[rscript == "*"] <- bin bin <- rscript[1] if (homogeneous && !inherits(bin, "AsIs")) { bin <- Sys.which(bin) if (bin == "") bin <- normalizePath(rscript[1], mustWork = FALSE) rscript[1] <- bin } } ## Is rscript[1] referring to Rscript, or R/Rterm?
name <- sub("[.]exe$", "", basename(bin)) is_Rscript <- (tolower(name) == "rscript") rscript_args <- as.character(rscript_args) if (length(rscript_startup) > 0L) { if (!is.list(rscript_startup)) rscript_startup <- list(rscript_startup) rscript_startup <- lapply(rscript_startup, FUN = function(init) { if (is.language(init)) { init <- deparse(init, width.cutoff = 500L) ## We cannot use newline between statements because ## it needs to be passed as a one line string via -e init <- paste(init, collapse = ";") } init <- as.character(init) if (length(init) == 0L) return(NULL) tryCatch({ parse(text = init) }, error = function(ex) { stopf("Syntax error in argument 'rscript_startup': %s", conditionMessage(ex)) }) init }) rscript_startup <- unlist(rscript_startup, use.names = FALSE) } if (!is.null(rscript_libs)) { rscript_libs <- as.character(rscript_libs) stop_if_not(!anyNA(rscript_libs)) } useXDR <- as.logical(useXDR) stop_if_not(length(useXDR) == 1L, !is.na(useXDR)) if (!is.null(socketOptions)) { stop_if_not(is.character(socketOptions),length(socketOptions) == 1L, !is.na(socketOptions), nzchar(socketOptions)) if (socketOptions == "NULL") socketOptions <- NULL } stop_if_not(is.null(outfile) || is.character(outfile)) renice <- as.integer(renice) stop_if_not(length(renice) == 1L) rank <- as.integer(rank) stop_if_not(length(rank) == 1L, !is.na(rank)) action <- match.arg(action, choices = c("launch", "options")) verbose_prefix <- "[local output] " ## Shell quote the Rscript executable? if (!inherits(rscript, "AsIs")) { idxs <- grep("^[[:alpha:]_][[:alnum:]_]*=.*", rscript, invert = TRUE) rscript[idxs] <- shQuote(rscript[idxs], type = rscript_sh) } rscript_args_internal <- character(0L) ## Can we get the worker's PID during launch? 
if (localMachine && !dryrun) { res <- useWorkerPID(rscript, rank = rank, rscript_sh = rscript_sh, verbose = verbose) pidfile <- res$pidfile rscript_args_internal <- c(res$rscript_pid_args, rscript_args_internal) } else { pidfile <- NULL } ## Add Rscript "label"? rscript_label <- getOption2("parallelly.makeNodePSOCK.rscript_label", NULL) if (!is.null(rscript_label) && nzchar(rscript_label) && !isFALSE(as.logical(rscript_label))) { if (isTRUE(as.logical(rscript_label))) { script <- grep("[.]R$", commandArgs(), value = TRUE)[1] if (is.na(script)) script <- "UNKNOWN" rscript_label <- sprintf("%s:%s:%s:%s", script, Sys.getpid(), Sys.info()[["nodename"]], Sys.info()[["user"]]) } rscript_args_internal <- c("-e", shQuote(paste0("#label=", rscript_label), type = rscript_sh), rscript_args_internal) } ## In contrast to default_packages=character(0), default_packages = NULL ## skips --default-packages/R_DEFAULT_PACKAGES completely. if (!is.null(default_packages)) { pkgs <- paste(unique(default_packages), collapse = ",") if (is_Rscript) { arg <- sprintf("--default-packages=%s", pkgs) rscript_args_internal <- c(arg, rscript_args_internal) } else { ## FIXME: Make 'rscript_envs' work this way so they are applied sooner ## in the R startup process, instead via -e 'Sys.setenv(FOO="1")'. arg <- sprintf("R_DEFAULT_PACKAGES=%s", pkgs) ## Is the cluster node launched in a MS Windows machine? on_MSWindows <- (rscript_sh %in% c("cmd", "cmd2")) if (on_MSWindows) { ## On MS Windows, we have to use special '/path/to/R FOO=1 ...' rscript_args <- c(arg, rscript_args) } else { ## Everywhere else, we can use 'FOO=1 /path/to/R ...' 
rscript <- c(arg, rscript) } } } ## Port that the Rscript should use to connect back to the master if (!localMachine && revtunnel && getOption2("parallelly.makeNodePSOCK.port.increment", TRUE)) { rscript_port <- assertPort(port + (rank - 1L)) if (verbose) { mdebugf("%sRscript port: %d + %d = %d\n", verbose_prefix, port, rank-1L, rscript_port) } } else { rscript_port <- port if (verbose) { mdebugf("%sRscript port: %d\n", verbose_prefix, rscript_port) } } if (length(socketOptions) == 1L) { code <- sprintf("options(socketOptions = \"%s\")", socketOptions) rscript_expr <- c("-e", shQuote(code, type = rscript_sh)) rscript_args_internal <- c(rscript_args_internal, rscript_expr) } if (length(rscript_startup) > 0L) { rscript_startup <- paste("invisible({", rscript_startup, "})", sep = "") rscript_startup <- shQuote(rscript_startup, type = rscript_sh) rscript_startup <- lapply(rscript_startup, FUN = function(value) c("-e", value)) rscript_startup <- unlist(rscript_startup, use.names = FALSE) rscript_args_internal <- c(rscript_args_internal, rscript_startup) } if (length(rscript_envs) > 0L) { names <- names(rscript_envs) if (is.null(names)) { copy <- seq_along(rscript_envs) } else { copy <- which(nchar(names) == 0L) } if (length(copy) > 0L) { missing <- NULL for (idx in copy) { name <- rscript_envs[idx] if (!nzchar(name)) { stop("Argument 'rscript_envs' contains an empty non-named environment variable") } value <- Sys.getenv(name, NA_character_) if (!is.na(value)) { rscript_envs[idx] <- value names(rscript_envs)[idx] <- name } else { missing <- c(missing, name) } } if (length(missing) > 0L) { warnf("Did not pass down missing environment variables to cluster node: %s", paste(sQuote(missing), collapse = ", ")) } names <- names(rscript_envs) rscript_envs <- rscript_envs[nzchar(names)] names <- names(rscript_envs) } ## Any environment variables to unset? 
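  ## Illustration (not from the original source): the 'rscript_envs'
  ## conventions handled above and below:
  ##   rscript_envs = c(FOO = "1", "BAR", BAZ = NA_character_)
  ##   - FOO = "1"       : set FOO=1 on the worker via Sys.setenv()
  ##   - "BAR" (unnamed) : copy the value of BAR from the local environment,
  ##                       with a warning if it is not set locally
  ##   - BAZ = NA        : unset BAZ on the worker via Sys.unsetenv()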
if (length(unset <- which(is.na(rscript_envs))) > 0L) { names <- names(rscript_envs[unset]) code <- sprintf("\"%s\"", names) code <- paste(code, collapse = ", ") code <- paste0("Sys.unsetenv(c(", code, "))") tryCatch({ parse(text = code) }, error = function(ex) { stopf("Argument 'rscript_envs' appears to contain invalid values: %s", paste(sprintf("%s", sQuote(names)), collapse = ", ")) }) rscript_args_internal <- c(rscript_args_internal, "-e", shQuote(code, type = rscript_sh)) rscript_envs <- rscript_envs[-unset] names <- names(rscript_envs) } ## Any environment variables to set? if (length(names) > 0L) { code <- sprintf('"%s"="%s"', names, rscript_envs) code <- paste(code, collapse = ", ") code <- paste0("Sys.setenv(", code, ")") tryCatch({ parse(text = code) }, error = function(ex) { stopf("Argument 'rscript_envs' appears to contain invalid values: %s", paste(sprintf("%s=%s", sQuote(names), sQuote(rscript_envs)), collapse = ", ")) }) rscript_args_internal <- c(rscript_args_internal, "-e", shQuote(code, type = rscript_sh)) } } if (length(rscript_libs) > 0L) { ## Make sure to preserve backslashes, e.g. in Windows network drives rscript_libs <- gsub("\\\\", "\\\\\\\\", rscript_libs, fixed = TRUE) code <- paste0('"', rscript_libs, '"') code[rscript_libs == "*"] <- ".libPaths()" code <- paste(code, collapse = ",") code <- paste0('.libPaths(c(', code, '))') tryCatch({ parse(text = code) }, error = function(ex) { stopf("Argument 'rscript_libs' appears to contain invalid values: %s", paste(sQuote(rscript_libs), collapse = ", ")) }) rscript_args_internal <- c(rscript_args_internal, "-e", shQuote(code, type = rscript_sh)) } ## .{slave,work}RSOCK() command already specified? 
if (!any(grepl("parallel:::[.](slave|work)RSOCK[(][)]", rscript_args))) { ## In R (>= 4.1.0), parallel:::.slaveRSOCK() was renamed to .workRSOCK() cmd <- "workRSOCK <- tryCatch(parallel:::.workRSOCK, error=function(e) parallel:::.slaveRSOCK); workRSOCK()" rscript_args_internal <- c(rscript_args_internal, "-e", shQuote(cmd, type = rscript_sh)) } ## Append or inject rscript_args_internal? idx <- which(rscript_args == "*") if (length(idx) == 0L) { rscript_args <- c(rscript_args, rscript_args_internal) } else if (length(idx) == 1L) { n <- length(rscript_args) if (idx == 1L) { rscript_args <- c(rscript_args_internal, rscript_args[-1]) } else if (idx == n) { rscript_args <- c(rscript_args[-n], rscript_args_internal) } else { rscript_args <- c(rscript_args[1:(idx-1)], rscript_args_internal, rscript_args[(idx+1):n]) } } else { stop(sprintf("Argument 'rscript_args' may contain at most one asterisk ('*'): %s", paste(sQuote(rscript_args), collapse = " "))) } rscript <- paste(rscript, collapse = " ") rscript_args <- paste(rscript_args, collapse = " ") envvars <- paste0("MASTER=", master, " PORT=", rscript_port, " OUT=", outfile, " TIMEOUT=", timeout, " XDR=", useXDR, " SETUPTIMEOUT=", connectTimeout, " SETUPSTRATEGY=", setup_strategy) cmd <- paste(rscript, rscript_args, envvars) ## Renice? 
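  ## Illustration (not from the original source): with renice = 10, the
  ## worker command assembled above is launched with lowered priority as
  ##   nice --adjustment=10 '<rscript>' <rscript_args> MASTER=... PORT=...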
if (!is.na(renice) && renice > 0L) { cmd <- sprintf("nice --adjustment=%d %s", renice, cmd) } if (!localMachine) { ## Find default SSH client find <- is.null(rshcmd) if (find) { which <- NULL if (verbose) { mdebugf("%sWill search for all 'rshcmd' available\n", verbose_prefix) } } else if (all(grepl("^<[a-zA-Z-]+>$", rshcmd))) { find <- TRUE if (verbose) { mdebugf("%sWill search for specified 'rshcmd' types: %s\n", verbose_prefix, paste(sQuote(rshcmd), collapse = ", ")) } which <- gsub("^<([a-zA-Z-]+)>$", "\\1", rshcmd) } if (find) { rshcmd <- find_rshcmd(which = which, must_work = !localMachine && !manual && !dryrun) if (verbose) { s <- unlist(lapply(rshcmd, FUN = function(r) { sprintf("%s [type=%s, version=%s]", paste(sQuote(r), collapse = ", "), sQuote(attr(r, "type")), sQuote(attr(r, "version"))) })) s <- paste(sprintf("%s %d. %s", verbose_prefix, seq_along(s), s), collapse = "\n") mdebugf("%sFound the following available 'rshcmd':\n%s", verbose_prefix, s) } rshcmd <- rshcmd[[1]] } else { if (is.null(attr(rshcmd, "type"))) attr(rshcmd, "type") <- "" if (is.null(attr(rshcmd, "version"))) attr(rshcmd, "version") <- "" } ## Holds a pathname with an optional set of command-line options stop_if_not(is.character(rshcmd), length(rshcmd) >= 1L) s <- sprintf("type=%s, version=%s", sQuote(attr(rshcmd, "type")), sQuote(attr(rshcmd, "version"))) rshcmd_label <- sprintf("%s [%s]", paste(sQuote(rshcmd), collapse = ", "), s) if (verbose) mdebugf("%sUsing 'rshcmd': %s", verbose_prefix, rshcmd_label) ## User? if (length(user) == 1L) rshopts <- c("-l", user, rshopts) ## Reverse tunneling? if (revtunnel) { ## WORKAROUND: The Windows 10 loopback resolution uses IPv6 by default ## and the server is not listening for "localhost". The solution is ## to use "127.0.0.1" instead, or force IPv4 by using ssh option '-4'. 
## For more details, see ## https://github.com/PowerShell/Win32-OpenSSH/issues/1265#issuecomment-855234326 for if (is_localhost(master) && .Platform$OS.type == "windows" && ( isTRUE(attr(rshcmd, "OpenSSH_for_Windows")) || basename(rshcmd[1]) == "ssh" )) { master <- "127.0.0.1" } rshopts <- c(sprintf("-R %d:%s:%d", rscript_port, master, port), rshopts) } ## SSH log file? if (is.character(rshlogfile)) { rshopts <- c(sprintf("-E %s", shQuote(rshlogfile)), rshopts) } rshopts <- paste(rshopts, collapse = " ") ## Local commands rsh_call <- paste(paste(shQuote(rshcmd), collapse = " "), rshopts, worker) local_cmd <- paste(rsh_call, shQuote(cmd, type = rscript_sh)) } else { rshcmd_label <- NULL rsh_call <- NULL local_cmd <- cmd } stop_if_not(length(local_cmd) == 1L) options <- structure(list( local_cmd = local_cmd, worker = worker, rank = rank, rshlogfile = rshlogfile, port = port, connectTimeout = connectTimeout, timeout = timeout, useXDR = useXDR, pidfile = pidfile, setup_strategy = setup_strategy, ## For messages, warnings, and errors: outfile = outfile, rshcmd_label = rshcmd_label, rsh_call = rsh_call, cmd = cmd, localMachine = localMachine, manual = manual, dryrun = dryrun, quiet = quiet, rshcmd = rshcmd, revtunnel = revtunnel ), class = c("makeNodePSOCKOptions", "makeNodeOptions")) ## Return options? 
  if (action == "options") return(options)

  launchNodePSOCK(options, verbose = verbose)
}


launchNodePSOCK <- function(options, verbose = FALSE) {
  stop_if_not(inherits(options, "makeNodePSOCKOptions"))

  local_cmd <- options[["local_cmd"]]
  worker <- options[["worker"]]
  rank <- options[["rank"]]
  rshlogfile <- options[["rshlogfile"]]
  port <- options[["port"]]
  connectTimeout <- options[["connectTimeout"]]
  timeout <- options[["timeout"]]
  pidfile <- options[["pidfile"]]
  useXDR <- options[["useXDR"]]
  ## For messages, warnings, and errors:
  outfile <- options[["outfile"]]
  rshcmd_label <- options[["rshcmd_label"]]
  rsh_call <- options[["rsh_call"]]
  cmd <- options[["cmd"]]
  localMachine <- options[["localMachine"]]
  manual <- options[["manual"]]
  dryrun <- options[["dryrun"]]
  quiet <- options[["quiet"]]
  rshcmd <- options[["rshcmd"]]
  revtunnel <- options[["revtunnel"]]
  setup_strategy <- options[["setup_strategy"]]

  if (setup_strategy == "parallel") {
    stop("INTERNAL ERROR: launchNodePSOCK() called with setup_strategy='parallel', which should never occur")
  }

  verbose <- as.logical(verbose)
  stop_if_not(length(verbose) == 1L, !is.na(verbose))
  verbose_prefix <- "[local output] "

  is_worker_output_visible <- is.null(outfile)

  if (manual || dryrun) {
    if (!quiet) {
      msg <- c("----------------------------------------------------------------------")
      if (localMachine) {
        msg <- c(msg, sprintf("Manually, start worker #%s on local machine %s with:", rank, sQuote(worker)), sprintf("\n %s\n", cmd))
      } else {
        msg <- c(msg, sprintf("Manually, (i) log into external machine %s:", sQuote(worker)), sprintf("\n %s\n", rsh_call))
        msg <- c(msg, sprintf("and (ii) start worker #%s from there:", rank), sprintf("\n %s\n", cmd))
        msg <- c(msg, sprintf("Alternatively, start worker #%s from the local machine by combining both steps in a single call:", rank), sprintf("\n %s\n", local_cmd))
      }
      msg <- paste(c(msg, ""), collapse = "\n")
      cat(msg)
      flush.console()
    }
    if (dryrun) return(NULL)
  } else {
    if (verbose) {
mdebugf("%sStarting worker #%s on %s: %s", verbose_prefix, rank, sQuote(worker), local_cmd) } input <- if (.Platform$OS.type == "windows") "" else NULL res <- system(local_cmd, wait = FALSE, input = input) if (verbose) { mdebugf("%s- Exit code of system() call: %s", verbose_prefix, res) } if (res != 0) { warnf("system(%s) had a non-zero exit code: %d", local_cmd, res) } } if (verbose) { mdebugf("%sWaiting for worker #%s on %s to connect back", verbose_prefix, rank, sQuote(worker)) if (is_worker_output_visible) { if (.Platform$OS.type == "windows") { mdebugf("%s- Detected 'outfile=NULL' on Windows: this will make the output from the background worker visible when running R from a terminal, but it will most likely not be visible when using a GUI.", verbose_prefix) } else { mdebugf("%s- Detected 'outfile=NULL': this will make the output from the background worker visible", verbose_prefix) } } } con <- local({ ## Apply connection time limit "only to the rest of the current computation". ## NOTE: Regardless of transient = TRUE / FALSE, it still seems we need to ## undo it manually :/ /HB 2016-11-05 setTimeLimit(elapsed = connectTimeout) on.exit(setTimeLimit(elapsed = Inf)) localhostHostname <- getOption2("parallelly.localhost.hostname", "localhost") warnings <- list() tryCatch({ withCallingHandlers({ socketConnection(localhostHostname, port = port, server = TRUE, blocking = TRUE, open = "a+b", timeout = timeout) }, warning = function(w) { if (verbose) { mdebugf("%sDetected a warning from socketConnection(): %s", verbose_prefix, sQuote(conditionMessage(w))) } warnings <<- c(warnings, list(w)) }) }, error = function(ex) { setTimeLimit(elapsed = Inf) ## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ## Post-mortem analysis ## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - machineType <- if (localMachine) "local" else "remote" msg <- sprintf("Failed to launch and connect to R worker on %s machine %s from local machine %s.\n", 
machineType, sQuote(worker), sQuote(Sys.info()[["nodename"]])) ## Inspect and report on the error message cmsg <- conditionMessage(ex) if (grepl(gettext("reached elapsed time limit"), cmsg)) { msg <- c(msg, sprintf(" * The error produced by socketConnection() was: %s (which suggests that the connection timeout of %.0f seconds (argument 'connectTimeout') kicked in)\n", sQuote(cmsg), connectTimeout)) } else { msg <- c(msg, sprintf(" * The error produced by socketConnection() was: %s\n", sQuote(cmsg))) } ## Inspect and report on any warnings if (length(warnings) > 0) { msg <- c(msg, sprintf(" * In addition, socketConnection() produced %d warning(s):\n", length(warnings))) for (kk in seq_along(warnings)) { cmsg <- conditionMessage(warnings[[kk]]) if (grepl("port [0-9]+ cannot be opened", cmsg)) { msg <- c(msg, sprintf(" - Warning #%d: %s (which suggests that this port is either already occupied by another process or blocked by the firewall on your local machine)\n", kk, sQuote(cmsg))) } else { msg <- c(msg, sprintf(" - Warning #%d: %s\n", kk, sQuote(cmsg))) } } } ## Report on how the local socket connect was setup msg <- c(msg, sprintf(" * The localhost socket connection that failed to connect to the R worker used port %d using a communication timeout of %.0f seconds and a connection timeout of %.0f seconds.\n", port, timeout, connectTimeout)) ## Report on how the worker was launched msg <- c(msg, sprintf(" * Worker launch call: %s.\n", local_cmd)) ## Do we know the PID of the worker? If so, try to kill it to avoid ## leaving a stray process behind ## Comment: readWorkerPID() must be done *after* socketConnection() ## on R 3.4.4, otherwise socketConnection() will fail. Not sure why. 
      ## /HB 2019-01-24
      pid <- readWorkerPID(pidfile)
      if (!is.null(pid)) {
        if (verbose) mdebugf("Killing worker process (PID %d) if still alive", pid)
        ## WARNING: pid_kill() calls pid_exists() [twice] and on Windows
        ## pid_exists() uses system('tasklist') which can be very very slow
        ## /HB 2019-01-24
        success <- pid_kill(pid)
        if (verbose) mdebugf("Worker (PID %d) was successfully killed: %s", pid, success)
        msg <- c(msg, sprintf(" * Worker (PID %d) was successfully killed: %s\n", pid, success))
      } else if (localMachine) {
        msg <- c(msg, sprintf(" * Failed to kill local worker because its PID could not be identified.\n"))
      }

      ## Propose further troubleshooting methods
      suggestions <- NULL

      ## Enable verbose=TRUE?
      if (!verbose) {
        suggestions <- c(suggestions, "Set 'verbose=TRUE' to see more details.")
      }

      ## outfile=NULL?
      if (.Platform$OS.type == "windows") {
        if (is_worker_output_visible) {
          suggestions <- c(suggestions, "On Windows, to see output from worker, set 'outfile=NULL' and run R from a terminal (not a GUI).")
        } else {
          suggestions <- c(suggestions, "On Windows, output from worker when using 'outfile=NULL' is only visible when running R from a terminal (not a GUI).")
        }
      } else {
        if (!is_worker_output_visible) {
          suggestions <- c(suggestions, "Set 'outfile=NULL' to see output from worker.")
        }
      }

      ## Log file?
      if (is.character(rshlogfile)) {
        smsg <- sprintf("Inspect the content of log file %s for %s.", sQuote(rshlogfile), paste(sQuote(rshcmd), collapse = " "))
        lmsg <- tryCatch(readLines(rshlogfile, n = 15L, warn = FALSE), error = function(ex) NULL)
        if (length(lmsg) > 0) {
          lmsg <- sprintf(" %2d: %s", seq_along(lmsg), lmsg)
          smsg <- sprintf("%s The first %d lines are:\n%s", smsg, length(lmsg), paste(lmsg, collapse = "\n"))
        }
        suggestions <- c(suggestions, smsg)
      } else {
        suggestions <- c(suggestions, sprintf("Set 'rshlogfile=TRUE' to enable logging for %s.", paste(sQuote(rshcmd), collapse = " ")))
      }

      ## Special: The Windows 10 ssh client may not support reverse
      ## tunneling. /2018-11-10
      ## https://github.com/PowerShell/Win32-OpenSSH/issues/1265
      if (!localMachine && revtunnel && isTRUE(attr(rshcmd, "OpenSSH_for_Windows"))) {
        suggestions <- c(suggestions, sprintf("The 'rshcmd' (%s) used may not support reverse tunneling (revtunnel = TRUE). See ?parallelly::makeClusterPSOCK for alternatives.\n", rshcmd_label))
      }

      if (length(suggestions) > 0) {
        suggestions <- sprintf(" - Suggestion #%d: %s\n", seq_along(suggestions), suggestions)
        msg <- c(msg, " * Troubleshooting suggestions:\n", suggestions)
      }

      msg <- paste(msg, collapse = "")
      ex$message <- msg

      ## Re-signal as a PSOCKConnectionError error
      class(ex) <- c("PSOCKConnectionError", class(ex))

      ## Relay error and temporarily avoid truncating the error message
      ## in case it is too long
      local({
        oopts <- options(warning.length = 2000L)
        on.exit(options(oopts))
        stop(ex)
      })
    })
  })

  setTimeLimit(elapsed = Inf)

  if (verbose) {
    mdebugf("%sConnection with worker #%s on %s established", verbose_prefix, rank, sQuote(worker))
  }

  structure(list(con = con, host = worker, rank = rank, rshlogfile = rshlogfile),
            class = c("RichSOCKnode", if (useXDR) "SOCKnode" else "SOCK0node"))
} ## launchNodePSOCK()


## Checks if a given worker is the same as the localhost.  It is, iff:
##
## * worker == "localhost"
## * worker == "127.0.0.1"
## * worker == hostname
## * worker and hostname appear on the same line in /etc/hosts
##
## This should cover cases such as:
## * Calling is_localhost("n3") from machine n3
## * Calling is_localhost("n3.myserver.org") from machine n3[.myserver.org]
##
## References:
## * https://en.wikipedia.org/wiki/Hostname
#' @importFrom utils file_test
is_localhost <- local({
  localhosts <- c("localhost", "127.0.0.1")
  non_localhosts <- character(0L)

  function(worker, hostname = Sys.info()[["nodename"]], pathnames = "/etc/hosts") {
    ## INTERNAL: Clear list of known local hosts?
    if (is.null(worker) && is.null(hostname)) {
      localhosts <<- c("localhost", "127.0.0.1")
      non_localhosts <<- character(0L)
      return(NA)
    }

    stop_if_not(length(worker) == 1, length(hostname) == 1)

    ## Already known to be a localhost, or known not to be one?
    if (worker %in% localhosts) return(TRUE)
    if (worker %in% non_localhosts) return(FALSE)

    if (worker == hostname) {
      ## Add worker to the list of known local hosts.
      localhosts <<- unique(c(localhosts, worker))
      return(TRUE)
    }

    alias <- getOption2("parallelly.localhost.hostname")
    if (is.character(alias) && worker == alias) {
      ## Add worker to the list of known local hosts.
      localhosts <<- unique(c(localhosts, worker))
      return(TRUE)
    }

    ## Scan known "hosts" files
    pathnames <- pathnames[file_test("-f", pathnames)]
    if (length(pathnames) == 0L) return(FALSE)

    ## Search for (hostname, worker) and (worker, hostname) occurring on
    ## the same line, separated by one or more whitespace characters (but
    ## nothing else).
    pattern <- sprintf("^((|.*[[:space:]])%s[[:space:]]+%s([[:space:]]+|)|(|.*[[:space:]])%s[[:space:]]+%s([[:space:]]+|))$", hostname, worker, worker, hostname)

    for (pathname in pathnames) {
      bfr <- readLines(pathname, warn = FALSE)
      if (any(grepl(pattern, bfr, ignore.case = TRUE))) {
        ## Add worker to the list of known local hosts.
        localhosts <<- unique(c(localhosts, worker))
        return(TRUE)
      }
    }

    ## Add worker to the list of known non-local hosts.
    non_localhosts <<- unique(c(non_localhosts, worker))

    FALSE
  }
}) ## is_localhost()


## Checks if a worker is specified by its IP number.
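## Illustration (not from the original source):
##   is_ip_number("192.168.0.1")    # TRUE
##   is_ip_number("192.168.0.256")  # FALSE - each component must be in 0-255
##   is_ip_number("example.org")    # FALSE - not four dot-separated integers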
is_ip_number <- function(worker) { ip <- strsplit(worker, split = ".", fixed = TRUE)[[1]] if (length(ip) != 4) return(FALSE) ip <- as.integer(ip) if (anyNA(ip)) return(FALSE) all(0 <= ip & ip <= 255) } ## Checks if a worker is specified as a fully qualified domain name (FQDN) is_fqdn <- function(worker) { grepl(".", worker, fixed = TRUE) } #' Search for SSH clients on the current system #' #' @param which A character vector specifying which types of SSH clients #' to search for. If NULL, a default set of clients supported by the #' current platform is searched for. #' #' @param first If TRUE, the first client found is returned, otherwise #' all located clients are returned. #' #' @param must_work If TRUE and no clients was found, then an error #' is produced, otherwise only a warning. #' #' @return A named list of pathnames to all located SSH clients. #' The pathnames may be followed by zero or more command-line options, #' i.e. the elements of the returned list are character vectors of length #' one or more. #' If `first = TRUE`, only the first one is returned. #' Attribute `version` contains the output from querying the #' executable for its version (via command-line option `-V`). #' #' @keywords internal find_rshcmd <- function(which = NULL, first = FALSE, must_work = TRUE) { query_version <- function(bin, args = "-V") { v <- suppressWarnings(system2(bin, args = args, stdout = TRUE, stderr = TRUE)) v <- paste(v, collapse = "; ") stop_if_not(length(v) == 1L) v } find_rstudio_ssh <- function() { path <- Sys.getenv("RSTUDIO_MSYS_SSH") if (!file_test("-d", path)) return(NULL) path <- normalizePath(path) path_org <- Sys.getenv("PATH") on.exit(Sys.setenv(PATH = path_org)) ## Set PATH to only look in RSTUDIO_MSYS_SSH to avoid ## picking up other clients with the same name ## Comment: In RStudio, RSTUDIO_MSYS_SSH is appended ## to the PATH, see PATH in 'Tools -> Shell ...'. 
Sys.setenv(PATH = path) bin <- Sys.which("ssh") if (!nzchar(bin)) return(NULL) attr(bin, "type") <- "rstudio-ssh" attr(bin, "version") <- query_version(bin, args = "-V") bin } find_putty_plink <- function() { bin <- Sys.which("plink") if (!nzchar(bin)) return(NULL) res <- c(bin, "-ssh") attr(res, "type") <- "putty-plink" attr(res, "version") <- query_version(bin, args = "-V") res } find_ssh <- function() { bin <- Sys.which("ssh") if (!nzchar(bin)) return(NULL) attr(bin, "type") <- "ssh" v <- query_version(bin, args = "-V") attr(bin, "version") <- v if (any(grepl("OpenSSH_for_Windows", v))) attr(bin, "OpenSSH_for_Windows") <- TRUE bin } if (!is.null(which)) stop_if_not(is.character(which), length(which) >= 1L, !anyNA(which)) stop_if_not(is.logical(first), length(first) == 1L, !is.na(first)) stop_if_not(is.logical(must_work), length(must_work) == 1L, !is.na(must_work)) if (is.null(which)) { if (.Platform$OS.type == "windows") { which <- c("ssh", "putty-plink", "rstudio-ssh") } else { which <- c("ssh") } } res <- list() for (name in which) { pathname <- switch(name, "ssh" = find_ssh(), "putty-plink" = find_putty_plink(), "rstudio-ssh" = find_rstudio_ssh(), stopf("Unknown 'rshcmd' type: %s", sQuote(name)) ) if (!is.null(pathname)) { if (first) return(pathname) res[[name]] <- pathname } } if (length(res) > 0) return(res) msg <- sprintf("Failed to locate a default SSH client (checked: %s). 
Please specify one via argument 'rshcmd'.", paste(sQuote(which), collapse = ", ")) #nolint if (must_work) stop(msg) pathname <- "ssh" msg <- sprintf("%s Will still try with %s.", msg, sQuote(paste(pathname, collapse = " "))) warning(msg) pathname } #' @importFrom utils installed.packages session_info <- function(pkgs = getOption2("parallelly.makeNodePSOCK.sessionInfo.pkgs", FALSE)) { libs <- .libPaths() info <- list( r = c(R.version, os.type = .Platform$OS.type), system = as.list(Sys.info()), libs = libs, pkgs = if (isTRUE(pkgs)) { structure(lapply(libs, FUN = function(lib.loc) { pkgs <- installed.packages(lib.loc = lib.loc) if (length(pkgs) == 0) return(NULL) paste0(pkgs[, "Package"], "_", pkgs[, "Version"]) }), names = libs) }, pwd = getwd(), process = list(pid = Sys.getpid()) ) info } #' @importFrom utils capture.output #' @importFrom parallel clusterCall add_cluster_session_info <- local({ get_session_info <- session_info formals(get_session_info)$pkgs <- FALSE environment(get_session_info) <- getNamespace("utils") function(cl) { stop_if_not(inherits(cl, "cluster")) for (ii in seq_along(cl)) { node <- cl[[ii]] if (is.null(node)) next ## Happens with dryrun = TRUE ## Session information already collected? if (!is.null(node$session_info)) next pkgs <- getOption2("parallelly.makeNodePSOCK.sessionInfo.pkgs", FALSE) node$session_info <- clusterCall(cl[ii], fun = get_session_info, pkgs = pkgs)[[1]] ## Sanity check, iff possible if (inherits(node, "SOCK0node") || inherits(node, "SOCKnode")) { pid <- capture.output(print(node)) pid <- as.integer(gsub(".* ", "", pid)) stop_if_not(node$session_info$process$pid == pid) } cl[[ii]] <- node } cl } }) ## add_cluster_session_info() ## Gets the Windows build version, e.g. '10.0.17134.523' (Windows 10 v1803) ## and '10.0.17763.253' (Windows 10 v1809). 
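## Illustration (not from the original source): on Windows, shell("ver")
## typically returns a string such as
##   "Microsoft Windows [Version 10.0.17763.253]"
## from which the function below extracts numeric_version("10.0.17763.253");
## on all other platforms it returns NULL.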
windows_build_version <- local({ if (.Platform$OS.type != "windows") return(function() NULL) function() { res <- shell("ver", intern = TRUE) if (length(res) == 0) return(NULL) res <- grep("Microsoft", res, value = TRUE) if (length(res) == 0) return(NULL) res <- gsub(".*Version ([0-9.]+).*", "\\1", res) tryCatch({ numeric_version(res) }, error = function(ex) NULL) } }) useWorkerPID <- local({ parent_pid <- NULL .cache <- list() makeResult <- function(rank, rscript_sh) { if (is.null(parent_pid)) parent_pid <<- Sys.getpid() pidfile <- tempfile(pattern = sprintf("worker.rank=%d.parallelly.parent=%d.", rank, parent_pid), fileext = ".pid") pidfile <- normalizePath(pidfile, winslash = "/", mustWork = FALSE) pidcode <- sprintf('try(suppressWarnings(cat(Sys.getpid(),file="%s")), silent = TRUE)', pidfile) rscript_pid_args <- c("-e", shQuote(pidcode, type = rscript_sh)) list(pidfile = pidfile, rscript_pid_args = rscript_pid_args) } function(rscript, rank, rscript_sh, force = FALSE, verbose = FALSE) { autoKill <- getOption2("parallelly.makeNodePSOCK.autoKill", TRUE) if (!isTRUE(as.logical(autoKill))) return(list()) result <- makeResult(rank, rscript_sh = rscript_sh) ## Already cached? key <- paste(rscript, collapse = "\t") if (!force && isTRUE(.cache[[key]])) return(result) test_cmd <- paste(c( rscript, result$rscript_pid_args, "-e", shQuote(sprintf('file.exists("%s")', result$pidfile), type = rscript_sh) ), collapse = " ") if (verbose) { mdebugf("Testing if worker's PID can be inferred: %s", sQuote(test_cmd)) } input <- NULL ## AD HOC: 'singularity exec ... Rscript' requires input="". If not, ## they will be terminated because they try to read from non-existing ## standard input. 
    ## /HB 2019-02-14
    if (any(grepl("singularity", rscript, ignore.case = TRUE))) input <- ""
    res <- system(test_cmd, intern = TRUE, input = input)
    status <- attr(res, "status")
    suppressWarnings(file.remove(result$pidfile))
    .cache[[key]] <<- (is.null(status) || status == 0L) && any(grepl("TRUE", res))
    if (verbose) mdebugf("- Possible to infer worker's PID: %s", .cache[[key]])
    result
  }
})


readWorkerPID <- function(pidfile, wait = 0.5, maxTries = 8L, verbose = FALSE) {
  if (is.null(pidfile)) return(NULL)
  if (verbose) mdebug("Attempting to infer PID for worker process ...")
  pid <- NULL

  ## Wait for PID file
  tries <- 0L
  while (!file.exists(pidfile) && tries <= maxTries) {
    Sys.sleep(wait)
    tries <- tries + 1L
  }

  if (file.exists(pidfile)) {
    pid0 <- NULL
    for (tries in 1:maxTries) {
      pid0 <- tryCatch(readLines(pidfile, warn = FALSE), error = identity)
      if (!inherits(pid0, "error")) break
      pid0 <- NULL
      Sys.sleep(wait)
    }
    file.remove(pidfile)

    if (length(pid0) > 0L) {
      ## Use last one, if more than one ("should not happen")
      pid <- as.integer(pid0[length(pid0)])
      if (verbose) mdebugf(" - pid: %s", pid)
      if (is.na(pid)) {
        warnf("Worker PID is a non-integer: %s", pid0)
        pid <- NULL
      } else if (pid == Sys.getpid()) {
        warnf("Hmm... worker PID and parent PID are the same: %s", pid)
        pid <- NULL
      }
    }
  }

  if (verbose) mdebug("Attempting to infer PID for worker process ... done")

  pid
} ## readWorkerPID()


#' @export
summary.RichSOCKnode <- function(object, ...)
{
  res <- list(
    host = NA_character_,
    r_version = NA_character_,
    platform = NA_character_,
    pwd = NA_character_,
    pid = NA_integer_
  )

  host <- object[["host"]]
  if (!is.null(host)) res$host <- host

  session_info <- object[["session_info"]]
  if (!is.null(session_info)) {
    res$r_version <- session_info[["r"]][["version.string"]]
    res$platform <- session_info[["r"]][["platform"]]
    res$pwd <- session_info[["pwd"]]
    res$pid <- session_info[["process"]][["pid"]]
  }

  as.data.frame(res, stringsAsFactors = FALSE)
}

#' @export
summary.RichSOCKcluster <- function(object, ...) {
  res <- lapply(object, FUN = function(node) {
    if (is.null(node)) return(summary.RichSOCKnode(node))
    summary(node)
  })
  res <- do.call(rbind, res)
  rownames(res) <- NULL
  res
}

#' @export
print.RichSOCKcluster <- function(x, ...) {
  info <- summary(x)
  txt <- sprintf("host %s", sQuote(info[["host"]]))
  specs <- sprintf("(%s, platform %s)", info[["r_version"]], info[["platform"]])
  specs[is.na(info[["r_version"]])] <- "(R version and platform not queried)"
  txt <- paste(txt, specs, sep = " ")
  t <- table(txt)
  t <- t[order(t, decreasing = TRUE)]
  w <- ifelse(t == 1L, "node is", "nodes are")
  txt <- sprintf("%d %s on %s", t, w, names(t))
  txt <- paste(txt, collapse = ", ")
  txt <- sprintf("Socket cluster with %d nodes where %s", length(x), txt)

  ## Report on autoStop?
  if (!is.null(attr(x, "gcMe"))) {
    txt <- sprintf("%s. This cluster is registered to be automatically stopped by the garbage collector", txt)
  }

  cat(txt, "\n", sep = "")

  invisible(x)
}

parallelly/R/as.cluster.R

#' Coerce an Object to a Cluster Object
#'
#' @param x An object to be coerced.
#'
#' @param \dots Additional arguments passed to the underlying coercion method.
#' For `c(...)`, the clusters and cluster nodes to be combined.
#'
#' @return An object of class `cluster`.
#'
#' @example incl/as.cluster.R
#'
#' @export
as.cluster <- function(x, ...)
{
  UseMethod("as.cluster")
}

#' @rdname as.cluster
#' @export
as.cluster.cluster <- function(x, ...) x

#' @rdname as.cluster
#' @export
as.cluster.list <- function(x, ...) {
  x <- lapply(x, FUN = as.cluster, ...)
  Reduce(c, x)
}

#' @rdname as.cluster
#' @export
as.cluster.SOCKnode <- function(x, ...) {
  cl <- structure(list(x), class = c("SOCKcluster", "cluster"))
}

#' @rdname as.cluster
#' @export
as.cluster.SOCK0node <- as.cluster.SOCKnode

#' @param recursive Not used.
#'
#' @return `c(...)` combines multiple clusters and / or cluster nodes into one
#' cluster returned as an object of class `cluster`.  A warning will be
#' produced if there are duplicated nodes in the resulting cluster.
#'
#' @rdname as.cluster
#' @export
c.cluster <- function(..., recursive = FALSE) {
  x <- list(...)
  x <- lapply(x, FUN = as.cluster)

  ## AD HOC: Use common denominator class as new class
  class <- lapply(x, FUN = class)
  class <- Reduce(intersect, class)
  stop_if_not(is.element("cluster", class))

  ## Combine list of clusters
  x <- lapply(x, FUN = unclass)
  x <- Reduce(c, x)

  ## Assert no duplicates
  dups <- duplicated(x)
  if (any(dups)) {
    warnf("The combined cluster contains %d duplicated nodes", sum(dups))
  }

  class(x) <- class
  x
}

parallelly/R/utils.R

isFALSE <- function(x) {
  is.logical(x) && length(x) == 1L && !is.na(x) && !x
}

isNA <- function(x) {
  is.logical(x) && length(x) == 1L && is.na(x)
}

assert_no_positional_args_but_first <- function(call = sys.call(sys.parent())) {
  ast <- as.list(call)
  if (length(ast) <= 2L) return()
  names <- names(ast[-(1:2)])
  if (is.null(names) || any(names == "")) {
    stopf("Function %s() requires that all arguments beyond the first one are passed by name and not by position: %s", as.character(call[[1L]]), deparse(call, width.cutoff = 100L))
  }
}

stopf <- function(fmt, ..., call. = TRUE, domain = NULL) {  #nolint
  stop(sprintf(fmt, ...), call. = call., domain = domain)
}

warnf <- function(fmt, ..., call.
= TRUE, immediate. = FALSE, domain = NULL) { #nolint warning(sprintf(fmt, ...), call. = call., immediate. = immediate., domain = domain) } msgf <- function(fmt, ..., appendLF = FALSE, domain = NULL) { #nolint message(sprintf(fmt, ...), appendLF = appendLF, domain = domain) } stop_if_not <- function(...) { res <- list(...) for (ii in 1L:length(res)) { res_ii <- .subset2(res, ii) if (length(res_ii) != 1L || is.na(res_ii) || !res_ii) { mc <- match.call() call <- deparse(mc[[ii + 1]], width.cutoff = 60L) if (length(call) > 1L) call <- paste(call[1L], "....") stopf("%s is not TRUE", sQuote(call), call. = FALSE, domain = NA) } } NULL } ## From R.utils 2.0.2 (2015-05-23) hpaste <- function(..., sep = "", collapse = ", ", lastCollapse = NULL, maxHead = if (missing(lastCollapse)) 3 else Inf, maxTail = if (is.finite(maxHead)) 1 else Inf, abbreviate = "...") { if (is.null(lastCollapse)) lastCollapse <- collapse # Build vector 'x' x <- paste(..., sep = sep) n <- length(x) # Nothing todo? if (n == 0) return(x) if (is.null(collapse)) return(x) # Abbreviate? if (n > maxHead + maxTail + 1) { head <- x[seq_len(maxHead)] tail <- rev(rev(x)[seq_len(maxTail)]) x <- c(head, abbreviate, tail) n <- length(x) } if (!is.null(collapse) && n > 1) { if (lastCollapse == collapse) { x <- paste(x, collapse = collapse) } else { xT <- paste(x[1:(n-1)], collapse = collapse) x <- paste(xT, x[n], sep = lastCollapse) } } x } # hpaste() trim <- function(s) { sub("[\t\n\f\r ]+$", "", sub("^[\t\n\f\r ]+", "", s)) } # trim() hexpr <- function(expr, trim = TRUE, collapse = "; ", maxHead = 6L, maxTail = 3L, ...) { code <- deparse(expr) if (trim) code <- trim(code) hpaste(code, collapse = collapse, maxHead = maxHead, maxTail = maxTail, ...) 
} # hexpr() now <- function(x = Sys.time(), format = "[%H:%M:%OS3] ") { ## format(x, format = format) ## slower format(as.POSIXlt(x, tz = ""), format = format) } mdebug <- function(..., debug = getOption2("parallelly.debug", FALSE)) { if (!debug) return() message(now(), ...) } mdebugf <- function(..., appendLF = TRUE, debug = getOption2("parallelly.debug", FALSE)) { if (!debug) return() message(now(), sprintf(...), appendLF = appendLF) } #' @importFrom utils capture.output mprint <- function(..., appendLF = TRUE, debug = getOption2("parallelly.debug", FALSE)) { if (!debug) return() message(paste(now(), capture.output(print(...)), sep = "", collapse = "\n"), appendLF = appendLF) } #' @importFrom utils capture.output str mstr <- function(..., appendLF = TRUE, debug = getOption2("parallelly.debug", FALSE)) { if (!debug) return() message(paste(now(), capture.output(str(...)), sep = "", collapse = "\n"), appendLF = appendLF) } ## From R.utils 2.7.0 (2018-08-26) queryRCmdCheck <- function(...) { evidences <- list() # Command line arguments args <- commandArgs() evidences[["vanilla"]] <- is.element("--vanilla", args) # Check the working directory pwd <- getwd() dirname <- basename(pwd) parent <- basename(dirname(pwd)) pattern <- ".+[.]Rcheck$" # Is 'R CMD check' checking tests? evidences[["tests"]] <- ( grepl(pattern, parent) && grepl("^tests(|_.*)$", dirname) ) # Is the current working directory as expected? evidences[["pwd"]] <- (evidences[["tests"]] || grepl(pattern, dirname)) # Is 'R CMD check' checking examples? evidences[["examples"]] <- is.element("CheckExEnv", search()) # SPECIAL: win-builder? 
evidences[["win-builder"]] <- (.Platform$OS.type == "windows" && grepl("Rterm[.]exe$", args[1])) if (evidences[["win-builder"]]) { n <- length(args) if (all(c("--no-save", "--no-restore", "--no-site-file", "--no-init-file") %in% args)) { evidences[["vanilla"]] <- TRUE } if (grepl(pattern, parent)) { evidences[["pwd"]] <- TRUE } } if (!evidences$vanilla || !evidences$pwd) { res <- "notRunning" } else if (evidences$tests) { res <- "checkingTests" } else if (evidences$examples) { res <- "checkingExamples" } else { res <- "notRunning" } attr(res, "evidences") <- evidences res } inRCmdCheck <- function() { queryRCmdCheck() != "notRunning" } comma <- function(x, sep = ", ") paste(x, collapse = sep) commaq <- function(x, sep = ", ") paste(sQuote(x), collapse = sep) ## We are currently importing the following non-exported functions: ## * makeClusterPSOCK(): ## - parallel:::sendCall() ## - parallel:::recvResult() ## * isForkedChild(): ## - parallel:::isChild() importParallel <- local({ ns <- NULL cache <- list() function(name = NULL) { res <- cache[[name]] if (is.null(res)) { ns <<- getNamespace("parallel") if (!exists(name, mode = "function", envir = ns, inherits = FALSE)) { ## covr: skip=3 msg <- sprintf("parallel:::%s() is not available on this system (%s)", name, sQuote(.Platform$OS.type)) mdebug(msg) stop(msg, call. = FALSE) } res <- get(name, mode = "function", envir = ns, inherits = FALSE) cache[[name]] <<- res } res } }) parallelly/R/isLocalhostNode.R0000644000175000017500000000177714025754625016211 0ustar nileshnilesh#' Checks whether or not a Cluster Node Runs on Localhost #' #' @param node A cluster node of class `SOCKnode` or `SOCK0node`. #' #' @param \ldots Not used. #' #' @return (logical) Returns TRUE if the cluster node is running on the #' current machine and FALSE if it runs on another machine. #' If it cannot be inferred, NA is returned. #' #' @export isLocalhostNode <- function(node, ...) 
UseMethod("isLocalhostNode") #' @export isLocalhostNode.default <- function(node, ...) NA #' @export isLocalhostNode.SOCKnode <- isLocalhostNode.SOCK0node <- function(node, ...) { host <- node$host if (!is.null(host)) return(is_localhost(host)) NextMethod() } #' @export isLocalhostNode.forknode <- function(node, ...) { TRUE } #' @export isLocalhostNode.RichSOCKnode <- function(node, ...) { host <- node$host value <- attr(host, "localhost") if (is.logical(value)) return(value) NextMethod() } #' @export isLocalhostNode.cluster <- function(node, ...) { vapply(node, FUN = isLocalhostNode, FUN.VALUE = NA) } parallelly/R/supportsMulticore.R0000644000175000017500000001003014073770754016664 0ustar nileshnilesh#' Check If Forked Processing ("multicore") is Supported #' #' Certain parallelization methods in R rely on _forked_ processing, e.g. #' `parallel::mclapply()`, `parallel::makeCluster(n, type = "FORK")`, #' `doMC::registerDoMC()`, and `future::plan("multicore")`. #' Process forking is done by the operating system and support for it in #' \R is restricted to Unix-like operating systems such as Linux, Solaris, #' and macOS. R running on Microsoft Windows does not support forked #' processing. #' In R, forked processing is often referred to as "multicore" processing, #' which stems from the 'mc' of the `mclapply()` family of functions, which #' originally was in a package named \pkg{multicore} which later was #' incorporated into the \pkg{parallel} package. #' This function checks whether or not forked (aka "multicore") processing #' is supported in the current \R session. #' #' @param \dots Internal usage only. #' #' @return TRUE if forked processing is supported and not disabled, #' otherwise FALSE. #' #' @section Support for process forking: #' While R supports forked processing on Unix-like operating system such as #' Linux and macOS, it does not on the Microsoft Windows operating system. 
#' #' For some R environments it is considered unstable to perform parallel #' processing based on _forking_. #' This is for example the case when using RStudio, cf. #' \href{https://github.com/rstudio/rstudio/issues/2597#issuecomment-482187011}{RStudio Inc. recommends against using forked processing when running R from within the RStudio software}. #' This function detects when running in such an environment and returns #' `FALSE`, even though the underlying operating system supports forked processing. #' A warning will also be produced informing the user about this the first #' time this function is called in an \R session. #' This warning can be disabled by setting R option #' \option{parallelly.supportsMulticore.unstable}, or environment variable #' \env{R_PARALLELLY_SUPPORTSMULTICORE_UNSTABLE}, to `"quiet"`. #' #' @section Enable or disable forked processing: #' It is possible to disable forked processing for futures by setting \R #' option \option{parallelly.fork.enable} to `FALSE`. Alternatively, one can #' set environment variable \env{R_PARALLELLY_FORK_ENABLE} to `false`. #' Analogously, it is possible to override disabled forking by setting one #' of these to `TRUE`. #' #' @examples #' ## Check whether or not forked processing is supported #' supportsMulticore() #' #' @export supportsMulticore <- local({ supportedByOS <- NA function(...) { if (is.na(supportedByOS)) { ns <- getNamespace("parallel") supportedByOS <<- exists("mcparallel", mode = "function", envir = ns, inherits = FALSE) } ## Forked processing is not supported by the OS? if (!supportedByOS) return(FALSE) ## Is forked processing disabled via R settings?
value <- getOption2("parallelly.fork.enable", NA) stop_if_not(length(value) == 1L) value <- as.logical(value) if (!is.na(value)) return(value) ## Try to decide whether forked processing is safe or not ## Forked processing should be avoided when R is run from RStudio if (!supportsMulticoreAndRStudio(...)) return(FALSE) TRUE } }) supportsMulticoreAndRStudio <- local({ alreadyWarned <- FALSE function(warn = FALSE) { ## Forked processing should be avoided within RStudio ## [https://github.com/rstudio/rstudio/issues/2597#issuecomment-482187011] is_rstudio <- (Sys.getenv("RSTUDIO") == "1") if (!is_rstudio) return(TRUE) if (!warn || alreadyWarned) return(FALSE) action <- getOption2("parallelly.supportsMulticore.unstable", "warn") if (action == "warn") { warning("[ONE-TIME WARNING] Forked processing ('multicore') is not supported when running R from RStudio because it is considered unstable. For more details, how to control forked processing or not, and how to silence this warning in future R sessions, see ?parallelly::supportsMulticore") } alreadyWarned <<- TRUE FALSE } }) parallelly/R/isNodeAlive.R #' Check whether or not the cluster nodes are alive #' #' @param x A cluster or a cluster node ("worker"). #' #' @param ... Not used. #' #' @return A logical vector of length `length(x)` with values #' FALSE, TRUE, and NA. If it can be established that the #' process for a cluster node is running, then TRUE is returned. #' If it does not run, then FALSE is returned. #' If neither can be inferred, for instance because the worker #' runs on a remote machine, then NA is returned. #' #' @details #' This function works by checking whether the cluster node process is #' running or not. This is done by querying the system for its process #' ID (PID), which is registered by [makeClusterPSOCK()] when the node #' starts. If the PID is not known, then NA is returned.
#' On Unix and macOS, the PID is queried using [tools::pskill()] with #' fallback to `system("ps")`. On MS Windows, `system2("tasklist")` is used, #' which may take a long time if there are a lot of processes running. #' For details, see the _internal_ [pid_exists()] function. #' #' @examples #' \donttest{ #' cl <- makeClusterPSOCK(2) #' #' ## Check if cluster node #2 is alive #' print(isNodeAlive(cl[[2]])) #' #' ## Check all nodes #' print(isNodeAlive(cl)) #' } #' #' @export isNodeAlive <- function(x, ...) UseMethod("isNodeAlive") #' @export isNodeAlive.default <- function(x, ...) NA #' @export isNodeAlive.RichSOCKnode <- function(x, ...) { pid <- x$session_info$process$pid if (!is.integer(pid)) return(NextMethod()) pid_exists(pid) } #' @export isNodeAlive.cluster <- function(x, ...) { vapply(x, FUN = isNodeAlive, FUN.VALUE = NA) } parallelly/NEWS Package: parallelly =================== Version: 1.30.0 [2021-12-15] NEW FEATURES: * makeNodePSOCK(), and therefore also makeClusterPSOCK(), gained argument 'rscript_sh', which controls how Rscript arguments are shell quoted. The default is to make a best guess on what type of shell is used where each cluster node is launched. If launched locally, then it is based on the platform the current R session is running on, i.e. either a POSIX shell ("sh") or MS Windows ("cmd"). If remotely, then the assumption is that a POSIX shell ("sh") is used. * makeNodePSOCK(), and therefore also makeClusterPSOCK(), gained argument 'default_packages', which controls the default set of R packages to be attached on each cluster node at startup. Moreover, if argument 'rscript' specifies an 'Rscript' executable, then argument 'default_packages' is used to populate Rscript command-line option '--default-packages=...'. If 'rscript' specifies something else, e.g. an 'R' or 'Rterm' executable, then environment variable 'R_DEFAULT_PACKAGES=...'
is set accordingly when launching each cluster node. * Argument 'rscript_args' of makeClusterPSOCK() now supports "*" values. When used, the corresponding element will be replaced with the internally added Rscript command-line options. If not specified, such options are appended at the end. BUG FIXES: * makeClusterPSOCK() did not support backslashes ('\') in 'rscript_libs', backslashes that may originate from, for example, Windows network drives. The result was that the worker would silently ignore any 'rscript_libs' components with backslashes. * The package detects when 'R CMD check' runs and adjust default settings via environment variables in order to play nicer with the machine where the checks are running. Some of these environment variables were in this case ignored since parallelly 1.26.0. Version: 1.29.0 [2021-11-20] SIGNIFICANT CHANGES: * makeClusterPSOCK() launches parallel workers with option 'socketOptions' set to "no-delay" by default. This decreases the communication latency between workers and the main R session, significantly so on Unix. This option requires R (>= 4.1.0) and has no effect in early versions of R. NEW FEATURES: * Added argument 'socketOptions' to makeClusterPSOCK(), which sets the corresponding R option on each cluster node when they are launched. * Argument 'rscript_envs' of makeClusterPSOCK() can also be used to unset environment variables cluster nodes. Any named element with value 'NA_character_' will be unset. * Argument 'rscript' of makeClusterPSOCK() now supports "*" values. When used, the corresponding element will be replaced with the "Rscript", or if 'homogenous = TRUE', then absolute path to current "Rscript". DOCUMENTATION: * Add makeClusterPSOCK() example on how to launch workers distributed across multiple CPU Groups on MS Windows 10. BUG FIXES: * isForkedChild() would only return TRUE in a forked child process, if and only if, it had already been called in the parent R process. 
* Using argument 'rscript_startup' would cause makeClusterPSOCK() to fail in R-devel (>= r80666). Version: 1.28.1 [2021-09-09] CRAN POLICIES: * example("isNodeAlive") now uses \donttest{} to avoid long (> 10 s) elapsed run times on MS Windows. Version: 1.28.0 [2021-08-27] NEW FEATURES: * Add isNodeAlive() to check whether a cluster and cluster nodes are alive or not. * Add isForkedChild() to check whether or not the current R process is a forked child process. BUG FIXES: * Environment variable 'R_PARALLELLY_SUPPORTSMULTICORE_UNSTABLE' was incorrectly parsed as a logical instead of a character string. If the variables was set to, say, "quiet", this would cause an error when the package was loaded. * makeClusterPSOCK() failed to fall back to setup_strategy = "sequential", when not supported by the current R version. Version: 1.27.0 [2021-07-19] NEW FEATURES: * availableCores() and availableWorkers() now respects environment variable 'BIOCPARALLEL_WORKER_NUMBER' introduced in BiocParallel (>= 1.27.2). They also respect 'BBS_HOME' which is set on the Bioconductor check servers to limit the number of parallel workers while checking Bioconductor packages. WORKAROUND: * makeClusterPSOCK() and parallel::makeCluster() failed with error "Cluster setup failed. of workers failed to connect." when using the new default 'setup_strategy = "parallel"' and when the 'tcltk' package is loaded when running R (>= 4.0.0 && <= 4.1.0) on macOS. Now 'parallelly' forces setup_strategy = "sequential" when the 'tcltk' package is loaded on these R versions. BUG FIXES: * makeClusterPSOCK(..., setup_strategy = "parallel") would forget to close an socket connection used to set up the workers. This socket connection would be closed by the garbage collector eventually with a warning. * parallelly::makeClusterPSOCK() would fail with "Error in freePort(port) : Unknown value on argument 'port': 'auto'" if environment variable 'R_PARALLEL_PORT' was set to a port number. 
* parallelly::availableCores() would produce 'Error in if (grepl("^[1-9]$", res)) return(as.integer(res)) : argument is of length zero' on Linux systems without 'nproc' installed. Version: 1.26.1 [2021-06-29] NEW FEATURES: * print() on RichSOCKcluster mentions when the cluster is registered to be automatically stopped by the garbage collector. WORKAROUND: * Depending on R version used, the RStudio Console does not support the new 'setup_strategy = "parallel"' when using makeClusterPSOCK() or parallel::makeCluster(). The symptom is that they, after a long wait, result in "Error in makeClusterPSOCK(workers, ...) : Cluster setup failed. of workers failed to connect." This is due to a bug in R, which has been fixed for R (>= 4.1.1) but also in a recent R 4.1.0 Patched. For R (>= 4.0.0) or R (<= 4.1.0), this release works around the problem by forcing 'setup_strategy = "sequential" for 'parallelly' and 'parallel' when running in the RStudio Console. If you wish to override this behavior, you can always set option 'parallelly.makeNodePSOCK.setup_strategy' to "parallel", e.g. in your ~/.Rprofile file. Alternatively, you can set the environment variable 'R_PARALLELLY_MAKENODEPSOCK_SETUP_STRATEGY=parallel', e.g. in your ~/.Renviron file. BUG FIXES: * On systems with 'nproc' installed, availableCores() would be limited by environment variables 'OMP_NUM_THREADS' and 'OMP_THREAD_LIMIT', if set. For example, on conservative systems that set 'OMP_NUM_THREADS=1' as the default, availableCores() would pick this up via 'nproc' and return 1. This was not the intended behavior. Now those environment variables are temporarily unset before querying 'nproc'. Version: 1.26.0 [2021-06-09] SIGNIFICANT CHANGES: * R_PARALLELLY_* (and R_FUTURE_*) environment variables are now only read when the 'parallelly' package is loaded, where they set the corresponding parallelly.* option. 
Previously, some of these environment variables were queried by different functions as a fallback to when an option was not set. By only parsing them when the package is loaded, it decrease the overhead in functions, and it clarifies that options can be changed at runtime whereas environment variables should only be set at startup. NEW FEATURES: * makeClusterPSOCK() now support setting up cluster nodes in parallel similarly to how parallel::makePSOCKcluster() does it. This significantly reduces the setup turnaround time. This is only supported in R (>= 4.0.0). To revert to the sequential setup strategy, set R option 'parallelly.makeNodePSOCK.setup_strategy' to "sequential". * Add freePort() to get a random TCP port that can be opened. DOCUMENTATION: * Documenting more R options and environment variables used by this package. BUG FIXES: * R option 'parallelly.availableCores.fallback' and environment variable 'R_PARALLELLY_AVAILABLECORES_FALLBACK' was ignored since parallelly 1.22.0, when support for 'nproc' was added to availableCores(). Version: 1.25.0 [2021-04-30] SIGNIFICANT CHANGES: * The default SSH client on MS Windows 10 is now the built in 'ssh' client. This means that regardless whether you are on Linux, macOS, or Windows 10, setting up parallel workers on external machines over SSH finally works out of the box without having to install PuTTY or other SSH clients. This was possible because a workaround was found for a Windows 10 bug preventing us from using reverse tunneling over SSH. It turns out the bug reveals itself when using hostname 'localhost' but not '127.0.0.1', so we use the latter. NEW FEATURES: * availableCores() gained argument 'omit' to make it easier to put aside zero or more cores from being used in parallel processing. For example, on a system with four cores, availableCores(omit = 1) returns 3. 
Importantly, since availableCores() is guaranteed to always return a positive integer, availableCores(omit = 4) == 1, even on systems with four or fewer cores. Using availableCores() - 4 on such systems would return a non-positive value, which would give an error downstream. BUG FIXES: * makeClusterPSOCK(), or actually makeNodePSOCK(), did not accept all types of environment variable names when using 'rscript_envs', e.g. it would give an error if we tried to pass '_R_CLASS_MATRIX_ARRAY_'. * makeClusterPSOCK() had a "length > 1 in coercion to logical" bug that could affect especially MS Windows 10 users. Version: 1.24.0 [2021-03-12] SIGNIFICANT CHANGES: * The default SSH client on MS Windows is now, in order of availability: (i) 'plink' of the PuTTY software, (ii) 'ssh' in the RStudio distribution, and (iii) 'ssh' of Windows 10. Previously, the latter was considered first but that still has a bug preventing us from using reverse tunneling. NEW FEATURES: * makeClusterPSOCK(), or actually makeNodePSOCK(), gained argument 'quiet', which can be used to silence output produced by 'manual = TRUE'. * c() for 'cluster' objects now warns about duplicated cluster nodes. * Add isForkedNode() to test if a cluster node runs in a forked process. * Add isLocalhostNode() to test if a cluster node runs on the current machine. * Now availableCores() and availableWorkers() avoid recursive calls to the custom function given by options 'parallelly.availableCores.custom' and 'parallelly.availableWorkers.custom', respectively. * availableWorkers() now recognizes the Slurm environment variable 'SLURM_JOB_NODELIST', e.g. "dev1,n[3-4,095-120]". It will use 'scontrol show hostnames "$SLURM_JOB_NODELIST"' to expand it, if supported on the current machine, otherwise it will attempt to parse and expand the nodelist specification using R. 
If either of environment variable 'SLURM_JOB_CPUS_PER_NODE' or 'SLURM_TASKS_PER_NODE' is set, then each node in the nodelist will be represented that number of times. If in addition, environment variable 'SLURM_CPUS_PER_TASK' (always a scalar), then that is also respected. MISCELLANEOUS: * All code is now using the 'parallelly.' prefix for options and the 'R_PARALLELLY_' prefix for environment variables. Settings that use the corresponding 'future.' and 'R_FUTURE_' prefixes are still recognized. BUG FIXES: * availableCores() did not respect environment variable 'SLURM_TASKS_PER_NODE' when the job was allocated more than one node. * Above argument 'quiet' was introduced in future 1.19.1 but was mistakenly dropped from parallelly 1.20.0 when that was released, and therefore also from future (>= 1.20.0). Version: 1.23.0 [2021-01-03] NEW FEATURES: * availableCores(), availableWorkers(), and freeCores() gained argument 'logical', which is passed down to parallel::detectCores() as-is. The default is TRUE but it can be changed by setting the R option 'parallelly.availableCores.logical'. This option can in turn be set via environment variable 'R_PARALLELLY_AVAILABLECORES_LOGICAL' which is applied (only) when the package is loaded. * Now makeClusterPSOCK() asserts that there are enough free connections available before attempting to create the parallel workers. If too many workers are requested, an informative error message is produced. * Add availableConnections() and freeConnections() to infer the maximum number of connections that the current R installation can have open at any time and how many of those are currently free to be used. This limit is typically 128 but may be different in custom R installations that are built from source. Version: 1.22.0 [2020-12-12] NEW FEATURES: * Now availableCores() queries also Unix command 'nproc', if available. This will make it respect the number of CPU/cores limited by 'cgroups' and Linux containers. 
* PSOCK cluster workers are now set up to communicate using little endian (useXDR = FALSE) instead of big endian (useXDR = TRUE). Since most modern systems use little endian, 'useXDR = FALSE' speeds up the communication noticeably (10-15%) on those systems. The default value of this argument can be controlled by the R option 'parallelly.makeNodePSOCK.useXDR' or the corresponding environment variable 'R_PARALLELLY_MAKENODEPSOCK_USEXDR'. BETA FEATURES: * Add cpuLoad() for querying the "average" system load on Unix-like systems. * Add freeCores() for estimating the average number of unused cores based on the average system load as given by cpuLoad(). BUG FIXES: * Except for environment variables 'R_FUTURE_AVAILABLECORES_FALLBACK' and 'R_FUTURE_AVAILABLECORES_SYSTEM', none of the 'R_PARALLELLY_*' and 'R_FUTURE_*' ones where recognized. Version: 1.21.0 [2020-10-26] SIGNIFICANT CHANGES: * Removed find_rshcmd() which was never meant to be exported. NEW FEATURES: * makeClusterPSOCK() gained argument 'validate' to control whether or not the nodes should be tested after they've been created. The validation is done by querying each node for its session information, which is then saved as attribute 'session_info' on the cluster node object. This information is also used in error messages, if available. This validation has been done since version 1.5.0 but now it can be disabled. The default of argument 'validate' can be controlled via an R options and an environment variable. * Now makeNodePSOCK(..., rscript_envs = "UNKNOWN") produces an informative warning on non-existing environment variables that was skipped. BUG FIXES: * makeClusterPSOCK() would produce an error on 'one node produced an error: could not find function "getOptionOrEnvVar"' if 'parallelly' is not available on the node. * makeClusterPSOCK() would attempt to loaded 'parallelly' on the worker. If it's not available on the worker, it would result in a silent warning on the worker. Now 'parallelly' is not loaded. 
* makeClusterPSOCK(..., tries = n) would retry to setup a cluster node also on errors that were unrelated to node setup or node connection errors. * The error message on using an invalid 'rscript_envs' argument for makeClusterPSOCK() reported on the value of 'rscript_libs' (sic!). * makeNodePSOCK(..., rscript_envs = "UNKNOWN") would result in an error when trying to launch the cluster node. DEPRECATED AND DEFUNCT: * Removed find_rshcmd() which was never meant to be exported. Version: 1.20.0 [2020-10-10] SIGNIFICANT CHANGES: * Add availableCores(), and availableWorkers(), supportsMulticore(), as.cluster(), autoStopCluster(), makeClusterMPI(), makeClusterPSOCK(), and makeNodePSOCK() from the 'future' package. NEW FEATURES: * Add isConnectionValid() and connectionId() adopted from internal code of the 'future' package. BUG FIXES: * Renamed environment variable 'R_FUTURE_MAKENODEPSOCK_tries' used by makeClusterPSOCK() to 'R_FUTURE_MAKENODEPSOCK_TRIES'. * connectionId() did not return -1L on Solaris for connections with internal 'nil' pointers because they were reported as '0' - not 'nil' or '0x0'. HISTORY: * Below is an excerpt of the future's NEWS entries that are related to the functions in this package. Version: 1.19.0 [2020-09-19] SIGNIFICANT CHANGES: * Now availableCores() better supports Slurm. Specifically, if environment variable 'SLURM_CPUS_PER_TASK' is not set, which requires that option --slurm-cpus-per-task=n' is specified and SLURM_JOB_NUM_NODES=1, then it falls back to using 'SLURM_CPUS_ON_NODE', e.g. when using '--ntasks=n'. * Now availableCores() and availableWorkers() supports LSF/OpenLava. Specifically, they acknowledge environment variable 'LSB_DJOB_NUMPROC' and 'LSB_HOSTS', respectively. NEW FEATURES: * makeClusterPSOCK() will now retry to create a cluster node up to 'tries' (default: 3) times before giving up. If argument 'port' species more than one port (e.g. 
port = "random") then it will also attempt find a valid random port up to 'tries' times before giving up. The pre-validation of the random port is only supported in R (>= 4.0.0) and skipped otherwise. * makeClusterPSOCK() skips shell quoting of the elements in 'rscript' if it inherits from 'AsIs'. * makeClusterPSOCK(), or actually makeNodePSOCK(), gained argument 'quiet', which can be used to silence output produced by 'manual = TRUE'. PERFORMANCE: * Now plan(multisession), plan(cluster, workers = ), and makeClusterPSOCK() which they both use internally, sets up localhost workers twice as fast compared to versions since future 1.12.0, which brings it back to par with a bare-bone parallel::makeCluster(..., setup_strategy = "sequential") setup. The slowdown was introduced in future 1.12.0 (2019-03-07) when protection against leaving stray R processes behind from failed worker startup was implemented. This protection now makes use of memoization for speedup. Version: 1.18.0 [2020-07-08] NEW FEATURES: * print() on RichSOCKcluster gives information not only on the name of the host but also on the version of R and the platform of each node ("worker"), e.g. "Socket cluster with 3 nodes where 2 nodes are on host 'localhost' (R version 4.0.0 (2020-04-24), platform x86_64-w64-mingw32), 1 node is on host 'n3' (R version 3.6.3 (2020-02-29), platform x86_64-pc-linux-gnu)". * It is now possible to set environment variables on workers before they are launched by makeClusterPSOCK() by specify them as as "=" as part of the 'rscript' vector argument, e.g. rscript=c("ABC=123", "DEF='hello world'", "Rscript"). This works because elements in 'rscript' that match regular expression '^[[:alpha:]_][[:alnum:]_]*=.*' are no longer shell quoted. * makeClusterPSOCK() now returns a cluster that in addition to inheriting from' SOCKcluster' it will also inherit from 'RichSOCKcluster'. 
BUG FIXES: * Made makeClusterPSOCK() and makeNodePSOCK() agile to the name change from parallel:::.slaveRSOCK() to parallel:::.workRSOCK() in R (>= 4.1.0). * makeClusterPSOCK(..., rscript) will not try to locate rscript[1] if argument 'homogeneous' is FALSE (or inferred to be FALSE). * makeClusterPSOCK(..., rscript_envs) would result in a syntax error when starting the workers due to non-ASCII quotation marks if option 'useFancyQuotes' was not set to FALSE. Version: 1.17.0 [2020-04-17] NEW FEATURES: * makeClusterPSOCK() gained argument 'rscript_envs' for setting environment variables in workers on startup, e.g. rscript_envs = c(FOO = "3.14", "BAR"). MISCELLANEOUS: * Not all CRAN servers have _R_CHECK_LIMIT_CORES_ set. To better emulate CRAN submission checks, the future package will, when loaded, set this environment variable to 'TRUE' if unset and if 'R CMD check' is running. Note that future::availableCores() respects _R_CHECK_LIMIT_CORES_ and returns at most 2L (two cores) if detected. Version: 1.15.1 [2019-11-23] NEW FEATURES: * The default range of ports that makeClusterPSOCK() draws a random port from (when argument 'port' is not specified) can now be controlled by environment variable 'R_FUTURE_RANDOM_PORTS'. The default range is still 11000:11999 as with the 'parallel' package. Version: 1.15.0 [2019-11-07] DOCUMENTATION: * Added 'Troubleshooting' section to ?makeClusterPSOCK with instructions on how to troubleshoot when the setup of local and remote clusters fail. BUG FIXES: * makeClusterPSOCK() could produce warnings like "cannot open file '/tmp/alice/Rtmpi69yYF/future.parent=2622.a3e32bc6af7.pid': No such file", e.g. when launching R workers running in Docker containers. * makeClusterMPI() did not work for MPI clusters with 'comm' other than '1'. Version: 1.13.0 [2019-05-08] NEW FEATURES: * Now availableCores() also recognizes PBS environment variable 'NCPUS', because the PBSPro scheduler does not set 'PBS_NUM_PPN'. 
* If, option 'future.availableCores.custom' is set to a function, then availableCores() will call that function and interpret its value as number of cores. Analogously, option 'future.availableWorkers.custom' can be used to specify a hostnames of a set of workers that availableWorkers() sees. These new options provide a mechanism for anyone to customize availableCores() and availableWorkers() in case they do not (yet) recognize, say, environment variables that are specific the user's compute environment or HPC scheduler. * makeClusterPSOCK() gained support for argument 'rscript_startup' for evaluating one or more R expressions in the background R worker prior to the worker event loop launching. This provides a more convenient approach than having to use, say, 'rscript_args = c("-e", sQuote(code))'. * makeClusterPSOCK() gained support for argument 'rscript_libs' to control the R package library search path on the workers. For example, to _prepend_ the folder '~/R-libs' on the workers, use 'rscript_libs = c("~/R-libs", "*")', where "*" will be resolved to the current '.libPaths()' on the workers. BUG FIXES: * makeClusterPSOCK() did not shell quote the Rscript executable when running its pre-tests checking whether localhost Rscript processes can be killed by their PIDs or not. Version: 1.12.0 [2019-03-07] NEW FEATURES: * If makeClusterPSOCK() fails to create one of many nodes, then it will attempt to stop any nodes that were successfully created. This lowers the risk for leaving R worker processes behind. BUG FIXES: * makeClusterPSOCK() in future (>= 1.11.1) produced warnings when argument 'rscript' had length(rscript) > 1. Version: 1.11.1.1 [2019-01-25] BUG FIXES: * When makeClusterPSOCK() fails to connect to a worker, it produces an error with detailed information on what could have happened. In rare cases, another error could be produced when generating the information on what the workers PID is. 
Version: 1.11.1 [2019-01-25]

NEW FEATURES:

 * The defaults of several arguments of makeClusterPSOCK() and
   makeNodePSOCK() can now be controlled via environment variables, in
   addition to the R options that were supported in the past.  An
   advantage of using environment variables is that they will be inherited
   by child processes, including nested ones.

SOFTWARE QUALITY:

 * TESTS: When the 'future' package is loaded, it checks whether
   'R CMD check' is running or not.  If it is, then a few future-specific
   environment variables are adjusted such that the tests play nice with
   the testing environment.  For instance, it sets the socket connection
   timeout for PSOCK cluster workers to 120 seconds (instead of the
   default 30 days!).  This lowers the risk of more and more zombie worker
   processes cluttering up the test machine (e.g. CRAN servers) in case a
   worker process is left behind even after the main R process has
   terminated.  Note that these adjustments are applied automatically to
   the checks of any package that depends on, or imports, the 'future'
   package.

BUG FIXES:

 * Whenever makeClusterPSOCK() failed to connect to a worker, for instance
   due to a port clash, it would leave the R worker process running - also
   after the main R process terminated.  When the worker is running on the
   same machine, makeClusterPSOCK() will now attempt to kill such stray R
   processes.  Note that parallel::makePSOCKcluster() still has this
   problem.

Version: 1.11.0 [2019-01-21]

NEW FEATURES:

 * makeClusterPSOCK() produces more informative error messages whenever
   the setup of R workers fails.  Also, its verbose messages are now
   prefixed with "[local output] " to help distinguish the output produced
   by the current R session from that produced by background workers.

 * It is now possible to specify what type of SSH clients
   makeClusterPSOCK() automatically searches for, and in what order, e.g.
   'rshcmd = c("", "")'.
 * Now makeClusterPSOCK() preserves the global RNG state (.Random.seed)
   also when it draws a random port number.

 * makeClusterPSOCK() gained argument 'rshlogfile'.

BUG FIXES:

 * makeClusterPSOCK(..., rscript = "my_r") would in some cases fail to
   find the intended 'my_r' executable.

Version: 1.10.0 [2018-10-16]

NEW FEATURES:

 * Added makeClusterMPI(n) for creating MPI-based clusters of a similar
   kind as parallel::makeCluster(n, type = "MPI"), but that also attempts
   to work around issues where parallel::stopCluster() causes R to stall.

 * makeClusterPSOCK() and makeClusterMPI() gained argument 'autoStop' for
   controlling whether the cluster should be automatically stopped when
   garbage collected or not.

Version: 1.9.0 [2018-07-22]

BUG FIXES:

 * makeClusterPSOCK() produced a warning when environment variable
   'R_PARALLEL_PORT' was set to 'random' (e.g. as on CRAN).

Version: 1.8.1 [2018-05-02]

NEW FEATURES:

 * makeClusterPSOCK() now produces a more informative warning if
   environment variable R_PARALLEL_PORT specifies a non-numeric port.

Version: 1.7.0 [2018-02-10]

NEW FEATURES:

 * On Windows, makeClusterPSOCK(), and therefore plan(multisession) and
   plan(multiprocess), will use the SSH client distributed with RStudio as
   a fallback if neither 'ssh' nor 'plink' is available on the system
   PATH.

BUG FIXES:

 * makeClusterPSOCK(..., renice = 19) would launch each PSOCK worker via
   'nice +19', resulting in the error "nice: '+19': No such file or
   directory".  This bug was inherited from parallel::makePSOCKcluster().
   Now 'nice --adjustment=19' is used instead.

Version: 1.5.0 [2017-05-24]

NEW FEATURES:

 * makeClusterPSOCK() now defaults to using the Windows PuTTY software's
   SSH client 'plink -ssh' if 'ssh' is not found.

 * Argument 'homogeneous' of makeNodePSOCK(), a helper function of
   makeClusterPSOCK(), will default to FALSE also if the hostname is a
   fully qualified domain name (FQDN), that is, if it "contains periods".
   For instance, c('node1', 'node2.server.org') will use homogeneous =
   TRUE for the first worker and homogeneous = FALSE for the second.

 * makeClusterPSOCK() now asserts that each cluster node is functioning by
   retrieving and recording the node's session information, including the
   process ID of the corresponding R process.

DOCUMENTATION:

 * Help on makeClusterPSOCK() gained more detailed descriptions of the
   arguments and their defaults.

Version: 1.4.0 [2017-03-12]

NEW FEATURES:

 * The default values for arguments 'connectTimeout' and 'timeout' of
   makeNodePSOCK() can now be controlled via global options.

DEPRECATED AND DEFUNCT:

 * availableCores(method = "mc.cores") is now defunct in favor of
   "mc.cores+1".

Version: 1.3.0 [2017-01-18]

NEW FEATURES:

 * makeClusterPSOCK() treats workers that refer to a local machine by its
   local or canonical hostname as "localhost".  This avoids having to
   launch such workers over SSH, which may not be supported on all systems
   / compute clusters.

 * Added availableWorkers().  By default it returns localhost workers
   according to availableCores().  In addition, it detects common HPC
   allocations given in environment variables set by the HPC scheduler.

 * Option 'future.availableCores.fallback', which defaults to environment
   variable 'R_FUTURE_AVAILABLECORES_FALLBACK', can now be used to specify
   the default number of cores / workers returned by availableCores() and
   availableWorkers() when no other settings are available.  For instance,
   if R_FUTURE_AVAILABLECORES_FALLBACK=1 is set system wide in an HPC
   environment, then all R processes that use availableCores() to detect
   how many cores can be used will run as single-core processes.  Without
   this fallback setting, and without other core-specifying settings, the
   default is to use all cores on the machine, which does not play well on
   multi-user systems.

BUG FIXES:

 * Creation of cluster futures (including multisession ones) would time
   out already after 40 seconds if all workers were busy.  The new default
   timeout is 30 days (option 'future.wait.timeout').

 * availableCores(methods = "_R_CHECK_LIMIT_CORES_") would give an error
   if not running R CMD check.

Version: 1.2.0 [2016-11-12]

NEW FEATURES:

 * Added makeClusterPSOCK() - a version of parallel::makePSOCKcluster()
   that allows for more flexible control of how PSOCK cluster workers are
   set up, and how they are launched and communicated with when running on
   external machines.

 * Added generic as.cluster() for coercing objects to cluster objects, to
   be used as in plan(cluster, workers = as.cluster(x)).  Also added a c()
   implementation for cluster objects such that multiple cluster objects
   can be combined into a single one.

BUG FIXES:

 * Argument 'user' to remote() was ignored (since 1.1.0).

Version: 1.1.1 [2016-10-10]

BUG FIXES:

 * For the special case where 'remote' futures use workers = "localhost",
   they (again) use the exact same R executable as the main / calling R
   session (in all other cases, whatever 'Rscript' is found in the PATH is
   used).  This was indeed already implemented in 1.0.1, but with the
   added support for reverse SSH tunnels in 1.1.0 this default behavior
   was lost.

Version: 1.1.0 [2016-10-09]

NEW FEATURES:

 * REMOTE CLUSTERS: It is now very simple to use cluster() and remote() to
   connect to remote clusters / machines.  As long as you can connect via
   ssh to those machines, it also works with these futures.  The new code
   completely avoids the incoming-firewall and incoming port-forwarding
   configuration that was previously needed.  This is done by using
   reverse SSH tunneling.  There is also no need to worry about internal
   or external IP numbers.

Version: 0.15.0 [2016-06-13]

NEW FEATURES:

 * Now availableCores() also acknowledges environment variable NSLOTS set
   by Sun/Oracle Grid Engine (SGE).

Version: 0.12.0 [2016-02-23]

BUG FIXES:

 * FIX: Now availableCores() returns 3L (= 2L + 1L) instead of 2L if
   _R_CHECK_LIMIT_CORES_ is set.
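[Editor's example] The as.cluster() and c() generics introduced in version 1.2.0 above allow cluster objects to be built up and combined piece by piece.  A minimal, localhost-only sketch (assumes the 'parallelly' package, where these generics now live, is installed):

```r
## Sketch of combining cluster objects via the c() method for clusters
## (version 1.2.0 above); localhost-only, for illustration.
library(parallelly)

cl1 <- makeClusterPSOCK(1L)  ## one local worker
cl2 <- makeClusterPSOCK(1L)  ## another local worker

## Combine into a single cluster object with two workers:
cl <- c(cl1, cl2)
print(cl)

parallel::stopCluster(cl)
```

In a 'future' setup, such a combined object could then be passed as plan(cluster, workers = cl), per the 1.2.0 entry.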
Version: 0.10.0 [2015-12-30]

NEW FEATURES:

 * Now availableCores() also acknowledges the number of CPUs allotted by
   Slurm.

Version: 0.8.0 [2015-09-06]

NEW FEATURES:

 * availableCores("mc.cores") returns getOption("mc.cores") + 1L, because
   option 'mc.cores' specifies the "allowed number of _additional_ R
   processes" to be used in addition to the main R process.

parallelly/inst/WORDLIST:

AppVeyor AsIs BiocParallel cmd CMD cut'n'pasteable DNS finalizer FQDN
hostname hostnames hpc LibreSSL macOS MinGW mpi MSYS OpenSSH PID PIDs PPK
pre Pre PSOCK PuTTY Rmpi Roadmap Rslaves RStudio setTimeLimit sig tasklist
XDR autoStopCluster makeClusterMPI makeClusterPSOCK makeNodePSOCK
parallelly af alice alnum args autoStop bc connectTimeout envs libPaths
libs linux makeCluster makePSOCKcluster memoization mingw multiprocess
multisession pc pid plink renice RichSOCKcluster rscript Rscript rshcmd
rshlogfile rstudio Rtmpi slaveRSOCK SOCKcluster sQuote stopCluster tmp
useFancyQuotes workRSOCK yYF supportsMulticore availableCores
availableWorkers cpus DJOB getOption LSB LSF mc NCPUS NSLOTS ntasks NUM
NUMPROC OpenLava PBSPro PPN SGE slurm Slurm config parallelization PuTTY's
todo detectCores env sysadm connectionId isConnectionValid GetConnection
stderr stdin stdout md UUID getOptionOrEnvVar useXDR cgroups nproc cpuLoad
freeCores availableConnections freeConnections isForkedNode
isLocalhostNode nodelist scontrol dev freePort Rprofile Renviron OMP grepl
tcltk github HenrikBengtsson isForkedChild isNodeAlive socketOptions
donttest

parallelly/NAMESPACE:

# Generated by roxygen2: do not edit by hand
S3method(as.cluster,SOCK0node)
S3method(as.cluster,SOCKnode)
S3method(as.cluster,cluster)
S3method(as.cluster,list)
S3method(c,cluster)
S3method(isForkedNode,cluster)
S3method(isForkedNode,default)
S3method(isForkedNode,forknode)
S3method(isLocalhostNode,RichSOCKnode)
S3method(isLocalhostNode,SOCKnode)
S3method(isLocalhostNode,cluster)
S3method(isLocalhostNode,default)
S3method(isLocalhostNode,forknode)
S3method(isNodeAlive,RichSOCKnode)
S3method(isNodeAlive,cluster)
S3method(isNodeAlive,default)
S3method(print,RichSOCKcluster)
S3method(stopCluster,RichMPIcluster)
S3method(summary,RichSOCKcluster)
S3method(summary,RichSOCKnode)
export(as.cluster)
export(autoStopCluster)
export(availableConnections)
export(availableCores)
export(availableWorkers)
export(connectionId)
export(cpuLoad)
export(freeConnections)
export(freeCores)
export(freePort)
export(isConnectionValid)
export(isForkedChild)
export(isForkedNode)
export(isLocalhostNode)
export(isNodeAlive)
export(makeClusterMPI)
export(makeClusterPSOCK)
export(makeNodePSOCK)
export(supportsMulticore)
importFrom(parallel,clusterCall)
importFrom(parallel,detectCores)
importFrom(parallel,makeCluster)
importFrom(parallel,stopCluster)
importFrom(tools,pskill)
importFrom(utils,capture.output)
importFrom(utils,file_test)
importFrom(utils,flush.console)
importFrom(utils,installed.packages)
importFrom(utils,packageVersion)
importFrom(utils,read.table)
importFrom(utils,str)
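[Editor's example] A handful of the functions exported in the NAMESPACE above form the package's day-to-day API.  The sketch below is an illustrative smoke test of an installation; all return values are machine- and environment-dependent, so none are shown.

```r
## Quick tour of a few exported functions from the NAMESPACE above;
## results depend on the machine, options, and environment variables.
library(parallelly)

availableCores()     ## cores, respecting R options, env vars, schedulers
availableWorkers()   ## hostnames of workers implied by the environment
supportsMulticore()  ## e.g. FALSE on MS Windows, where forking is absent
freePort()           ## a TCP port that is currently free
cpuLoad()            ## recent CPU load averages, where the OS provides them
```

Unlike parallel::detectCores(), availableCores() honors the settings documented throughout this changelog (e.g. _R_CHECK_LIMIT_CORES_, NSLOTS, NCPUS, and the Slurm variables), which is why it is the recommended entry point.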