The crew
package has unavoidable risks, and the user is
responsible for safety, security, and computational resources. This
vignette describes known risks and safeguards, but is by no means
exhaustive. Please read the software
license.
The crew
package launches external R processes, such as the mirai
dispatcher, an R process which sends tasks to workers and retrieves
their results. If x
is a crew
controller, a
ps::ps_handle()
process handle of the dispatcher is
retained in x$client$dispatcher.
In the event of a poorly-timed crash or network error, these
processes may not terminate properly. If that happens, they will
continue to run, which may strain traditional clusters or incur heavy
expenses on the cloud. Please monitor the platforms you use and manually
terminate defunct hanging processes as needed. To list and terminate
local processes, please use crew_monitor_local()
as
explained in the introduction vignette. To manage and monitor non-local
high-performance computing workers such as those on SLURM and AWS Batch,
please familiarize yourself with the given computing platform, and
consider using the monitor objects in the relevant third-party plugin
packages such as crew.cluster
or crew.aws.batch
.
Example: https://wlandau.github.io/crew.aws.batch/index.html#job-management.
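As a sketch of the local monitoring workflow described above, assuming the crew package is installed and the monitor exposes the methods shown in the introduction vignette (the method and argument names below are assumptions, not guaranteed):

```r
# Sketch: list and terminate local crew processes.
# Assumes crew is installed; method names follow the introduction vignette.
library(crew)
monitor <- crew_monitor_local()
monitor$dispatchers() # PIDs of running mirai dispatcher processes
monitor$workers()     # PIDs of running crew worker processes
# Manually terminate a defunct process by PID
# (argument name assumed, PID hypothetical):
# monitor$terminate(12345)
```
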
The local process or mirai
dispatcher process could crash. A common cause of crashes is running out
of computer memory. The “Resources” section of the introduction
explains how to monitor memory usage. If you are running
crew
in a targets
pipeline (as explained here in the
targets
user manual), consider setting
storage = "worker"
and retrieval = "worker"
in
tar_option_set()
to minimize memory consumption of the
local processes (see also the performance
chapter).
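For example, a _targets.R configuration along these lines (a sketch, assuming the targets and crew packages are installed; the worker count is arbitrary):

```r
# _targets.R — sketch of memory-friendly settings for a crew-powered
# targets pipeline.
library(targets)
tar_option_set(
  controller = crew::crew_controller_local(workers = 2),
  storage = "worker",  # each worker saves its own output to storage
  retrieval = "worker" # each worker loads its own dependencies
)
```

With these settings, target data flows directly between workers and storage instead of routing through the local process.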
In addition, crew
worker processes may crash silently at
runtime, or they may fail to launch or connect at all. The reasons may
be platform-specific. crew.aws.batch
and crew.cluster
expose special platform-specific parameters in the controllers to help
troubleshoot these issues.
In addition, crew
occupies one TCP port per controller.
TCP ports range from 0 to 65535, and only around 16000 of these ports
are considered ephemeral or dynamic, so please be careful not to run too
many controllers simultaneously on shared machines, especially in a
controller group. The terminate()
method of the controller frees these ports for
other processes to use.
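A minimal sketch of the port lifecycle, assuming a local controller:

```r
# Sketch: each running controller occupies one TCP port.
library(crew)
controller <- crew_controller_local()
controller$start()     # reserves a TCP port for the connection to workers
# ... push and pop tasks here ...
controller$terminate() # frees the port for other processes to use
```
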
By default, crew
uses unencrypted TCP connections for
transactions among workers. In a compromised network, an attacker can
read the data in transit, and even gain direct access to the client or
host.
It is best to avoid persistent direct connections between your local
computer and the public internet. The host
argument of the
controller should not be a public IP address. Instead, please try to
operate entirely within a perimeter such as a firewall, a virtual
private network (VPN), or an Amazon Web Services (AWS) security group.
In the case of AWS, your security group can open ports to itself. That
way, the crew
workers on e.g. AWS Batch jobs can connect to
a crew
client running in the same security group on an AWS
Batch job or EC2 instance.
In the age of Zero Trust, perimeters alone are seldom sufficient. Transport layer security (TLS) encrypts data to protect it from hackers while it travels over a network. TLS is the state of the art of encryption for network communications, and it is responsible for security in popular protocols such as HTTPS and SSH. TLS is based on public key cryptography, which requires two files: a private key, which must remain secret, and a public certificate, which can be shared.
To use TLS in crew
with automatic configuration, simply
set tls = crew_tls(mode = "automatic")
in the controller,
e.g. crew_controller_local().1 mirai
generates a one-time key pair and encrypts data for the current
crew
client. The key pair expires when the client
terminates, which reduces the risk of a breach. In addition, the public
key is a self-signed certificate, which somewhat protects against
tampering on its way from the client to the server.
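For example, a local controller with automatic TLS might be configured like this (a sketch, assuming the crew package is installed):

```r
# Sketch: enable automatic TLS encryption in a crew controller.
library(crew)
controller <- crew_controller_local(
  tls = crew_tls(mode = "automatic")
)
```
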
Launcher
plugins should expose the tls
argument of
crew_client().↩︎