How to simulate a Ray cluster on a single machine

Sometimes, you need to test a multi-node Ray script in your CI without actually standing up a multi-node cluster every time you run your CI. In some cases, it's sufficient to simulate a multi-node cluster by creating multiple Ray processes on the same machine, and Ray will treat the different processes as if they were separate nodes.

Here's a script demonstrating this. We set up a "head node" at port 6379, and then connected two "worker nodes" to that head node. The trick is to use different ports for each process.

#!/usr/bin/env bash

# Enable local clusters on Windows and macOS
export RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1
# Silence a warning
export RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0

# Cleanup function to stop the cluster and exit the script
cleanup() {
  ray stop >/dev/null 2>&1 || true
  echo "Cluster stopped. Done."
}
trap cleanup EXIT

echo "Starting head node on port 6379..."
ray start --head --port=6379 --node-manager-port=63000 --object-manager-port=63001 --min-worker-port=30000 --max-worker-port=30099 --num-cpus=0 --temp-dir=/tmp/ray/head-node
echo "Head node started"

echo "Starting worker node A"
ray start --address=127.0.0.1:6379 --node-manager-port=63010 --object-manager-port=63011 --min-worker-port=30100 --max-worker-port=30199 --num-cpus=1
echo "Worker node A connected to the head node"

echo "Starting worker node B"
ray start --address=127.0.0.1:6379 --node-manager-port=63020 --object-manager-port=63021 --min-worker-port=30200 --max-worker-port=30299 --num-cpus=1
echo "Worker node B connected to the head node"

echo "Cluster started! Testing..."
python -c "import ray; ray.init(address='127.0.0.1:6379'); print(f'Number of nodes: {len(ray.nodes())}')"
# Number of nodes: 3

Copyright Ricardo Decal. ricardodecal.com