How to simulate a Ray cluster on a single machine

#ray-distributed #guides #python #CI

Sometimes you need to test a multi-node Ray script in CI without standing up a real multi-node cluster on every run. In many cases it's enough to simulate one: start several Ray node processes on the same machine, each with its own ports, and Ray registers each of them with the head as a separate node.

Here's how:

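#!/usr/bin/env bash
# Abort on the first failing command so CI reports the error
set -euo pipefail

# Ray refuses to start multi-node clusters on Windows and macOS unless this is set
export RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1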

cleanup() {
  ray stop >/dev/null 2>&1 || true
  echo "Cluster stopped. Done."
}
trap cleanup EXIT

echo "Starting head node..."
ray start --head --port=6379 --ray-client-server-port=10001 \
  --node-manager-port=63000 --object-manager-port=63001 \
  --min-worker-port=30000 --max-worker-port=30099 --num-cpus=0 \
  --temp-dir=/tmp/ray/head-node
echo "Head node started"

echo "Starting worker node A..."
ray start --address=127.0.0.1:6379 \
  --node-manager-port=63010 --object-manager-port=63011 \
  --min-worker-port=30100 --max-worker-port=30199 --num-cpus=1
echo "Worker node A started"

echo "Starting worker node B..."
ray start --address=127.0.0.1:6379 \
  --node-manager-port=63020 --object-manager-port=63021 \
  --min-worker-port=30200 --max-worker-port=30299 --num-cpus=1
echo "Worker node B started"

echo "Cluster started!!"
echo

echo "Testing the cluster..."
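python - <<'PY'
import ray

ray.init(address="127.0.0.1:6379")
print(ray.nodes())

# Fail the CI job unless the head node and both workers joined
alive = [node for node in ray.nodes() if node["Alive"]]
assert len(alive) == 3, f"expected 3 alive nodes, found {len(alive)}"
PY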

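In this script, we start a "head node" listening on port 6379 and then connect two "worker nodes" to it. The trick is to give each `ray start` process its own node-manager port, object-manager port, and worker port range, so the processes don't collide on the shared machine. The head node is started with `--num-cpus=0`, so tasks that request CPUs (the default for Ray tasks) run only on the two worker nodes.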
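If you need more than two simulated workers, the port layout can be generated instead of copied by hand. Below is a minimal Python sketch, assuming the same scheme as the script: node i gets node-manager port 63000 + 10*i, the object-manager port one above that, and a 100-port worker range starting at 30000 + 100*i. The names `NodePorts`, `plan_ports`, `check_no_collisions`, and `ray_start_command` are hypothetical helpers, not part of Ray; they only build command strings and check that no port is claimed twice. Head-only flags such as `--ray-client-server-port` and `--temp-dir` are left out and would be appended as in the script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePorts:
    """Ports reserved for one simulated node (hypothetical helper type)."""
    node_manager: int
    object_manager: int
    min_worker: int
    max_worker: int


def plan_ports(num_nodes, base_manager=63000, base_worker=30000, workers_per_node=100):
    """Assign ports to nodes 0..num_nodes-1; node 0 is the head."""
    plans = []
    for i in range(num_nodes):
        manager = base_manager + 10 * i           # 63000, 63010, 63020, ...
        low = base_worker + workers_per_node * i  # 30000, 30100, 30200, ...
        plans.append(NodePorts(manager, manager + 1, low, low + workers_per_node - 1))
    return plans


def all_ports(p):
    """Every port a node may bind: its two manager ports plus its worker range."""
    return {p.node_manager, p.object_manager, *range(p.min_worker, p.max_worker + 1)}


def check_no_collisions(plans):
    """Raise ValueError if any port is assigned to more than one node."""
    seen = set()
    for i, p in enumerate(plans):
        overlap = seen & all_ports(p)
        if overlap:
            raise ValueError(f"node {i} reuses ports {sorted(overlap)[:5]}")
        seen |= all_ports(p)


def ray_start_command(i, p, num_cpus, head_host="127.0.0.1", head_port=6379):
    """Build the `ray start` line for node i (i == 0 starts the head)."""
    target = f"--head --port={head_port}" if i == 0 else f"--address={head_host}:{head_port}"
    return (f"ray start {target} "
            f"--node-manager-port={p.node_manager} --object-manager-port={p.object_manager} "
            f"--min-worker-port={p.min_worker} --max-worker-port={p.max_worker} "
            f"--num-cpus={num_cpus}")


if __name__ == "__main__":
    plans = plan_ports(3)
    check_no_collisions(plans)
    for i, p in enumerate(plans):
        # As in the script, the head gets 0 CPUs so CPU-requiring tasks land on the workers
        print(ray_start_command(i, p, num_cpus=0 if i == 0 else 1))
```

With `plan_ports(3)` the three printed lines carry the same ports as the head and worker commands in the script. Deriving every port from a single node index keeps the ranges disjoint by construction; the collision check is a cheap guard if you later change the base ports.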