What is the cluster module in Node.js?


The cluster module lets a Node.js app spawn multiple worker processes that share the same TCP port, typically one per CPU core. By default, a Node.js process runs your JavaScript on a single thread, so it uses about one core regardless of how many your machine has. Cluster fixes that.

Theory

TL;DR

Analogy: one manager (primary) runs the front door, multiple cooks (workers) serve from the same address using separate stoves (CPU cores)
Default Node.js = 1 core; cluster = all cores, no external proxy needed
By default (on every platform except Windows), the primary accepts incoming connections and hands them to workers round-robin
Use it when CPU load stays above ~20% and you have at least 2 cores (a rough rule of thumb, not a hard threshold)
Skip it for I/O-heavy apps (DB queries, external APIs) - async handles those without forking

Quick example

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork(); // one worker per CPU core
  }
  cluster.on('exit', () => cluster.fork()); // restart crashed workers
} else {
  // each worker listens on the same port
  http.createServer((req, res) => {
    res.end(`Worker ${process.pid}\n`);
  }).listen(8000);
}

On a 4-core machine, four separate worker processes serve port 8000. Run curl localhost:8000 ten times and you see different PIDs. That's the cluster module's built-in round-robin scheduling in the primary distributing connections; no external load balancer is involved.

How it works internally

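cluster.fork() is built on child_process.fork(). It spawns a fresh Node.js process that runs the same script and has an IPC channel back to the primary; it does not copy the primary's memory. When a worker calls listen(), the request is forwarded to the primary, which creates the listening socket once. Under the default round-robin policy (every platform except Windows), the primary accepts each connection and passes the socket to the next worker over IPC. Under the alternative policy (cluster.SCHED_NONE), the primary sends the listening socket to the workers, they accept from it directly, and the operating system decides which worker gets each connection. Either way, no separate proxy process sits between the client and the worker.

The IPC channel between primary and workers is a Unix domain socket on POSIX systems and a named pipe on Windows. It carries custom messages sent with process.send() and worker.send(), plus the socket handles described above. Signals are delivered by the operating system, not over IPC, and the primary learns that a worker exited from the child process itself.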

When to use cluster

CPU-bound work (above ~20% load): forking across cores helps. A loop of 10^8 iterations blocks the event loop of one process; spread the requests across 4 workers and you can handle up to 4x as many in parallel.
I/O-heavy apps: skip it. Async/await plus the event loop scales DB and API calls without any forking overhead.
Fewer than 2 cores: the process startup cost cancels any gain. Containers are the case I see most often - someone runs cluster inside a Docker container with 1 CPU and wonders why nothing improved.
Docker or Kubernetes: orchestrate replicas at the container level instead. Cluster inside a single-core container adds nothing.
Zero-downtime deploys: cluster handles this well if you restart workers one at a time.

Graceful shutdown

The most common production problem is dropping in-flight requests when a worker exits. Without server.close(), connections get cut mid-response.

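if (!cluster.isPrimary) {
  // handler is your request listener, defined elsewhere
  const server = http.createServer(handler).listen(3000);

  process.on('SIGTERM', () => {
    server.close(() => process.exit(0)); // stop accepting, drain in-flight requests first
    setTimeout(() => process.exit(1), 30_000).unref(); // force-exit after 30s
  });
}

if (cluster.isPrimary) {
  process.on('SIGTERM', () => {
    // if you auto-restart workers on 'exit', skip that while shutting down
    let remaining = Object.keys(cluster.workers).length;
    if (remaining === 0) process.exit(0);
    cluster.on('exit', () => {
      if (--remaining === 0) process.exit(0); // leave as soon as every worker is gone
    });
    for (const id in cluster.workers) {
      cluster.workers[id].kill('SIGTERM');
    }
    setTimeout(() => process.exit(1), 30_000).unref(); // safety net if a worker hangs
  });
}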

Kubernetes sends SIGTERM on pod shutdown. Without this pattern, every rolling deploy drops some requests.

Common mistakes

1. Hardcoding worker count

// wrong
for (let i = 0; i < 4; i++) cluster.fork();

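On a 64-core server this under-forks; on a 2-core container it wastes memory. Derive the count from the machine with os.cpus().length, or os.availableParallelism() on Node.js 18.14+. Inside a CPU-limited container, os.cpus().length can still report the host's core count, so set the worker count explicitly there.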
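
One way to avoid a hardcoded count is a small helper that prefers an explicit override and otherwise asks the runtime. This is a minimal sketch, not a standard API: the function name workerCount and the WEB_CONCURRENCY variable are assumptions chosen for illustration (Node.js itself does not read that variable).

```javascript
const os = require('os');

// WEB_CONCURRENCY is a hypothetical override for this sketch; Node.js does not read it.
function workerCount(env = process.env) {
  const override = Number.parseInt(env.WEB_CONCURRENCY, 10);
  if (Number.isInteger(override) && override > 0) return override;

  // os.availableParallelism() exists on Node.js 18.14+ / 19.4+;
  // older versions fall back to os.cpus().length.
  const detected = typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : os.cpus().length;

  return Math.max(1, detected);
}

// In the primary (not executed here):
//   for (let i = 0; i < workerCount(); i++) cluster.fork();
```

The override is what makes this usable in containers, where the detected value may not match the CPU limit you were actually given.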

2. Sharing state via global variables

// wrong - each worker has its own copy of counter
let counter = 0;
http.createServer((req, res) => {
  counter++;
  res.end(counter.toString()); // each worker returns 1, 2, 3 independently
}).listen(3000);

Workers are separate processes with separate V8 heaps. A counter in worker 1 never reaches worker 2. Use Redis (redis.incr('counter')) or send a message to the primary for shared state.
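
The "send a message to the primary" option could look roughly like this. It is a sketch under assumptions: the message shapes ({ type: 'incr' } and { type: 'count', value }) and the name createCounterHub are made up for illustration. The primary owns the only copy of the counter and replies to whichever worker asked.

```javascript
// Primary-side counter. Workers send { type: 'incr' } and get back
// { type: 'count', value }. Both message shapes are assumptions of this sketch.
function createCounterHub() {
  let counter = 0; // lives only in the primary, so there is exactly one copy

  return function handleMessage(worker, msg) {
    if (!msg || msg.type !== 'incr') return; // ignore unrelated messages
    counter += 1;
    worker.send({ type: 'count', value: counter });
  };
}

// Wiring in the primary (not executed here):
//   const handle = createCounterHub();
//   cluster.on('message', (worker, msg) => handle(worker, msg));
//
// In a worker (not executed here):
//   process.send({ type: 'incr' });
//   process.on('message', (msg) => {
//     if (msg.type === 'count') console.log('shared count is', msg.value);
//   });
```

Every increment costs an IPC round trip, so this fits low-volume counters; for hot paths an external store such as Redis is usually the better fit.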

3. No exit handler

A worker crash on a quad-core cluster silently drops you to 75% capacity. Add cluster.on('exit', () => cluster.fork()) and the primary always keeps the right number of workers running.
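
An unconditional restart has one failure mode worth guarding against: a worker that crashes on startup gets re-forked in a tight loop. The sketch below caps restarts inside a sliding time window. The name createRestartPolicy and the default limits are assumptions for illustration; worker.exitedAfterDisconnect is the cluster property that distinguishes an intentional shutdown from a crash.

```javascript
// Allow at most `maxRestarts` restarts within any `windowMs` window.
// The defaults are arbitrary illustration values.
function createRestartPolicy({ maxRestarts = 5, windowMs = 60_000 } = {}) {
  let recent = []; // timestamps of restarts still inside the window

  return function shouldRestart(now = Date.now()) {
    recent = recent.filter((t) => now - t < windowMs);
    if (recent.length >= maxRestarts) return false; // crash loop, give up
    recent.push(now);
    return true;
  };
}

// Wiring in the primary (not executed here):
//   const shouldRestart = createRestartPolicy();
//   cluster.on('exit', (worker) => {
//     if (worker.exitedAfterDisconnect) return; // intentional shutdown, not a crash
//     if (shouldRestart()) cluster.fork();
//     else console.error('Workers are crash-looping; not restarting');
//   });
```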

4. Listening in the primary

// wrong - primary handles all traffic, workers sit idle
if (cluster.isPrimary) {
  http.createServer(handler).listen(8000);
}

Listening belongs in the worker branch. The primary manages lifecycle only.

5. Session state without Redis

req.session.user stored in memory lives inside one worker. A second request hitting a different worker finds no session. Fix: use a Redis session store or configure sticky sessions at the proxy level.

Real-world usage

PM2: wraps cluster automatically. pm2 start app.js -i max forks one worker per core. Many Node.js production setups use PM2 rather than raw cluster.
Express APIs: wrap app.listen() in the worker branch; the rest of the app setup stays the same.
Manual cluster vs PM2: use raw cluster when you want zero dependencies or need custom restart logic. Use PM2 for monitoring dashboards, log aggregation, and automatic restarts.

Follow-up questions

Q: How does load balancing work without a proxy?
A: The primary creates the listening socket. Under the default round-robin policy it accepts each new connection and passes it to the next worker over IPC. No external process is involved.

Q: What is the difference between cluster and child_process.fork()?
A: child_process.fork() creates a subprocess with IPC but no port sharing. Cluster is built on top of it and adds listening-socket sharing, so all workers can serve the same port.

Q: How do you handle WebSockets with cluster?
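A: A plain WebSocket is a single long-lived TCP connection, so it stays on whichever worker accepted it. Sticky sessions matter when one client makes several HTTP requests that must reach the same worker, for example a library that starts with HTTP long-polling before upgrading to WebSocket. In that case, use a load balancer like HAProxy that routes by source IP, or encode the worker ID in the connection URL.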
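
Routing by source IP boils down to a deterministic mapping from address to worker. The function below sketches only that mapping; pickWorkerIndex is a hypothetical name, and the actual hand-off of connections to the chosen worker (in a proxy or in the primary) is not shown.

```javascript
const crypto = require('crypto');

// Map a client address to a stable worker index in [0, workerTotal).
// The same address always lands on the same index while workerTotal is unchanged.
function pickWorkerIndex(remoteAddress, workerTotal) {
  const digest = crypto.createHash('sha256').update(String(remoteAddress)).digest();
  return digest.readUInt32BE(0) % workerTotal;
}

// Example (not executed here):
//   const index = pickWorkerIndex(socket.remoteAddress, 4);
```

The mapping changes for most clients whenever the worker count changes, and clients behind one NAT all share an index, which is why this is only a starting point.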

Q: Why can cluster hurt performance on I/O-heavy apps?
A: Fork overhead plus IPC cost adds up. A single async Node.js process handles thousands of concurrent DB queries through the event loop. Forking just adds memory and startup cost with no throughput gain.

Q: How would you implement zero-downtime deploys manually?
A: Restart workers one at a time. Send SIGTERM to one worker, wait for it to drain via server.close(), then fork a replacement. PM2's pm2 reload automates a similar sequence.
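
A manual rolling restart might be sketched as follows. Note the variation: this version starts the replacement first and retires the old worker only once the new one is listening, so capacity never drops below the original worker count. rollingRestart and its clusterApi parameter are hypothetical names; the parameter is expected to behave like the cluster module in the primary (a workers map and a fork() that returns a Worker).

```javascript
const { once } = require('events');

// Replace every current worker, one at a time.
// `clusterApi` is assumed to look like Node's `cluster` module in the primary.
async function rollingRestart(clusterApi) {
  const oldWorkers = Object.values(clusterApi.workers); // snapshot before forking

  for (const oldWorker of oldWorkers) {
    const replacement = clusterApi.fork();
    await once(replacement, 'listening'); // new worker is accepting connections

    const exited = once(oldWorker, 'exit'); // subscribe before asking it to stop
    oldWorker.disconnect(); // close its servers and let in-flight requests finish
    await exited;
  }
}

// Usage in the primary (not executed here), e.g. on a deploy signal:
//   process.on('SIGUSR2', () => rollingRestart(cluster).catch(console.error));
```

If the primary also re-forks on every 'exit' event, that handler should skip workers whose exitedAfterDisconnect is true, otherwise each retired worker gets replaced twice.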

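Q: Round-robin vs OS-level distribution?
A: Node.js uses round-robin by default on every platform except Windows. You can switch to OS-level distribution by setting the NODE_CLUSTER_SCHED_POLICY environment variable to none, or by assigning cluster.schedulingPolicy = cluster.SCHED_NONE before the first fork. Round-robin tends to spread load more evenly; when the OS decides, a few workers can end up with most of the connections. Measure with ab -n 1000 -c 10 if you want to compare both approaches on your hardware.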
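
The documented switches for the scheduling policy are the cluster.schedulingPolicy property and the NODE_CLUSTER_SCHED_POLICY environment variable (values rr or none). A minimal sketch of the in-code variant follows; the wrapper name useOsScheduling is an assumption for illustration.

```javascript
const cluster = require('cluster');

// cluster.SCHED_RR:   the primary accepts connections and distributes them round-robin
// cluster.SCHED_NONE: workers accept from a shared listening socket; the OS decides
//
// The policy must be set in the primary before the first cluster.fork() call.
function useOsScheduling() {
  cluster.schedulingPolicy = cluster.SCHED_NONE;
  return cluster.schedulingPolicy;
}

// Equivalent without code changes (not executed here):
//   NODE_CLUSTER_SCHED_POLICY=none node app.js
```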

Examples

Basic HTTP server with auto-restart

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} starting`);

  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker) => {
    console.log(`Worker ${worker.process.pid} died, restarting`);
    cluster.fork();
  });
} else {
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end(`Served by worker ${process.pid}\n`);
  }).listen(8000);

  console.log(`Worker ${process.pid} listening on 8000`);
}

This is the minimal production pattern. Primary forks and restarts dead workers, never touches HTTP. Workers handle everything else.

Express API across all CPU cores

const cluster = require('cluster');
const os = require('os');
const express = require('express');

if (cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
  cluster.on('exit', () => cluster.fork());
} else {
  const app = express();

  app.get('/compute/:n', (req, res) => {
    // CPU-bound work - this is where cluster helps
    let sum = 0;
    for (let j = 0; j < 1e8; j++) sum += j;
    res.json({ n: req.params.n, pid: process.pid, sum });
  });

  app.listen(3000, () => console.log(`Worker ${process.pid} ready`));
}

Ten concurrent requests to /compute/1 are spread across the four workers on a quad-core machine. Without cluster they queue behind each other on one core. That's the whole point.

IPC: worker-to-primary messaging

const cluster = require('cluster');
const http = require('http');

if (cluster.isPrimary) {
  const worker = cluster.fork();

  worker.on('message', (msg) => {
    if (msg.type === 'metrics') {
      console.log(`Worker ${worker.process.pid} handled ${msg.count} requests`);
    }
  });
} else {
  let count = 0;

  http.createServer((req, res) => {
    count++;
    res.end('ok');
  }).listen(3000);

  // report metrics to primary every 5 seconds
  setInterval(() => {
    process.send({ type: 'metrics', count });
  }, 5000);
}

This pattern lets the primary aggregate stats from all workers without shared memory.
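
With more than one worker, the primary needs somewhere to keep the latest report from each. A minimal sketch of such an aggregator, reusing the { type: 'metrics', count } message shape from the example above; createMetricsAggregator and its method names are hypothetical.

```javascript
// Keeps the most recent count reported by each worker and sums them on demand.
function createMetricsAggregator() {
  const latest = new Map(); // worker id -> last reported count

  return {
    record(workerId, msg) {
      if (msg && msg.type === 'metrics') latest.set(workerId, msg.count);
    },
    forget(workerId) {
      latest.delete(workerId); // drop a worker that has exited
    },
    total() {
      let sum = 0;
      for (const count of latest.values()) sum += count;
      return sum;
    },
  };
}

// Wiring in the primary (not executed here):
//   const metrics = createMetricsAggregator();
//   cluster.on('message', (worker, msg) => metrics.record(worker.id, msg));
//   cluster.on('exit', (worker) => metrics.forget(worker.id));
//   setInterval(() => console.log(`Total requests: ${metrics.total()}`), 5000);
```

Forgetting a worker on exit means its requests disappear from the total; keep a separate running sum if you need lifetime counts.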
