# What is PM2 and how to manage Node.js processes in production?

**PM2** is a production process manager for Node.js that automatically restarts crashed apps, distributes load across CPU cores via clustering, and persists logs to files.

## Theory

### TL;DR

- PM2 is like a restaurant manager who replaces any waiter who quits mid-shift, opens more stations during rush hour, and logs every order without closing the dining room.
- Main difference: `node server.js` dies on crash and uses one CPU core. PM2 restarts automatically, clusters across all cores, and monitors metrics in real time.
- Use PM2 when deploying to a VPS or bare-metal server. For local dev, use nodemon. For serverless (Lambda, Vercel), the platform manages processes itself.
- `pm2 reload` and `pm2 restart` are not the same thing. `reload` swaps cluster workers gracefully; `restart` kills and relaunches them, which causes a short outage.

### Quick example

```bash
# Without PM2 - one crash kills everything
node server.js  # Unhandled error → process dies → manual restart required

# With PM2 - automatic recovery
npm install -g pm2
pm2 start server.js --name api -i max  # cluster mode across all CPU cores
pm2 list                               # online | uptime | restarts: 0
# Simulate a crash: kill the worker process
# PM2 detects the exit signal, restarts within 1 second
# pm2 list now shows: restarts: 1
pm2 stop api
pm2 delete api
```

One command replaces an entire startup script plus manual monitoring.

### Key difference

Running `node server.js` ties your app to a single OS process. Any unhandled exception kills it until someone starts it again, all traffic goes through one CPU core, and stdout output is lost unless you redirect it somewhere. PM2 wraps that process in a supervisor: it notices when the process exits, spawns a replacement almost immediately, and in cluster mode spreads traffic across multiple worker instances using Node's built-in `cluster` module. The app becomes a service, not a script.

### When to use

- Single server, Express or Fastify API: `pm2 start server.js -i max` adds clustering immediately.
- Self-hosted Next.js: `pm2 start npm --name "next" -- start`, which runs the project's `npm start` script under PM2.
- NestJS or compiled TypeScript backends: ecosystem file pointing to `dist/server.js`.
- High-traffic app behind Nginx: PM2 handles process supervision, Nginx handles routing.
- Local dev: use nodemon instead - it handles hot reload better for development.
- Serverless and managed platforms (Lambda, Vercel, Fly.io): the platform manages processes, PM2 adds nothing useful.

### Comparison table

| Feature | `node app.js` | PM2 | nodemon | forever |
|---------|---------------|-----|---------|--------|
| Auto-restart on crash | No | Yes | On file change only (dev) | Yes |
| CPU clustering | Manual `cluster` module | Built-in (`-i max`) | No | No |
| Log persistence | stdout, lost unless redirected | Files in `~/.pm2/logs/` (rotation via `pm2-logrotate`) | Console | Files |
| Zero-downtime reload | Manual | `pm2 reload` | No | No |
| Monitoring | None | `pm2 monit` + cloud dashboard | None | Basic |
| When to use | Scripts, local dev | Production Node servers | Dev hot-reload | Simple restarts (legacy) |

### How PM2 works internally

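PM2 runs as a long-lived Node.js daemon that launches your app as child processes: through Node's `child_process` module in fork mode, and through Node's built-in `cluster` module in cluster mode. The daemon listens for each child's exit event. When a process exits without PM2 having been asked to stop it (uncaught exception, `process.exit()`, or an external kill signal), PM2 starts a replacement almost immediately. With the default `autorestart` setting this happens regardless of exit code, unless the code is listed in `stop_exit_codes`. With `instances: 'max'`, the worker count is the number of CPU cores reported by `os.cpus().length`.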
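The supervision loop described above can be approximated with Node's `cluster` module directly. This is a minimal sketch, not PM2's actual implementation: `resolveInstanceCount` and `startCluster` are hypothetical names, and the mapping of `'max'`, `0`, and negative values to a worker count is an assumption based on how PM2 documents its `instances` option.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical helper: map a PM2-style `instances` value to a worker count.
// 'max' or 0 -> one worker per core; negative n -> cores minus |n| (at least 1).
function resolveInstanceCount(instances, cpuCount = os.cpus().length) {
  if (instances === 'max' || instances === 0) return cpuCount;
  const n = Number(instances);
  if (!Number.isInteger(n)) throw new TypeError(`invalid instances: ${instances}`);
  return n < 0 ? Math.max(1, cpuCount + n) : n;
}

// Hypothetical entry point: the primary forks workers and replaces any that
// exit; each worker runs an HTTP server on a port shared via the cluster module.
function startCluster(instances = 'max', port = 3000) {
  if (cluster.isPrimary) {
    const count = resolveInstanceCount(instances);
    for (let i = 0; i < count; i++) cluster.fork();
    cluster.on('exit', (worker, code, signal) => {
      console.log(`worker ${worker.process.pid} exited (${signal || code}), forking a replacement`);
      cluster.fork();
    });
  } else {
    http.createServer((req, res) => res.end(`handled by ${process.pid}\n`)).listen(port);
  }
}

// startCluster('max'); // uncomment to run the sketch
```

Unlike PM2, this sketch has no restart limit, no log files, and no way to stop a worker on purpose without it being replaced. Those gaps are what a process manager fills.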

The `pm2 reload` command works by spawning new workers first, waiting for each to emit the "listening" event (meaning the HTTP server is ready), then sending the old workers a stop signal (SIGINT by default) and giving them `kill_timeout` milliseconds to exit before SIGKILL. Zero downtime depends on that ordering and on the app reacting to the signal by closing its server and letting open connections finish.
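Reload is only graceful if the app cooperates when it is asked to stop. The sketch below shows one way to do that, assuming PM2's default stop signal (SIGINT) and, optionally, `wait_ready: true` in the ecosystem file so that PM2 waits for a `ready` message. `installGracefulShutdown` and `main` are hypothetical names, and the 4-second timeout is chosen to sit below the `kill_timeout: 5000` used elsewhere in this article.

```javascript
const http = require('http');

// Hypothetical helper: on SIGINT, stop accepting connections, wait for open
// ones to finish, then exit. If draining takes longer than timeoutMs, exit
// with a failure code instead of waiting for the supervisor's SIGKILL.
function installGracefulShutdown(server, { timeoutMs = 4000, exit = process.exit, proc = process } = {}) {
  let shuttingDown = false;
  const shutdown = () => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    const timer = setTimeout(() => exit(1), timeoutMs);
    timer.unref();
    // close() stops new connections and calls back once existing ones end.
    server.close(() => {
      clearTimeout(timer);
      exit(0);
    });
  };
  proc.on('SIGINT', shutdown);
  return shutdown;
}

// Usage sketch (not invoked here).
function main() {
  const server = http.createServer((req, res) => res.end('ok\n'));
  installGracefulShutdown(server);
  server.listen(3000, () => {
    // process.send only exists when the process was started with an IPC channel.
    if (process.send) process.send('ready');
  });
}
```

The `exit` and `proc` parameters exist only so the handler can be exercised without killing the real process; in an application you would call `installGracefulShutdown(server)` with the defaults.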

### Ecosystem file

For anything beyond a quick start, use an ecosystem file:

```js
// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'api',
    script: 'dist/server.js',   // compiled TypeScript output
    instances: 'max',           // one instance per CPU core
    exec_mode: 'cluster',       // set explicitly - fork-mode instances cannot share the port
    max_memory_restart: '1G',   // restart worker if it exceeds 1GB RAM
    max_restarts: 5,            // stop retrying after 5 consecutive unstable restarts
    kill_timeout: 5000,         // give workers 5s to drain before SIGKILL
    env_production: {
      NODE_ENV: 'production',
      PORT: 3000
    }
  }]
};
```

```bash
pm2 start ecosystem.config.js --env production
pm2 reload api  # new workers start, old ones finish their requests, then exit
pm2 save        # persist process list across server reboots
pm2 startup     # generate systemd unit file for auto-start on boot
```

### Common mistakes

**Starting without a name**
`pm2 start app.js` without `--name` creates an entry named after the file, such as "app" or "server". With multiple services, `pm2 list` fills up with near-identical names, and the only reliable way to target one for stop or reload is its numeric id. Always add `--name myapp`.

**Forgetting `exec_mode: 'cluster'` in the ecosystem file**
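Setting `instances: 'max'` without `exec_mode: 'cluster'` can leave the app in fork mode, where processes do not share a listening port through the `cluster` module. Depending on the PM2 version, you get either a single process or several processes competing for the same port. Either way, an 8-core server ends up serving traffic from one Node.js process. Set `exec_mode: 'cluster'` explicitly. This is a common source of PM2 performance complaints.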

**Using `pm2 restart` in production deploys**
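`pm2 restart` kills and relaunches the processes without waiting for replacements to be ready. In-flight requests are dropped, and clients see connection errors or 502s from the reverse proxy. `pm2 reload` replaces workers one by one, waiting for each to drain. Note that `reload` is only zero-downtime in cluster mode; in fork mode it behaves like a restart. Always use `pm2 reload` in CI/CD pipelines.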

**Running PM2 as root**
Child processes inherit root permissions. If your app ever runs shell commands, that is a real attack surface. Use a non-root system user and let `pm2 startup` generate the correct systemd configuration for boot persistence.

**Skipping log rotation**
I have seen this take down a production server at 3am - logs grow to 100GB and fill the disk. Install `pm2-logrotate` on day one: `pm2 install pm2-logrotate`. It rotates at 10MB by default.
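Several of the mistakes above can be caught mechanically before deploying. The following is a rough sketch of such a check; `lintEcosystem` is a hypothetical helper, not part of PM2, and it only inspects the fields discussed in this section.

```javascript
// Hypothetical lint helper for the object exported by ecosystem.config.js.
// Returns a list of human-readable warnings; an empty list means no findings.
function lintEcosystem(config, { uid = typeof process.getuid === 'function' ? process.getuid() : null } = {}) {
  const warnings = [];

  // uid 0 is root on POSIX systems; process.getuid is unavailable on Windows.
  if (uid === 0) warnings.push('running as root: use a dedicated non-root user');

  (config.apps || []).forEach((app, index) => {
    const label = app.name || `apps[${index}]`;
    if (!app.name) warnings.push(`${label}: no "name" set`);

    // Treat 'max', 0, negative values, and anything above 1 as multi-instance.
    const n = Number(app.instances);
    const multi = app.instances === 'max' || app.instances === 0 || n > 1 || n < 0;
    const clustered = ['cluster', 'cluster_mode'].includes(app.exec_mode);
    if (multi && !clustered) {
      warnings.push(`${label}: "instances" is set but exec_mode is not "cluster"`);
    }
  });

  return warnings;
}

// Example: lintEcosystem(require('./ecosystem.config.js')).forEach((w) => console.warn(w));
```

Log rotation cannot be checked from the config object alone, since `pm2-logrotate` is installed as a separate PM2 module.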

### Real-world usage

- Ghost blog, Strapi CMS: `pm2 start ecosystem.config.js` to keep the server supervised. Check that the app supports multiple instances before enabling cluster mode.
- Self-hosted Next.js: `pm2 start npm --name "next" -- start`.
- NestJS backends: ecosystem file with `max_memory_restart: '1G'` and compiled dist output.
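- Feathers.js real-time apps: `-i max` for Socket.io worker scaling across cores. Multi-worker Socket.io also needs sticky sessions and a shared adapter so clients reach a consistent worker.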
- PM2 inside Docker: use `pm2-runtime` as the entrypoint to handle PID 1 correctly and avoid zombie process accumulation.

### Follow-up questions

**Q:** How does PM2 implement zero-downtime reload exactly?
**A:** It spawns new cluster workers, waits for each to emit the "listening" event (HTTP server ready to accept connections), then sends the old workers a stop signal (SIGINT by default) and gives them up to `kill_timeout` milliseconds to close open connections and exit before sending SIGKILL.

**Q:** What is the difference between `pm2 start -i max` and writing the cluster module yourself?
**A:** PM2 adds automatic per-worker restart, log persistence, and a monitoring layer on top of Node's cluster. If one worker crashes, PM2 restarts that specific worker without touching the others.

**Q:** What happens when a worker exceeds the memory limit?
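**A:** PM2 periodically samples each process's memory usage (the whole process, not just the V8 heap) and compares it against `max_memory_restart`. The check runs on an interval of roughly 30 seconds, so the restart is not instantaneous. When the limit is exceeded, it restarts that specific worker while others keep serving traffic.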

**Q:** What is the correct way to run PM2 inside Docker?
**A:** Use `pm2-runtime` instead of plain `pm2 start`. Plain `pm2 start` daemonizes and returns, so the container's main process exits and the container stops. `pm2-runtime` stays in the foreground as PID 1, forwards signals to the app, and avoids zombie process accumulation.

**Q:** Senior-level: how does PM2 distinguish a crash from a graceful stop?
**A:** It listens on `child.on('exit')` and checks whether PM2 itself requested the stop (for example via `pm2 stop`). An exit that PM2 did not request counts as a crash and triggers a restart. By default this applies to any exit code, and `stop_exit_codes` can exempt specific codes. After `max_restarts` consecutive unstable restarts (runs shorter than `min_uptime`), the app moves to "errored" state and PM2 stops retrying.
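The memory-limit answer above boils down to comparing a sampled memory figure against a parsed threshold. Here is a small sketch of that comparison. `parseMemoryLimit` and `exceedsLimit` are hypothetical helpers, and treating `K`, `M`, and `G` as powers of 1024 is an assumption; PM2's own parsing and sampling may differ in detail.

```javascript
// Assumed unit sizes: binary multiples of a byte.
const UNITS = { K: 1024, M: 1024 ** 2, G: 1024 ** 3 };

// Convert a limit such as '1G' or '300M' (or a plain byte count) into bytes.
function parseMemoryLimit(limit) {
  if (typeof limit === 'number') return limit;
  const match = /^(\d+(?:\.\d+)?)\s*([KMG])B?$/i.exec(String(limit).trim());
  if (!match) throw new TypeError(`unrecognized memory limit: ${limit}`);
  return Math.round(Number(match[1]) * UNITS[match[2].toUpperCase()]);
}

// Compare resident set size (whole-process memory, which includes the V8
// heap plus native allocations) with the limit.
function exceedsLimit(limit, rssBytes = process.memoryUsage().rss) {
  return rssBytes > parseMemoryLimit(limit);
}

const rssMb = (process.memoryUsage().rss / UNITS.M).toFixed(1);
console.log(`current RSS: ${rssMb} MB, over 1G limit: ${exceedsLimit('1G')}`);
```

Logging `process.memoryUsage().rss` from inside the app like this is a cheap way to pick a realistic `max_memory_restart` value before setting it in the ecosystem file.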

## Examples

### Basic: Express API with auto-restart

```javascript
// server.js
const express = require('express');
const app = express();

app.get('/', (req, res) => res.send('Hello from PM2'));

app.listen(3000, () => console.log('Running on port 3000'));
```

```bash
pm2 start server.js --name basic-api -i 2
pm2 list
# basic-api | cluster | 2 instances | online | restarts: 0
```

Kill one of the worker processes manually. PM2 detects the exit and spawns a replacement. The other worker continues serving requests during the recovery window.

### Intermediate: Production ecosystem file (NestJS / TypeScript)

```js
// ecosystem.config.js - used in NestJS and Strapi production deploys
module.exports = {
  apps: [{
    name: 'api',
    script: 'dist/main.js',
    instances: 'max',
    exec_mode: 'cluster',
    max_memory_restart: '1G',
    max_restarts: 5,
    kill_timeout: 5000,
    env_production: {
      NODE_ENV: 'production',
      PORT: 3000
    }
  }]
};
```

```bash
pm2 start ecosystem.config.js --env production
pm2 reload api
# During reload: no 5xx errors - new workers accept before old ones exit
pm2 logs api
pm2 save && pm2 startup
```

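The `pm2 save` and `pm2 startup` combination persists the process list across server reboots, so nothing needs to be restarted manually after a machine restart. When run as a non-root user, `pm2 startup` prints a command that you then run with `sudo` to install the boot script.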

### Advanced: Crash loop protection

Without sensible limits, a bug that crashes the app shortly after startup makes PM2 restart it over and over, burning CPU and flooding logs.

```js
// Add to the apps[] entry in ecosystem.config.js
{
  max_restarts: 5,    // give up after 5 consecutive unstable restarts
  min_uptime: '10s',  // app must stay up 10s to count as a successful start
  kill_timeout: 5000  // 5s grace period before SIGKILL
}
```

```bash
pm2 start ecosystem.config.js
# App crashes 5 times, each time under 10s uptime
pm2 list
# Status: errored - PM2 stopped retrying
pm2 logs api --lines 50  # inspect the crash reason
```

After 5 restarts, PM2 marks the app as "errored" and stops. Fix the bug, run `pm2 restart api`, and the counter resets. No more CPU spikes from infinite restart loops at 3am.
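The interaction between `max_restarts` and `min_uptime` is easier to reason about with a small model. The function below is a simplified simulation, not PM2's code: `simulateCrashLoop` is a hypothetical name, and two details are assumptions, namely that a run lasting at least `min_uptime` resets the streak and that the app is marked errored on the exit that brings the streak up to `max_restarts`.

```javascript
// Simplified model of crash-loop protection.
// `uptimesMs` lists how long the app ran before each exit, in order.
function simulateCrashLoop(uptimesMs, { maxRestarts = 5, minUptimeMs = 10_000 } = {}) {
  let unstableStreak = 0;
  let exits = 0;

  for (const uptime of uptimesMs) {
    exits += 1;
    // A run shorter than min_uptime counts as unstable; a longer one resets the streak.
    unstableStreak = uptime < minUptimeMs ? unstableStreak + 1 : 0;
    if (unstableStreak >= maxRestarts) {
      return { status: 'errored', exits };
    }
  }
  return { status: 'online', exits };
}

// Five quick crashes in a row: the model gives up.
console.log(simulateCrashLoop([200, 150, 300, 120, 180]).status);
// One healthy 20s run in the middle resets the streak, so the app stays online.
console.log(simulateCrashLoop([200, 150, 20_000, 120, 180]).status);
```

The second case is why `min_uptime` matters: an app that runs just past the threshold before crashing never accumulates a streak, so pick a value longer than your slowest realistic startup failure.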
