How to detect and prevent memory leaks in Node.js?

Memory leaks in Node.js happen when objects stay referenced past their useful life, blocking V8's garbage collector from reclaiming that memory. Heap usage grows steadily. Eventually the process crashes with an out-of-memory (OOM) error.

Theory

TL;DR

V8 traces from roots (globals, stack, closures) and frees anything unreachable. A leak is a live root holding an object that should be dead.
Analogy: a restaurant kitchen where returned plates still have an "in use" tag. The dishwasher cannot clean what looks occupied.
Four common causes: unbounded caches, forgotten event listeners, closures holding large objects, timers never cleared.
Detect with process.memoryUsage() trending, --inspect + Chrome DevTools heap snapshots, or clinic.js.
Fix rule: bound everything, remove listeners on close, monitor heap in production.

Quick example

// Leak: global cache grows on every tick
let cache = {};

setInterval(() => {
  cache[Date.now()] = new Array(1_000_000).fill('leak'); // ~8MB per second
  console.log(Object.keys(cache).length); // 1, 2, 3... never shrinks
}, 1000);
// Heap hits 1GB+ in minutes, then OOM crash

// Fix: bounded LRU cache
const { LRUCache } = require('lru-cache');
const bounded = new LRUCache({ max: 100, ttl: 1000 * 60 * 5 });
bounded.set(Date.now(), 'data');
// Size stays at 100 entries max; old entries are auto-evicted

The plain object holds references indefinitely. V8 sees the global cache as a live root and never collects its entries.

How V8 GC relates to leaks

V8 uses generational garbage collection. Short-lived objects live in the young generation and get cleared by Scavenge very fast. Long-lived objects get promoted to the old generation and freed by mark-sweep-compact. A leak happens when an object gets promoted to old gen via a retained root - a global variable, an active closure, a pending timer callback - and stays there.

heapUsed in process.memoryUsage() reports the JS heap currently in use across all V8 spaces; in a long-running server most of it is old generation. Steady growth over hours with no traffic increase is your leak signal.
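
To see how usage splits between the generations, Node's built-in v8 module exposes per-space statistics. A minimal sketch; the helper name heapSpacesMB is illustrative and not part of any API:

```javascript
const v8 = require('v8');

// Hypothetical helper: used size per V8 heap space, in MB (one decimal).
function heapSpacesMB() {
  const toMB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  const spaces = {};
  for (const space of v8.getHeapSpaceStatistics()) {
    spaces[space.space_name] = toMB(space.space_used_size);
  }
  return spaces;
}

// new_space is the young generation; old_space holds promoted objects.
console.log(heapSpacesMB());
```

Logged periodically, an old_space figure that keeps rising while new_space stays flat matches the leak pattern described above.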

Four common causes

1. Unbounded global caches. A plain object or Map used as a cache with no size limit grows forever. Every unique key adds an entry. At roughly 50KB per entry, 10,000 unique user IDs put the cache at about 500MB and climbing.
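
If adding a dependency is not an option, a size cap can be sketched on top of Map, which iterates in insertion order. BoundedCache and maxEntries are illustrative names, and this is a simplified stand-in for a real LRU library rather than a replacement for one:

```javascript
// Minimal size-bounded cache built on Map insertion order.
class BoundedCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    // Re-insert to mark the key as most recently used.
    const value = this.map.get(key);
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
  }

  get size() {
    return this.map.size;
  }
}

const cache = new BoundedCache(3);
for (const id of ['a', 'b', 'c', 'd', 'e']) cache.set(id, { id });
console.log(cache.size); // 3
```

However many unique keys arrive, the cache never holds more than maxEntries values, so its memory footprint has a ceiling.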

2. Unremoved event listeners. Attaching a listener inside a request handler without removing it on close means each request adds a permanent listener. Node.js prints a MaxListenersExceededWarning once more than 10 listeners are registered for the same event on one emitter, but by then the leak is already under way.

// ❌ Listener added per request, never removed
app.get('/stream', (req, res) => {
  process.on('data', handler);
});

// ✅ Remove when the connection closes
app.get('/stream', (req, res) => {
  process.on('data', handler);
  req.on('close', () => process.removeListener('data', handler));
});
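
One way to make the cleanup hard to forget is to have registration hand back its own undo function. A small sketch; subscribe and bus are hypothetical names:

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: register a listener and return a function that removes it.
function subscribe(emitter, event, handler) {
  emitter.on(event, handler);
  return () => emitter.off(event, handler);
}

const bus = new EventEmitter();
const unsubscribe = subscribe(bus, 'update', () => {});
console.log(bus.listenerCount('update')); // 1
unsubscribe();
console.log(bus.listenerCount('update')); // 0
```

Logging listenerCount() for shared emitters is also a cheap way to confirm that the count returns to its baseline after clients disconnect.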

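3. Closures holding large objects. A closure keeps alive every outer variable it references, and in V8 closures created in the same scope share one context, so a variable captured by any of them stays alive as long as any of them does. If a long-lived callback references a 10MB array only to read .length, the full 10MB stays reachable for the lifetime of that function.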
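
A small illustration of the difference, with hypothetical function names. The second version copies out the one value it needs before the closure is created, so the array is not captured:

```javascript
// Leaky shape: the returned function references `rows`,
// so the whole array stays reachable as long as the function does.
function makeCounterLeaky(rows) {
  return () => rows.length;
}

// Leaner shape: only the number is captured, not the array.
function makeCounter(rows) {
  const count = rows.length;
  return () => count;
}

const getCount = makeCounter(new Array(1_000_000).fill(0));
console.log(getCount()); // 1000000
```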

4. Timers that never clear. setInterval holds a reference to its callback and all variables in its closure. If you never call clearInterval, the timer and everything it touches lives forever.
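
A sketch of tying a timer's lifetime to the connection that created it. startHeartbeat is a hypothetical helper, and an EventEmitter stands in for a real socket:

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: start a per-connection heartbeat and make sure
// the interval is cleared as soon as the connection closes.
function startHeartbeat(conn, intervalMs = 1000) {
  const timer = setInterval(() => conn.emit('heartbeat'), intervalMs);
  conn.once('close', () => clearInterval(timer));
  return timer;
}

// EventEmitter stands in for a socket here.
const conn = new EventEmitter();
startHeartbeat(conn, 50);
conn.emit('close'); // interval cleared; its callback and scope can now be collected
```

Registering the cleanup in the same function that creates the timer keeps the two from drifting apart as the code changes.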

When to investigate

Long-running API servers and WebSocket servers are the main targets. A CLI script that exits in 2 seconds does not matter. But an Express app handling millions of requests over days will hit OOM if any of these patterns exist.

I have seen servers start at 150MB and climb to 2GB over 72 hours with no traffic change. That is almost always a leak, not a scaling problem.

How to detect memory leaks

Step 1: Log process.memoryUsage() over time.

setInterval(() => {
  const { heapUsed, heapTotal, rss } = process.memoryUsage();
  console.log({
    heapUsed: `${Math.round(heapUsed / 1024 / 1024)}MB`,
    heapTotal: `${Math.round(heapTotal / 1024 / 1024)}MB`,
    rss: `${Math.round(rss / 1024 / 1024)}MB`,
  });
}, 10_000);

Steady growth over 30+ minutes under constant load is a strong signal.
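
Eyeballing logs works, but the trend can also be reduced to a single number. The sketch below fits a least-squares line through heapUsed samples; the function name and the sample shape ({ t, heapUsed }) are assumptions made for illustration:

```javascript
// Least-squares slope of heapUsed samples, in bytes per second.
// samples: [{ t: epochMs, heapUsed: bytes }, ...]
function heapGrowthPerSecond(samples) {
  const n = samples.length;
  if (n < 2) return 0;
  const meanT = samples.reduce((sum, p) => sum + p.t, 0) / n;
  const meanH = samples.reduce((sum, p) => sum + p.heapUsed, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    num += (p.t - meanT) * (p.heapUsed - meanH);
    den += (p.t - meanT) ** 2;
  }
  return den === 0 ? 0 : (num / den) * 1000; // per ms -> per second
}

// Synthetic data: 1 MB of growth every 10 seconds.
const samples = [0, 1, 2, 3].map((i) => ({
  t: i * 10_000,
  heapUsed: 100e6 + i * 1e6,
}));
console.log(heapGrowthPerSecond(samples)); // 100000
```

A slope that stays positive across a window much longer than the GC cycle, under constant load, is worth a heap snapshot.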

Step 2: Heap snapshots with Chrome DevTools.

node --inspect server.js

Open chrome://inspect, take a snapshot before load, run a load test, take another snapshot. Use the "Comparison" view. Constructors whose object count and size delta keep growing between snapshots are your leak candidates.

Step 3: clinic.js for automated reports.

npx clinic heapprofiler -- node server.js
npx clinic doctor -- node server.js

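clinic heapprofiler produces a flame graph of heap allocations by function, which shows where retained memory was allocated. clinic doctor charts memory, CPU, and event-loop delay over the run and suggests which area to investigate; memory that climbs and never drops points to a leak.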

Production: track heapUsed via Prometheus or Datadog. Alert at 80% of --max-old-space-size before the OOM.
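
The 80% check can also run in-process as a fallback when no external monitoring is wired up. A minimal sketch using the built-in v8 module; checkHeap and its warn parameter are hypothetical names:

```javascript
const v8 = require('v8');

// Fraction of the V8 heap limit currently in use (0..1).
// heap_size_limit is derived from --max-old-space-size when that flag is set.
function heapUsageRatio() {
  const { used_heap_size, heap_size_limit } = v8.getHeapStatistics();
  return used_heap_size / heap_size_limit;
}

// Hypothetical alert hook: swap console.warn for your metrics or paging client.
function checkHeap(threshold = 0.8, warn = console.warn) {
  const ratio = heapUsageRatio();
  if (ratio >= threshold) {
    warn(`heap at ${(ratio * 100).toFixed(1)}% of limit`);
    return true;
  }
  return false;
}

checkHeap(); // call periodically, e.g. from an existing metrics interval
```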

Common mistakes

Assuming GC will clean it up. GC cannot collect objects with live references. Under leak conditions it runs more often and reclaims less, so GC pauses consume a growing share of CPU before the process crashes.

Using --max-old-space-size as the fix.

node --max-old-space-size=4096 server.js

This only delays the OOM. Use it to buy time for a proper fix, not as the solution. Set a sensible limit (e.g., 1GB for a typical API server) and alert on 80%.

WeakMap with primitive keys.

// ❌ TypeError: Invalid value used as weak map key (strings are not allowed)
const wm = new WeakMap();
wm.set('user-123', data);

// ✅ Object key - auto-GC'd when the object is collected
wm.set(userObject, computedData);

Closures in hot loops capturing entire arrays.

// ❌ All 1M closures hold a reference to the full arr
const fns = [];
for (let i = 0; i < 1e6; i++) {
  fns.push(() => console.log(arr[i]));
}

// ✅ Each closure holds only one value
for (const item of arr) {
  fns.push(() => console.log(item));
}

Real-world usage

Express: lru-cache in route middleware with max: 500 and ttl: 300_000
Socket.io: socket.removeAllListeners('update') in the disconnect handler
Cluster workers: process.removeListener('message', handler) on worker disconnect
PM2: pass Node's --heapsnapshot-near-heap-limit flag via --node-args to capture a snapshot shortly before an OOM for post-mortem analysis
Production alerts: Prometheus nodejs_heap_size_used_bytes metric, alert threshold at 80%

Follow-up questions

Q: How do you find a leak in production without downtime?
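A: Send SIGUSR1 to the running process (kill -USR1 <pid>) to activate the inspector without a restart, or start with --inspect on a port that is not exposed publicly. Either way, connect through an SSH tunnel. Take two heap snapshots 10 minutes apart under live load. The "Comparison" view in Chrome DevTools shows net allocations. Growing constructor names with high retained size are the source. Taking a snapshot pauses the process while it is written, so do it on one instance behind the load balancer.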

Q: What is the difference between heapUsed and rss in process.memoryUsage()?
A: heapUsed is the JS heap V8 manages. rss (Resident Set Size) is total memory the OS allocated to the process, including native buffers, C++ objects, and the stack. Growing rss with stable heapUsed often points to a native module or Buffer leak outside the JS heap.
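
The split is easy to observe with a large Buffer, whose backing memory lives outside the JS heap. A rough sketch; exact numbers vary by platform and Node version, and the arrayBuffers field requires a reasonably recent Node release:

```javascript
// Allocate 50 MB outside the JS heap and compare memoryUsage() fields.
const before = process.memoryUsage();
const buf = Buffer.alloc(50 * 1024 * 1024, 1); // filled, so the pages are touched
const after = process.memoryUsage();

const mb = (n) => Math.round(n / 1024 / 1024);
console.log({
  heapUsedDelta: `${mb(after.heapUsed - before.heapUsed)}MB`,
  arrayBuffersDelta: `${mb(after.arrayBuffers - before.arrayBuffers)}MB`,
  rssDelta: `${mb(after.rss - before.rss)}MB`,
});
// Expect arrayBuffers and rss to rise by roughly 50MB while heapUsed barely moves.
console.log(buf.length); // keeps buf referenced until this point
```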

Q: When does setInterval cause a leak vs when is it fine?
A: setInterval causes a leak when it references variables from an outer closure and is never cleared. An app-global interval that runs for the lifetime of the process is fine. An interval created per-request or per-connection without cleanup leaks the callback scope and all its references.

Q: How do WeakRef and WeakMap differ for caching?
A: WeakRef (Node 14.6+) holds a single object weakly. Call .deref() and check the result for undefined before using it. WeakMap is for key-value caching keyed by objects. When the key object is collected, the entry disappears automatically.
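
A side-by-side sketch of the two, using hypothetical names (metadata, readId). When collection actually happens is up to the engine, so the code only shows the access pattern, not the timing:

```javascript
// WeakMap: attach derived data to an object without keeping the object alive.
const metadata = new WeakMap();
let user = { id: 123 };
metadata.set(user, { lastSeen: Date.now() });
console.log(metadata.has(user)); // true

// WeakRef: hold one object weakly and check before every use.
const ref = new WeakRef(user);

function readId(weakRef) {
  const target = weakRef.deref(); // undefined once the object has been collected
  return target === undefined ? null : target.id;
}

console.log(readId(ref)); // 123

// Drop the last strong reference. A later GC may reclaim the object;
// after that, deref() returns undefined and the WeakMap entry is gone too.
user = null;
```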

Q: (Senior) In a WebSocket server with 10,000 concurrent connections, how would you bound per-client state without an O(n) cleanup pass?
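A: Store per-client state in a WeakMap keyed by the socket object. Once a socket disconnects and nothing else references it, it is garbage collected and its entry goes with it, so no cleanup loop is needed. A single WeakMap is enough; sharding across several maps (e.g., by clientId % 10) adds nothing because lookups are already O(1). Add a ping timeout: if a client does not respond within 30 seconds, close the socket and the WeakMap entry disappears on its own.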
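
A sketch of WeakMap-keyed client state with a ping timeout. The socket shape here is an assumption: any emitter with 'pong' and 'close' events and a close() method. Real WebSocket libraries differ in the details, and trackClient is a hypothetical helper:

```javascript
const { EventEmitter } = require('events');

// Per-client state keyed by the socket object itself.
const clientState = new WeakMap();

function trackClient(socket, timeoutMs = 30_000) {
  const state = { messages: 0, timer: null };
  // (Re)start the countdown; close the socket if no pong arrives in time.
  const arm = () => {
    clearTimeout(state.timer);
    state.timer = setTimeout(() => socket.close(), timeoutMs);
  };
  socket.on('pong', arm);
  socket.once('close', () => clearTimeout(state.timer));
  clientState.set(socket, state);
  arm();
  return state;
}

// Stand-in socket for demonstration only.
const socket = new EventEmitter();
socket.close = () => socket.emit('close');
trackClient(socket, 50);
console.log(clientState.has(socket)); // true
socket.close(); // clears the timer; no explicit clientState.delete() is needed
```

The pending timeout holds a reference to the socket, which is why the close handler clears it. Without that, the timer would keep the socket and its state alive until it fired.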

Examples

Basic: Express route with an unbounded cache

// ❌ Global object grows per unique user ID
if (!global.userCache) global.userCache = {};

app.get('/user/:id', async (req, res) => {
  const id = req.params.id;
  // await so the cached value is the user record, not a pending Promise
  if (!userCache[id]) userCache[id] = await db.getUser(id);
  res.json(userCache[id]);
});
// After 10k unique users: heap 500MB+, OOM coming

// ✅ Bounded TTL cache
const NodeCache = require('node-cache');
const cache = new NodeCache({ stdTTL: 300, checkperiod: 60 });

app.get('/user/:id', async (req, res) => {
  const id = req.params.id;
  let user = cache.get(id);
  if (!user) {
    user = await db.getUser(id);
    cache.set(id, user);
  }
  res.json(user);
});
// Heap is bounded by the number of unique users seen within each 5-minute TTL window

Entries expire after 5 minutes. The checkperiod option runs automatic cleanup every 60 seconds. No manual eviction needed.

Intermediate: Event listener leak in an SSE endpoint

// ❌ Listener count grows with each connected client
app.get('/events', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  const send = (data) => res.write(`data: ${JSON.stringify(data)}\n\n`);
  eventBus.on('update', send);
  // No cleanup - listener stays after client disconnects
});

// ✅ Remove listener when the client disconnects
app.get('/events', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  const send = (data) => res.write(`data: ${JSON.stringify(data)}\n\n`);
  eventBus.on('update', send);
  req.on('close', () => eventBus.removeListener('update', send));
});

req.on('close') fires on both normal disconnects and dropped connections. That covers all leak paths.

Advanced: Heap snapshot workflow for production diagnosis

// Add this behind an auth check - never expose publicly
const v8 = require('v8');

app.get('/internal/heapdump', (req, res) => {
  const filename = `/tmp/heap-${Date.now()}.heapsnapshot`;
  v8.writeHeapSnapshot(filename);
  res.json({ file: filename });
});

Call this endpoint twice: once before a 10-minute load period, once after. Download both .heapsnapshot files. Open Chrome DevTools, Memory tab, load both files. Switch to "Comparison" view and sort by "Delta". Anything with a large positive delta in the constructor list is a leak candidate. The "Retainers" panel traces the reference chain back to the root keeping it alive.
