What are child processes in Node.js?

Child processes let Node.js spawn separate OS processes to run shell commands, execute other programs, or offload CPU-heavy work outside the single-threaded event loop.

Theory

TL;DR

  • Think of child processes as separate workers: each has its own memory, CPU time, and event loop, while the main process stays free
  • Four methods: exec() for shell commands with buffered output, execFile() for direct binary execution, spawn() for streaming large data, fork() for Node-to-Node IPC messaging
  • The key split: exec() and execFile() collect all output before the callback fires; spawn() and fork() stream it
  • Decision rule: large or streaming data goes to spawn(), simple commands go to exec(), two Node.js processes talking go to fork()
  • Unlike worker threads, child processes are separate OS processes with fully isolated memory; worker threads run inside the parent process and can share memory through SharedArrayBuffer

Quick example

```js
const { spawn } = require('child_process');

// Streams output in real time without loading it all into memory
const child = spawn('ls', ['-la']);

child.stdout.on('data', (data) => {
  console.log(`Output: ${data}`);
});

child.on('close', (code) => {
  console.log(`Process exited with code ${code}`);
});

// Event loop stays free while child runs
console.log('Main thread not blocked');
```

spawn() returns a child process object where stdout, stderr, and stdin are streams. Data arrives as it is produced, not all at once after the process finishes.
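Because data arrives in chunks, a single 'data' event may contain half a line or several lines. A minimal sketch of reading a child's stdout line by line with the built-in readline module follows. The helper name runLines is hypothetical, and the child here is the Node.js binary itself (process.execPath) only so the example runs on any platform.

```js
const { spawn } = require('child_process');
const { once } = require('events');
const readline = require('readline');

// Hypothetical helper: runs a command and resolves with its stdout as whole lines.
async function runLines(command, args) {
  const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'inherit'] });
  const lines = [];

  // readline reassembles arbitrary chunks into complete lines.
  const rl = readline.createInterface({ input: child.stdout, crlfDelay: Infinity });
  rl.on('line', (line) => lines.push(line));

  // once() rejects if the child emits 'error' (for example, command not found).
  const [code] = await once(child, 'close');
  if (code !== 0) throw new Error(`${command} exited with code ${code}`);
  return lines;
}

// Usage: the child prints two lines; the parent receives them as an array.
runLines(process.execPath, ['-e', 'console.log("first"); console.log("second")'])
  .then((lines) => console.log(lines));
```

Waiting for 'close' rather than 'exit' matters here: 'close' fires only after the stdio streams have ended, so no trailing output is lost.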

Key difference

The child_process module breaks Node.js out of its single-threaded model by creating actual OS processes. Each child process has its own address space and, when the child is itself Node.js, its own V8 instance, heap, and event loop. Worker threads also get their own V8 isolate and event loop, but they live inside the parent process and can share memory through SharedArrayBuffer; child processes cannot. Communication happens through IPC channels (for fork()) or stdin/stdout pipes. Objects passed via send() are serialized and deserialized. They are never shared by reference.

When to use

  • exec(): shell commands with small output (under roughly 1MB). Running git status, npm list, or any one-liner shell expression. Accepts a shell string, so pipes and redirects work.
  • execFile(): executing a binary or script directly without involving a shell. Safer than exec() because it does not parse shell metacharacters. Good for compiled Go binaries, Python scripts, or anything where user input touches the arguments.
  • spawn(): large output, real-time data, or long-running processes. ffmpeg video conversion, filtering log files, running build tools. Data flows as it arrives.
  • fork(): running Node.js code in a separate process with bidirectional messaging. CPU-heavy calculations, worker pools, or any case where you want to keep the main process responsive during serious compute work.
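The execFile() case can be sketched as follows. This is an illustration under assumptions, not a prescribed pattern: echoArg is a hypothetical helper, and the program being run is the Node.js binary itself so the example does not depend on any external tool. It shows that an argument containing shell metacharacters reaches the program as plain text.

```js
const { execFile } = require('child_process');
const { promisify } = require('util');

const execFileAsync = promisify(execFile);

// Hypothetical helper: runs a tiny inline Node.js script that prints the one
// extra argument it receives. No shell is involved at any point.
async function echoArg(value) {
  const { stdout } = await execFileAsync(process.execPath, [
    '-e',
    'console.log(process.argv[1])',
    value,
  ]);
  return stdout.trim();
}

// The semicolon is just a character in the argument, not a command separator.
echoArg('123; echo injected').then((out) => console.log(out));
```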

Comparison table

| Method | Shell? | Buffering | Output size | IPC | Best for |
| --- | --- | --- | --- | --- | --- |
| exec() | Yes | Full buffer | Small (<1MB) | No | Simple shell commands |
| execFile() | No | Full buffer | Small (<1MB) | No | Direct execution, safer |
| spawn() | No | Streaming | Unlimited | No | Large or real-time data |
| fork() | No | Streaming | Unlimited | Yes | Node-to-Node communication |

How it works internally

When you call spawn(), Node.js asks libuv to create a new process: fork() followed by exec() on Unix-like systems (including macOS), or CreateProcess() on Windows. With the default stdio setting, the parent gets three pipes connected to the child: stdin, stdout, and stderr. For fork() specifically, Node.js also opens an IPC channel using Unix domain sockets or named pipes, which is what enables child.send() and process.on('message'). Each forked child boots a full Node.js runtime with its own V8 instance. That is why fork() carries roughly 30MB of overhead per child compared to a few megabytes for a worker thread; both figures are rough and vary with Node.js version and workload.
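The IPC channel is a stdio setting rather than something unique to fork(): adding 'ipc' to the stdio array of spawn() opens the same channel. The sketch below illustrates that under a few assumptions: echoOverIpc is a hypothetical helper, and the child's source is passed inline with -e purely to keep the example in one file.

```js
const { spawn } = require('child_process');

// Child source, passed inline so the sketch needs no second file.
const childSource = `
  process.once('message', (msg) => {
    process.send({ echoed: msg, childPid: process.pid });
    process.disconnect(); // close the IPC channel so the child can exit
  });
`;

// Hypothetical helper: sends one message to a child and resolves with its reply.
function echoOverIpc(payload) {
  return new Promise((resolve, reject) => {
    // The fourth stdio entry requests an IPC channel, which fork() adds by default.
    const child = spawn(process.execPath, ['-e', childSource], {
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });
    child.once('error', reject);
    child.once('message', resolve);
    child.send(payload);
  });
}

echoOverIpc({ hello: 'world' }).then((reply) => {
  console.log('parent pid:', process.pid, 'child pid:', reply.childPid);
});
```

The two pids differ, which confirms that the reply came from a separate OS process.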

Common mistakes

Using exec() for large output:

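```js
const { exec, spawn } = require('child_process');

// Wrong: buffers the entire output. Past maxBuffer the child is killed and the
// callback receives an error with code ERR_CHILD_PROCESS_STDIO_MAXBUFFER
exec('cat huge-file.txt', (error, stdout) => {
  console.log(stdout); // whole file sits in memory
});

// Right: stream it
spawn('cat', ['huge-file.txt']).stdout.pipe(process.stdout);
```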

Default maxBuffer is 1MB. You can raise it with { maxBuffer: 10 * 1024 * 1024 }, but switching to spawn() is the cleaner fix for genuinely large data.
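To see the limit in action without a huge file, the sketch below lowers maxBuffer and checks the error code passed to the callback. The helper name runCapped and the inline Node.js command that prints 5000 characters are assumptions made for the example.

```js
const { exec } = require('child_process');

// Hypothetical helper: runs a shell command with a given output cap and reports
// whether the cap was hit.
function runCapped(command, maxBuffer) {
  return new Promise((resolve) => {
    exec(command, { maxBuffer }, (error, stdout) => {
      resolve({
        overflowed: Boolean(error) && error.code === 'ERR_CHILD_PROCESS_STDIO_MAXBUFFER',
        received: stdout.length,
      });
    });
  });
}

// A command that writes 5000 characters to stdout.
const command = `"${process.execPath}" -e "process.stdout.write('x'.repeat(5000))"`;

runCapped(command, 1024).then((r) => console.log('1KB cap:', r.overflowed));
runCapped(command, 1024 * 1024).then((r) => console.log('1MB cap:', r.overflowed));
```

Checking error.code rather than the message text keeps the handling stable across Node.js versions.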

Ignoring error and exit events:

```js
const { spawn } = require('child_process');

// Wrong: child crashes without notice, parent keeps running with broken state
const child = spawn('some-command');
child.stdout.on('data', (data) => console.log(data.toString()));

// Right: handle both events
child.on('error', (err) => console.error('Failed to start:', err));
child.on('exit', (code, signal) => {
  // code is null when the child was terminated by a signal
  if (signal) console.error(`Killed by signal ${signal}`);
  else if (code !== 0) console.error(`Exited with code ${code}`);
});
```

Shell injection via exec():

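```js
const { exec, execFile } = require('child_process');

// req and callback come from the surrounding request handler
const userId = req.query.id;

// Wrong: userId = "123; rm -rf /" becomes a real shell command
exec(`grep ${userId} /etc/passwd`, callback);

// Right: execFile skips the shell entirely. The "--" stops grep from reading
// a value that starts with "-" as an option
execFile('grep', ['--', userId, '/etc/passwd'], callback);
```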

execFile() passes arguments as an array directly to the OS, so shell metacharacters are never interpreted.

Assuming fork() shares memory with the parent:

```js
const { fork } = require('child_process');
const child = fork('./worker.js'); // any Node.js script that listens for messages

// Wrong assumption: modifying data in the child does not update the parent
const sharedData = { count: 0 };
child.send(sharedData); // child modifies count, parent sees nothing

// Right: return updated state through message passing
child.on('message', (updatedData) => {
  console.log('Parent received:', updatedData);
});
```

By default, objects passed through IPC are serialized as JSON; the serialization: 'advanced' option switches to the structured clone algorithm. Either way, the child gets a copy, not a reference.
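The copy semantics can be observed directly. In the sketch below (an illustration, with a hypothetical roundTrip helper and a worker written to a temporary file only to keep everything in one script), the child increments count on its copy and reports whether a Date survived the trip. It also tries the serialization: 'advanced' option, which is available in current Node.js releases and uses structured cloning instead of JSON.

```js
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a tiny worker to a temp directory so the sketch fits in one file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ipc-copy-'));
const workerPath = path.join(dir, 'worker.js');
fs.writeFileSync(workerPath, `
  process.once('message', (msg) => {
    msg.count += 1; // mutates the child's copy only
    process.send({ count: msg.count, whenIsDate: msg.when instanceof Date });
    process.disconnect();
  });
`);

// Hypothetical helper: sends an object to the child and returns both sides' view.
function roundTrip(serialization) {
  return new Promise((resolve, reject) => {
    const child = fork(workerPath, [], { serialization });
    const original = { count: 0, when: new Date(0) };
    child.once('error', reject);
    child.once('message', (reply) => resolve({ original, reply }));
    child.send(original);
  });
}

roundTrip('json').then(({ original, reply }) => {
  // The parent's object is untouched; the Date arrived as a string.
  console.log(original.count, reply.count, reply.whenIsDate);
});
roundTrip('advanced').then(({ reply }) => {
  // Structured clone keeps the Date a Date, but it is still a copy.
  console.log(reply.whenIsDate);
});
```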

Leaving orphan processes after the parent exits:

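```js
const { spawn } = require('child_process');

// Wrong: child keeps running after parent crashes
const child = spawn('long-running-process');

// Right: clean up on exit. 'exit' does not fire when a signal terminates the
// parent, so turn SIGINT and SIGTERM into a normal exit as well
process.on('exit', () => child.kill());
process.on('SIGINT', () => process.exit(130));
process.on('SIGTERM', () => process.exit(143));

// If you want the child to outlive the parent intentionally, detach it and
// disconnect its stdio from the parent:
const daemon = spawn('process', [], { detached: true, stdio: 'ignore' });
daemon.unref(); // parent will not wait for it
```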

I ran into this with a CLI tool that read binary file metadata via exec(). It worked fine in development, then crashed in production whenever files exceeded 1MB. Switching to spawn() with piped output took about ten minutes and fixed it for good.

Real-world usage

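  • Node.js Cluster module: uses fork() to start worker processes, typically one per CPU core, that share a server port for HTTP load balancing
  • Jest and Mocha: Jest runs test files in a pool of worker processes by default, and Mocha does the same in its parallel mode, so a crash or runaway memory use in one worker does not take down the whole run
  • Build tools: webpack's thread-loader runs expensive loaders in a pool of worker processes, and esbuild's JavaScript API (which Vite has relied on) drives a native binary that runs as a child process
  • npm and yarn: spawn a shell to execute the script when you run npm run build
  • jest-worker and similar process pools: use fork() under the hood to maintain a pool of reusable processes for CPU-intensive tasks. Piscina solves the same problem with worker threads rather than child processes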

Follow-up questions

Q: What is the difference between spawn() and fork()?
A: spawn() launches any OS process (shell command, binary, Python script) with streaming I/O. fork() specifically launches a Node.js file and adds an IPC channel for bidirectional messaging via send() and on('message'). A process started by spawn() has no IPC channel, so send() is unavailable unless you explicitly add 'ipc' to its stdio option.

Q: Why does exec() have a maxBuffer limit and how do you work around it?
A: exec() collects all stdout and stderr in memory before calling the callback. The default cap is 1MB. Pass { maxBuffer: N } to increase it, or switch to spawn() for anything that might produce more than a few hundred kilobytes.

Q: How do you add a timeout to a child process?
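A: Pass { timeout: 5000 } to exec() or execFile() to kill the process after 5 seconds; recent Node.js versions accept the same option on spawn(). Or manually: setTimeout(() => child.kill(), 5000). child.kill() sends SIGTERM by default, which a child can catch or ignore, so listen to the exit event to confirm the process actually stopped.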
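A minimal sketch of the manual approach, assuming a hypothetical runWithTimeout helper. It uses a plain timer so it does not depend on which Node.js versions support the timeout option on spawn(), and it reports whether the exit was caused by the timeout. The child in the usage example is an inline Node.js script that would otherwise idle for a minute.

```js
const { spawn } = require('child_process');

// Hypothetical helper: resolves with the exit details and a timedOut flag.
function runWithTimeout(command, args, timeoutMs) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: 'ignore' });
    let timedOut = false;

    const timer = setTimeout(() => {
      timedOut = true;
      child.kill(); // sends SIGTERM by default
    }, timeoutMs);

    child.once('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
    // 'exit' confirms the process has actually stopped.
    child.once('exit', (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, timedOut });
    });
  });
}

// Usage: the child would idle for 60 seconds, but is stopped after 200 ms.
runWithTimeout(process.execPath, ['-e', 'setTimeout(() => {}, 60000)'], 200)
  .then((result) => console.log(result));
```

A production version would likely escalate to SIGKILL if the child is still alive some time after SIGTERM; that step is left out here.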

Q: Can a child process outlive its parent?
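A: Yes. Spawn with { detached: true, stdio: 'ignore' } and call child.unref(). On non-Windows platforms the child becomes the leader of a new process group and session, and it keeps running after the parent exits as long as its stdio is not connected to the parent. This is how you create background daemons from a Node.js script.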

Q: What is the performance difference between fork() and worker threads?
A: fork() creates a separate OS process with a full Node.js runtime, around 30MB overhead per process. Worker threads run inside the same process; each still has its own V8 isolate and heap, but startup and memory costs are lower, closer to a few megabytes per thread. For true parallelism across CPU cores both work. For shared memory via SharedArrayBuffer, only worker threads apply.

Q (Senior): How would you implement a worker pool with fork() and what edge cases would you handle?
A: Create an array of forked processes, maintain a task queue, and assign work round-robin or by availability. The real complexity is in edge cases: a child crashing mid-task (restart it and requeue), task timeouts (kill the child and retry), memory leaks in long-running children (restart after N tasks), and IPC message ordering (add correlation IDs to requests so responses match the right caller). Libraries such as jest-worker (process-based) and Piscina (worker-thread-based) handle most of this. Rolling your own is a good learning exercise but not something to put in production without thorough testing.
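The outline in that answer can be sketched roughly as below. This is a learning sketch under stated assumptions, not a production pool: the ForkPool class, its method names, and the summing worker are all hypothetical, and the worker is written to a temporary file only to keep the example in one script. It covers availability-based assignment, correlation IDs, and crash recovery with a retry limit; task timeouts and worker recycling are left out.

```js
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// The worker is written to a temp file only to keep this sketch in one file.
const workerPath = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'fork-pool-')), 'worker.js');
fs.writeFileSync(workerPath, `
  process.on('message', ({ id, numbers }) => {
    process.send({ id, result: numbers.reduce((a, b) => a + b, 0) });
  });
`);

class ForkPool {
  constructor(size, maxAttempts = 2) {
    this.maxAttempts = maxAttempts;
    this.queue = [];           // tasks waiting for a free worker
    this.idle = [];            // workers with nothing in flight
    this.inFlight = new Map(); // worker -> task it is running
    this.nextId = 0;
    this.closed = false;
    for (let i = 0; i < size; i++) this.addWorker();
  }

  addWorker() {
    const worker = fork(workerPath);
    worker.on('error', (err) => console.error('worker error:', err.message));
    worker.on('message', (msg) => {
      const task = this.inFlight.get(worker);
      if (!task || task.id !== msg.id) return; // ignore replies we cannot match
      this.inFlight.delete(worker);
      this.idle.push(worker);
      task.resolve(msg.result);
      this.drain();
    });
    worker.on('exit', () => {
      this.idle = this.idle.filter((w) => w !== worker);
      const task = this.inFlight.get(worker);
      this.inFlight.delete(worker);
      if (this.closed) {
        if (task) task.reject(new Error('pool closed'));
        return;
      }
      // Crash mid-task: retry a limited number of times, then give up.
      if (task) {
        if (task.attempts < this.maxAttempts) this.queue.unshift(task);
        else task.reject(new Error('worker crashed while running task'));
      }
      this.addWorker();
    });
    this.idle.push(worker);
    this.drain();
  }

  // Hand queued tasks to idle workers until one of the two runs out.
  drain() {
    while (this.queue.length > 0 && this.idle.length > 0) {
      const worker = this.idle.pop();
      const task = this.queue.shift();
      task.attempts += 1;
      this.inFlight.set(worker, task);
      worker.send({ id: task.id, numbers: task.numbers });
    }
  }

  run(numbers) {
    return new Promise((resolve, reject) => {
      this.queue.push({ id: ++this.nextId, numbers, attempts: 0, resolve, reject });
      this.drain();
    });
  }

  close() {
    this.closed = true;
    for (const worker of [...this.idle, ...this.inFlight.keys()]) worker.kill();
  }
}

// Usage: three tasks share two workers.
const pool = new ForkPool(2);
Promise.all([pool.run([1, 2, 3]), pool.run([4, 5]), pool.run([10])])
  .then((sums) => console.log(sums)) // [ 6, 9, 10 ]
  .finally(() => pool.close());
```

Each worker handles one task at a time, so the id check mainly guards against stale replies. The retry limit exists because a task that itself crashes the worker would otherwise be requeued forever.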

Examples

Basic: shell command with exec()

```js
const { exec } = require('child_process');
const { promisify } = require('util');

const execAsync = promisify(exec);

async function getInstalledPackages() {
  try {
    const { stdout } = await execAsync('npm list --depth=0');
    return stdout;
  } catch (err) {
    console.error('npm list failed:', err.message);
    return null;
  }
}
```

promisify(exec) wraps the callback API into a Promise. The entire output arrives at once in stdout because exec() buffers it. For npm list that is fine since the output is small.

Intermediate: streaming a large log file with spawn()

```js
const { spawn } = require('child_process');
const fs = require('fs');

// Filter error lines from a large log without loading the file into memory
const grep = spawn('grep', ['ERROR', '/var/log/app.log']);
const output = fs.createWriteStream('errors.txt');

grep.stdout.pipe(output);

grep.on('error', (err) => console.error('grep failed to start:', err));

grep.on('close', (code) => {
  if (code === 0) {
    console.log('Done, errors.txt written');
  } else if (code === 1) {
    // grep exits with 1 when no line matched, which is not a failure
    console.log('No ERROR lines found');
  } else {
    console.error(`grep exited with code ${code}`);
  }
});

// Runs immediately, event loop is not blocked
console.log('Filtering started...');
```

pipe() connects the child's stdout stream directly to a writable file stream. Only a small chunk is held in memory at any moment, and backpressure pauses reading when the file cannot keep up. Gigabytes of logs can pass through while the event loop stays free.

Advanced: fork() with bidirectional messaging and error handling

```js
// worker.js - runs in its own process
process.on('message', (msg) => {
  if (msg.cmd === 'sum') {
    try {
      const result = msg.data.reduce((a, b) => a + b, 0);
      process.send({ id: msg.id, result });
    } catch (err) {
      process.send({ id: msg.id, error: err.message });
    }
  }
});
```

```js
// parent.js
const { fork } = require('child_process');
const path = require('path');

const child = fork(path.join(__dirname, 'worker.js'));

let messageId = 0;
const pending = new Map(); // id -> { resolve, reject } for calls awaiting a reply

function calculate(data) {
  return new Promise((resolve, reject) => {
    const id = ++messageId;
    pending.set(id, { resolve, reject });
    child.send({ id, cmd: 'sum', data });
  });
}

child.on('message', (msg) => {
  const handler = pending.get(msg.id);
  if (!handler) return;
  pending.delete(msg.id);
  if (msg.error) handler.reject(new Error(msg.error));
  else handler.resolve(msg.result);
});

child.on('error', (err) => console.error('Worker failed to start:', err));

// If the worker exits, settle any calls still waiting so they do not hang
child.on('exit', () => {
  for (const { reject } of pending.values()) reject(new Error('Worker exited'));
  pending.clear();
});

process.on('exit', () => child.kill());

calculate([1, 2, 3, 4, 5]).then((result) => {
  console.log('Sum:', result); // Sum: 15
  child.kill();
});
```

The id field on each message is the correlation key. Without it, the parent could not tell which response belongs to which concurrent calculate() call. This pattern is the foundation of any production worker pool.
