What are child processes in Node.js?
Child processes let Node.js spawn separate OS processes to run shell commands, execute other programs, or offload CPU-heavy work outside the single-threaded event loop.
Theory
TL;DR
- Think of child processes as separate workers: each has its own memory, CPU time, and event loop, while the main process stays free
- Four methods: exec() for shell commands with buffered output, execFile() for direct binary execution, spawn() for streaming large data, fork() for Node-to-Node IPC messaging
- The key split: exec() and execFile() collect all output before the callback fires; spawn() and fork() stream it
- Decision rule: large or streaming data goes to spawn(), simple commands go to exec(), two Node.js processes talking go to fork()
- Unlike worker threads, which live inside the parent process and can share memory through SharedArrayBuffer, child processes are fully isolated at the OS level
Quick example
const { spawn } = require('child_process');

// Streams output in real-time without loading it all into memory
const child = spawn('ls', ['-la']);

child.stdout.on('data', (data) => {
  console.log(`Output: ${data}`);
});

child.on('close', (code) => {
  console.log(`Process exited with code ${code}`);
});

// Event loop stays free while child runs
console.log('Main thread not blocked');

spawn() returns a child process object where stdout, stderr, and stdin are streams. Data arrives as it is produced, not all at once after the process finishes.
Key difference
The child_process module breaks Node.js out of its single-threaded model by creating actual OS processes. Each child process gets its own V8 instance, memory heap, and event loop. Worker threads also get their own V8 isolate and heap, but they live inside the parent process and can share memory through SharedArrayBuffer; child processes cannot. Communication happens through IPC channels (for fork()) or stdin/stdout pipes. Objects passed via send() are serialized and deserialized. They are never shared by reference.
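To make the pipe-based half of that concrete, here is a minimal sketch of a parent talking to a child over stdin and stdout. The child program and the helper name upperCaseInChild are made up for illustration; the child is started with process.execPath (the current node binary) so the sketch does not depend on any external tool.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical child program: read all of stdin, upper-case it, write it to stdout.
const childSource = `
  let input = '';
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => { input += chunk; });
  process.stdin.on('end', () => process.stdout.write(input.toUpperCase()));
`;

function upperCaseInChild(text) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childSource]);
    let output = '';

    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { output += chunk; });

    child.on('error', reject); // the process could not be started
    child.on('close', (code) => {
      if (code === 0) resolve(output);
      else reject(new Error(`child exited with code ${code}`));
    });

    // Write the payload and close stdin so the child sees 'end'.
    child.stdin.end(text);
  });
}

upperCaseInChild('hello from the parent').then((result) => console.log(result));
```

Only bytes cross the boundary here: the parent writes text into one pipe and reads text back from another, and neither side can touch the other's variables.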
When to use
- exec(): shell commands with small output (under roughly 1MB). Running git status, npm list, or any one-liner shell expression. Accepts a shell string, so pipes and redirects work.
- execFile(): executing a binary or script directly without involving a shell. Safer than exec() because it does not parse shell metacharacters. Good for compiled Go binaries, Python scripts, or anything where user input touches the arguments.
- spawn(): large output, real-time data, or long-running processes. ffmpeg video conversion, filtering log files, running build tools. Data flows as it arrives.
- fork(): running Node.js code in a separate process with bidirectional messaging. CPU-heavy calculations, worker pools, or any case where you want to keep the main process responsive during serious compute work.
Comparison table
| Method | Shell? | Buffering | Output size | IPC | Best for |
|---|---|---|---|---|---|
| exec() | Yes | Full buffer | Small (<1MB) | No | Simple shell commands |
| execFile() | No | Full buffer | Small (<1MB) | No | Direct execution, safer |
| spawn() | No | Streaming | Unlimited | No | Large or real-time data |
| fork() | No | Streaming | Unlimited | Yes | Node-to-Node communication |
How it works internally
When you call spawn(), Node.js asks the operating system for a new process through libuv: in essence a fork() followed by exec() on Unix-like systems (Linux, macOS), or CreateProcess() on Windows. By default the parent gets three pipes connected to the child: stdin, stdout, and stderr. For fork() specifically, Node.js also opens an IPC channel (a Unix domain socket on Unix-like systems, a named pipe on Windows), which is what enables child.send() and process.on('message'). Each child process boots its own V8 instance. That is why fork() carries roughly 30MB of overhead per child compared to about 2MB for a worker thread. Both figures are ballpark numbers that vary with Node.js version and workload.
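One way to see that the IPC channel is just an extra stdio entry is to request it from spawn() directly. The sketch below assumes a tiny inline child script and a helper name, echoOverIpc, both invented for this example; the 'ipc' value in the stdio array is a documented option of child_process.spawn().

```javascript
const { spawn } = require('node:child_process');

// Hypothetical child script: echo the first message back, then close the channel.
const childSource = `
  process.on('message', (msg) => {
    process.send({ echoed: msg }, () => process.disconnect());
  });
`;

function echoOverIpc(message) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childSource], {
      // Entries 0-2 are stdin, stdout, stderr; the fourth entry requests an IPC channel.
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });

    child.once('message', resolve);
    child.once('error', reject);
    child.send(message); // available only because of the 'ipc' entry above
  });
}

echoOverIpc({ hello: 'world' }).then((reply) => console.log(reply));
```

fork() sets up this kind of channel for you and always runs a Node.js module, which is why it is the usual choice for Node-to-Node messaging.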
Common mistakes
Using exec() for large output:
const { exec, spawn } = require('child_process');

// Wrong: buffers the entire output; once it exceeds maxBuffer the child is
// killed and the callback receives an ERR_CHILD_PROCESS_STDIO_MAXBUFFER error
exec('cat huge-file.txt', (error, stdout) => {
  console.log(stdout); // whole file sits in memory
});

// Right: stream it
spawn('cat', ['huge-file.txt']).stdout.pipe(process.stdout);

Default maxBuffer is 1MB (1024 * 1024 bytes). You can raise it with { maxBuffer: 10 * 1024 * 1024 }, but switching to spawn() is the cleaner fix for genuinely large data.
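The failure mode is easy to reproduce on a small scale. This sketch shrinks maxBuffer to 1024 bytes on purpose and has a child print 5000 characters; the helper name runWithTinyBuffer is hypothetical, and the node binary stands in for whatever command you would really run.

```javascript
const { execFile } = require('node:child_process');

function runWithTinyBuffer() {
  return new Promise((resolve) => {
    execFile(
      process.execPath,
      ['-e', "process.stdout.write('x'.repeat(5000))"],
      { maxBuffer: 1024 }, // deliberately tiny; the default is 1024 * 1024 bytes
      (error, stdout) => {
        // The error is delivered to the callback, and stdout holds only
        // what was captured before the limit was hit.
        resolve({ code: error ? error.code : null, captured: stdout.length });
      }
    );
  });
}

runWithTinyBuffer().then(({ code, captured }) => {
  console.log(`error code: ${code}, characters captured: ${captured}`);
});
```

Checking error.code in the callback is how you tell this case apart from an ordinary non-zero exit.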
Ignoring error and exit events:
// Wrong: child crashes without notice, parent keeps running with broken state
const child = spawn('some-command');
child.stdout.on('data', (data) => console.log(data.toString()));

// Right: handle both events
child.on('error', (err) => console.error('Failed to start:', err));
child.on('exit', (code, signal) => {
  if (signal) console.error(`Killed by signal ${signal}`); // code is null in this case
  else if (code !== 0) console.error(`Exited with code ${code}`);
});

Shell injection via exec():
// Wrong: userId = "123; rm -rf /" becomes a real shell command
const userId = req.query.id;
exec(`grep ${userId} /etc/passwd`, callback);

// Right: execFile skips the shell entirely
const { execFile } = require('child_process');
execFile('grep', [userId, '/etc/passwd'], callback);

execFile() passes arguments as an array directly to the OS, so shell metacharacters are never interpreted.
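A small sketch can confirm that behaviour without touching grep or /etc/passwd. Here a child node process simply prints back the last argument it received; the helper name echoArgument and the sample input are assumptions made for the demonstration.

```javascript
const { execFile } = require('node:child_process');

function echoArgument(untrusted) {
  return new Promise((resolve, reject) => {
    execFile(
      process.execPath,
      // '--' ends node's own option parsing, so the value is never read as a flag.
      ['-e', 'process.stdout.write(process.argv[process.argv.length - 1])', '--', untrusted],
      (error, stdout) => (error ? reject(error) : resolve(stdout))
    );
  });
}

// The semicolon and the second command arrive as plain text in a single argument.
echoArgument('123; echo injected').then((received) => console.log(received));
```

Skipping the shell does not stop the target program from treating a value that starts with a dash as one of its own options. Many command-line tools accept a '--' separator for that reason, as the sketch does for node.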
Assuming fork() shares memory with the parent:
// Wrong assumption: modifying data in the child does not update the parent
const sharedData = { count: 0 };
child.send(sharedData);
// child modifies count, parent sees nothing

// Right: return updated state through message passing
child.on('message', (updatedData) => {
  console.log('Parent received:', updatedData);
});

Objects are serialized when passed through IPC, as JSON by default (the serialization: 'advanced' option switches to a structured-clone format). The child gets a copy, not a reference.
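The copy semantics can be observed directly. In this sketch the worker source is written to a temporary file so the example is self-contained; the file name, the roundTrip helper, and the message shape are all hypothetical. It relies on the default JSON serialization, under which a Date arrives in the child as a string.

```javascript
const { fork } = require('node:child_process');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical worker: mutate the received object and report what it saw.
const workerSource = `
  process.on('message', (data) => {
    data.count += 1; // changes the child's copy only
    process.send(
      { count: data.count, typeOfWhen: typeof data.when },
      () => process.disconnect()
    );
  });
`;

function roundTrip() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ipc-copy-'));
  const workerPath = path.join(dir, 'worker.cjs');
  fs.writeFileSync(workerPath, workerSource);

  return new Promise((resolve, reject) => {
    const shared = { count: 0, when: new Date() };
    const child = fork(workerPath);

    child.once('error', reject);
    child.once('exit', () => fs.rmSync(dir, { recursive: true, force: true }));
    child.once('message', (reply) => {
      resolve({
        parentCount: shared.count,    // untouched by the child
        childCount: reply.count,      // incremented in the child's copy
        typeOfWhen: reply.typeOfWhen, // the Date did not survive as a Date
      });
    });

    child.send(shared);
  });
}

roundTrip().then((summary) => console.log(summary));
```

Anything JSON cannot represent (functions, class instances, Map, Set) is lost or flattened in the same way unless you opt into the advanced serialization mode.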
Leaving orphan processes after the parent exits:
// Wrong: child keeps running after parent crashes
const child = spawn('long-running-process');

// Right: clean up on exit
// (the 'exit' event does not fire if the parent is killed with SIGKILL)
process.on('exit', () => child.kill());

// If you want the child to outlive the parent intentionally:
// its stdio must not stay connected to the parent, hence 'ignore'
const daemon = spawn('process', [], { detached: true, stdio: 'ignore' });
daemon.unref(); // parent will not wait for it

I ran into the exec() buffering problem with a CLI tool that read binary file metadata via exec(). It worked fine in development, then crashed in production whenever files exceeded 1MB. Switching to spawn() with piped output took about ten minutes and fixed it for good.
Real-world usage
- Node.js Cluster module: uses fork() to spawn worker processes, typically one per CPU core, for HTTP load balancing
- Jest, and Mocha in parallel mode: run test files in separate worker processes so memory leaks in one suite do not affect others
- Webpack and Vite: tooling in these ecosystems spawns child processes for some compilation steps to keep the file watcher responsive
- npm and yarn: use spawn() internally when you run npm run build to execute the build script
- Worker pool libraries: workerpool can maintain a pool of reusable forked processes for CPU-intensive tasks; Piscina offers the same pattern on top of worker threads rather than fork()
Follow-up questions
Q: What is the difference between spawn() and fork()?
A: spawn() launches any OS process (shell command, binary, Python script) with streaming I/O. fork() specifically launches a Node.js file and adds an IPC channel for bidirectional messaging via send() and on('message'). You cannot use send() with a process started by spawn() unless you request an IPC channel yourself by including 'ipc' in its stdio option.
Q: Why does exec() have a maxBuffer limit and how do you work around it?
A: exec() collects all stdout and stderr in memory before calling the callback. The default cap is 1MB. Pass { maxBuffer: N } to increase it, or switch to spawn() for anything that might produce more than a few hundred kilobytes.
Q: How do you add a timeout to a child process?
A: Pass { timeout: 5000 } to spawn() or exec() to kill the process after 5 seconds. Or manually: setTimeout(() => child.kill(), 5000). Either way, listen to the exit event to confirm the process actually stopped.
Q: Can a child process outlive its parent?
A: Yes. Spawn with { detached: true, stdio: 'ignore' } and call child.unref(). On non-Windows platforms the child becomes the leader of a new process group and keeps running after the parent exits, provided its stdio is not still connected to the parent. This is how you create background daemons from a Node.js script.
Q: What is the performance difference between fork() and worker threads?
A: fork() creates a separate OS process with its own V8 instance, around 30MB overhead per process. Worker threads run inside the same process, each with its own V8 isolate and heap, at closer to 2MB per thread. Both give true parallelism across CPU cores. Only worker threads can share memory via SharedArrayBuffer.
Q (Senior): How would you implement a worker pool with fork() and what edge cases would you handle?
A: Create an array of forked processes, maintain a task queue, and assign work round-robin or by availability. The real complexity is in edge cases: a child crashing mid-task (restart it and requeue), task timeouts (kill the child and retry), memory leaks in long-running children (restart after N tasks), and IPC message ordering (add correlation IDs to requests so responses match the right caller). Libraries like workerpool (forked processes or worker threads) and Piscina (worker threads) handle most of this. Rolling your own is a good learning exercise but not something to put in production without thorough testing.
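As a rough illustration of that answer, here is a minimal pool sketch. Everything in it is an assumption made for the example: the class name ForkPool, the options taskTimeoutMs and maxAttempts, and a worker protocol where the parent sends { id, payload } and the worker replies with { id, result } or { id, error }. It assigns work by availability, requeues a task when its worker dies, and kills a worker whose task runs too long. It has no restart backoff and no restart-after-N-tasks logic, so treat it as a learning aid only.

```javascript
const { fork } = require('node:child_process');

class ForkPool {
  constructor(workerPath, size, { taskTimeoutMs = 5000, maxAttempts = 2 } = {}) {
    this.workerPath = workerPath;
    this.taskTimeoutMs = taskTimeoutMs;
    this.maxAttempts = maxAttempts;
    this.nextId = 0;
    this.queue = [];        // tasks waiting for a free worker
    this.slots = new Set(); // one slot per live child process
    this.closed = false;
    for (let i = 0; i < size; i++) this.addSlot();
  }

  addSlot() {
    const slot = { child: fork(this.workerPath), task: null, timer: null };
    this.slots.add(slot);

    slot.child.on('message', (msg) => {
      const task = slot.task;
      if (!task || msg.id !== task.id) return; // not a reply we are waiting for
      this.release(slot);
      if (msg.error) task.reject(new Error(msg.error));
      else task.resolve(msg.result);
      this.dispatch();
    });
    slot.child.on('error', (err) => console.error('worker error:', err.message));
    slot.child.on('exit', () => {
      const task = slot.task;
      this.release(slot);
      this.slots.delete(slot);
      if (task) this.retryOrFail(task, new Error('worker exited mid-task'));
      if (!this.closed) this.addSlot(); // replace the dead worker (no backoff here)
      this.dispatch();
    });
  }

  release(slot) {
    clearTimeout(slot.timer);
    slot.timer = null;
    slot.task = null;
  }

  retryOrFail(task, reason) {
    if (!this.closed && task.attempts < this.maxAttempts) this.queue.unshift(task);
    else task.reject(reason);
  }

  run(payload) {
    return new Promise((resolve, reject) => {
      if (this.closed) return reject(new Error('pool is closed'));
      this.queue.push({ id: ++this.nextId, payload, attempts: 0, resolve, reject });
      this.dispatch();
    });
  }

  dispatch() {
    for (const slot of this.slots) {
      if (this.queue.length === 0) return;
      if (slot.task || !slot.child.connected) continue; // busy or unusable
      const task = this.queue.shift();
      task.attempts += 1;
      slot.task = task;
      // An overrunning task gets its worker killed; the 'exit' handler
      // above then requeues or rejects the task.
      slot.timer = setTimeout(() => slot.child.kill(), this.taskTimeoutMs);
      slot.child.send({ id: task.id, payload: task.payload });
    }
  }

  close() {
    this.closed = true;
    for (const task of this.queue.splice(0)) task.reject(new Error('pool is closed'));
    for (const slot of this.slots) slot.child.kill();
  }
}

// Usage sketch (assumes ./worker.js follows the protocol described above):
// const pool = new ForkPool(require('node:path').join(__dirname, 'worker.js'), 4);
// pool.run([1, 2, 3]).then(console.log).finally(() => pool.close());
```

Each worker handles one task at a time, so a reply can only belong to the task currently held in its slot; the id check guards against stale replies. Call close() when you are done, since live workers keep the parent's event loop open.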
Examples
Basic: shell command with exec()
const { exec } = require('child_process');
const { promisify } = require('util');

const execAsync = promisify(exec);

async function getInstalledPackages() {
  try {
    const { stdout } = await execAsync('npm list --depth=0');
    return stdout;
  } catch (err) {
    console.error('npm list failed:', err.message);
    return null;
  }
}

promisify(exec) wraps the callback API into a Promise. The entire output arrives at once in stdout because exec() buffers it. For npm list that is fine since the output is small.
Intermediate: streaming a large log file with spawn()
const { spawn } = require('child_process');
const fs = require('fs');

// Filter error lines from a large log without loading the file into memory
const grep = spawn('grep', ['ERROR', '/var/log/app.log']);
const output = fs.createWriteStream('errors.txt');

grep.stdout.pipe(output);

grep.on('error', (err) => console.error('grep failed to start:', err));
grep.on('close', (code) => {
  if (code === 0) {
    console.log('Done, errors.txt written');
  } else if (code === 1) {
    console.log('No ERROR lines found'); // grep exits with 1 when nothing matches
  } else {
    console.error(`grep exited with code ${code}`);
  }
});

// Runs immediately, event loop is not blocked
console.log('Filtering started...');

pipe() connects the child's stdout stream directly to a writable file stream. Data moves through in small chunks with backpressure, so the full output is never held in memory. Gigabytes of logs can pass through while the event loop stays free.
Advanced: fork() with bidirectional messaging and error handling
// worker.js - runs in its own process
process.on('message', (msg) => {
  if (msg.cmd === 'sum') {
    try {
      const result = msg.data.reduce((a, b) => a + b, 0);
      process.send({ id: msg.id, result });
    } catch (err) {
      process.send({ id: msg.id, error: err.message });
    }
  }
});

// parent.js
const { fork } = require('child_process');
const path = require('path');

const child = fork(path.join(__dirname, 'worker.js'));
let messageId = 0;
const pending = new Map(); // correlation id -> { resolve, reject }

function calculate(data) {
  return new Promise((resolve, reject) => {
    const id = ++messageId;
    pending.set(id, { resolve, reject });
    child.send({ id, cmd: 'sum', data });
  });
}

child.on('message', (msg) => {
  const handler = pending.get(msg.id);
  if (!handler) return;
  pending.delete(msg.id);
  if (msg.error) {
    handler.reject(new Error(msg.error));
  } else {
    handler.resolve(msg.result);
  }
});

child.on('error', (err) => console.error('Worker failed to start:', err));
process.on('exit', () => child.kill());

calculate([1, 2, 3, 4, 5]).then((result) => {
  console.log('Sum:', result); // Sum: 15
  child.kill();
});

The id field on each message is the correlation key. Without it, the parent could not reliably match a response to the call that requested it once several calculate() calls are in flight. This pattern is the foundation of any production worker pool.