What are streams in Node.js?
Streams are Node.js objects that read or write data one chunk at a time, instead of loading everything into memory before doing anything with it.
Theory
TL;DR
- Think of a stream like a conveyor belt at a factory: items move through one at a time, never piling up in one spot
fs.readFileSync('4gb.mp4')loads all 4GB into RAM;fs.createReadStream('4gb.mp4')reads 64KB, processes it, discards it, then reads the next 64KB- Four types: Readable (source), Writable (destination), Duplex (both directions), Transform (modifies data as it passes through)
- Use
pipe()to connect streams; usepipeline()in production because it handles errors and cleanup automatically - Decision rule: file over 50MB or any real-time data source gets streams. Under 1MB with no concurrency, sync methods are fine.
Quick example
// Without streams - entire file in memory
const data = fs.readFileSync('4gb-video.mp4'); // Crashes if file > RAM
res.end(data);
// With streams - 64KB at a time, memory stays ~1-2MB
fs.createReadStream('4gb-video.mp4').pipe(res);
// Reads 64KB, sends it, reads next 64KBThe second version reads one chunk, sends it, then reads the next. The file can be 50GB and memory use stays flat.
How backpressure works
When a readable stream produces data faster than a writable stream can consume it, Node.js pauses the source automatically. That mechanism is called backpressure. Without it, data piles up in memory until the process crashes. pipe() handles all of this internally. If you wire streams manually with .on('data'), you have to implement backpressure yourself, and most code that does this gets it wrong.
I debugged a production crash once that came from exactly this: a readable stream writing to a slow TCP socket, no backpressure handling, buffer growing until OOM. After that I stopped writing manual .on('data') handlers and switched to pipe() or pipeline() everywhere.
Four stream types
| Type | Direction | Common examples |
|---|---|---|
| Readable | Data comes out | fs.createReadStream(), http.IncomingMessage |
| Writable | Data goes in | fs.createWriteStream(), http.ServerResponse |
| Duplex | Both directions | net.Socket |
| Transform | Reads, modifies, outputs | zlib.createGzip(), crypto.createCipheriv() |
Transform streams are the most useful day-to-day. They sit in the middle of a pipeline and change each chunk before passing it along.
When to use streams
- Large files (50MB+): prevents out-of-memory crashes on the server
- HTTP request and response bodies:
reqin Express is already a Readable stream - Real-time data: WebSockets, database cursors, child process stdout
- On-the-fly transformation: compress, encrypt, or parse while reading
- Piping between sources: file to file, network to file, database to HTTP response
For config files and small JSON under 1MB, readFileSync is simpler and fast enough. No need for stream setup there.
How it works internally
Node.js uses libuv for I/O operations. When you create a readable stream, libuv reads data from disk in chunks. The default chunk size is 64KB, controlled by the highWaterMark option. Each chunk triggers a 'data' event on the stream. If the consumer is too slow, the internal buffer fills up and the stream calls pause() on itself automatically. When the consumer catches up and drains the buffer, the stream calls resume(). That is the full backpressure cycle.
Common mistakes
Mistake 1: Assuming pipe() is synchronous
// Wrong - "Done!" logs before data transfer finishes
fs.createReadStream('file.txt')
.pipe(fs.createWriteStream('output.txt'));
console.log('Done!'); // Runs immediately
// Correct - wait for the finish event
fs.createReadStream('file.txt')
.pipe(fs.createWriteStream('output.txt'))
.on('finish', () => console.log('Done!'));pipe() returns immediately. The actual data transfer happens asynchronously.
Mistake 2: No error handling on streams
// Wrong - errors disappear without notice
fs.createReadStream('file.txt')
.pipe(fs.createWriteStream('output.txt'));
// Correct - pipeline destroys all streams on any error
const { pipeline } = require('stream');
pipeline(
fs.createReadStream('file.txt'),
fs.createWriteStream('output.txt'),
(err) => {
if (err) console.error('Pipeline failed:', err);
else console.log('Done');
}
);If the source file disappears mid-transfer or the disk fills up, an error event fires. Without a listener, you get an unhandled error. pipeline() also destroys every stream in the chain automatically.
Mistake 3: Setting highWaterMark too high
// Wrong - buffers 10MB before backpressure kicks in
const readable = fs.createReadStream('file.txt', {
highWaterMark: 10 * 1024 * 1024
});
// Correct - 64KB default works for most cases
const readable = fs.createReadStream('file.txt');
// Exception - increase slightly for slow network streams
const readable = fs.createReadStream('file.txt', {
highWaterMark: 256 * 1024 // 256KB
});highWaterMark is the buffer threshold before backpressure triggers. A 10MB value means Node.js accumulates 10MB before pausing. That removes most of the benefit of streaming.
Mistake 4: Piping the same readable stream twice
// Wrong - second pipe gets nothing
const readable = fs.createReadStream('file.txt');
readable.pipe(fs.createWriteStream('copy1.txt'));
readable.pipe(fs.createWriteStream('copy2.txt')); // Empty
// Correct - create two separate read streams
fs.createReadStream('file.txt').pipe(fs.createWriteStream('copy1.txt'));
fs.createReadStream('file.txt').pipe(fs.createWriteStream('copy2.txt'));A readable stream is consumed once. After the first pipe(), the data is gone.
Mistake 5: Passing objects without objectMode
// Wrong - objects become "[object Object]"
const transform = new Transform({
transform(chunk, encoding, callback) {
callback(null, { processed: true });
}
});
// Correct
const transform = new Transform({
objectMode: true,
transform(chunk, encoding, callback) {
callback(null, { processed: true }); // Passes through as-is
}
});By default, streams work with Buffers and strings. To pass JavaScript objects through, set objectMode: true.
Real-world usage
- Express:
resis a Writable stream; pipe files directly withfs.createReadStream().pipe(res) zlib:createGzip()andcreateGunzip()compress and decompress on the fly- Database drivers: Mongoose
.cursor(), MongoDB.find().stream()for large result sets csv-parser: reads a CSV file line by line, emitting one parsed object per rowchild_process:child.stdoutis a Readable stream;child.stdinis Writable- HTTP/2 and WebSocket connections use streams internally
Follow-up questions
Q: What is the difference between pipe() and pipeline()?
A: pipe() connects streams but does not clean up if one of them fails. Other streams in the chain keep running and leak memory. pipeline() (Node.js 10+) destroys all streams in the chain automatically when any one fails. In production, always prefer pipeline().
Q: How do you know if backpressure is actually occurring?
A: writable.write(chunk) returns false when the internal buffer is full. When it returns true, the destination is ready for more. That return value is exactly what pipe() checks on every single write internally.
Q: Why does fs.readFileSync() still exist if streams handle large files better?
A: For files under 1MB where you need the content immediately and are not handling concurrent requests, sync is simpler. No event listeners, no callbacks, just a value. Streams have setup overhead that is not worth it at small scales.
Q: (Senior) How do you implement backpressure in a custom Transform stream that calls an async API for each chunk?
A: Call callback() only after the async call resolves. The stream's internal queue fills naturally if the API is slow, triggering backpressure upstream automatically. The pattern is async transform(chunk, encoding, callback) { const result = await apiCall(chunk); callback(null, result); }. Never call callback before the async work finishes, or you will overwhelm the API and skip backpressure entirely.
Examples
Streaming a large file as an HTTP response
const http = require('http');
const fs = require('fs');
const server = http.createServer((req, res) => {
// 2GB file, memory stays ~1-2MB throughout
fs.createReadStream('./large-video.mp4')
.on('error', (err) => {
res.writeHead(500);
res.end('File not found');
})
.pipe(res);
});
server.listen(3000);Without the stream, fs.readFile() would load the entire file into memory before sending a single byte. With the stream, the first chunk reaches the client within milliseconds.
Processing a large CSV with a Transform stream
const fs = require('fs');
const { Transform } = require('stream');
const csv = require('csv-parser');
// Process 500MB CSV without loading it into memory
fs.createReadStream('users-500mb.csv')
.pipe(csv())
.pipe(new Transform({
objectMode: true,
transform(row, encoding, callback) {
// Modify each row as it passes through
row.email = row.email.toUpperCase();
callback(null, JSON.stringify(row) + '\n');
}
}))
.pipe(fs.createWriteStream('users-processed.txt'))
.on('finish', () => console.log('Done'));
// Memory stays ~5-10MB throughout, regardless of file sizeThe Transform stream receives one parsed row at a time, modifies it, and passes it to the next stage. The file is never fully in memory at any point.
Using pipeline with error handling
const { pipeline } = require('stream');
const fs = require('fs');
const zlib = require('zlib');
// Compress a log file - all errors handled, all streams cleaned up
pipeline(
fs.createReadStream('access.log'),
zlib.createGzip(),
fs.createWriteStream('access.log.gz'),
(err) => {
if (err) {
console.error('Compression failed:', err);
} else {
console.log('Compressed successfully');
}
}
);If any stream in the chain fails (disk full, file deleted, network drop), pipeline() destroys all remaining streams and calls the callback with the error. With pipe() alone, you would need to attach a separate error handler to each stream manually.
Short Answer
Interview readyA concise answer to help you respond confidently on this topic during an interview.