Skip to main content

What are streams in Node.js?

Streams are Node.js objects that read or write data one chunk at a time, instead of loading everything into memory before doing anything with it.

Theory

TL;DR

  • Think of a stream like a conveyor belt at a factory: items move through one at a time, never piling up in one spot
  • fs.readFileSync('4gb.mp4') loads all 4GB into RAM; fs.createReadStream('4gb.mp4') reads 64KB, processes it, discards it, then reads the next 64KB
  • Four types: Readable (source), Writable (destination), Duplex (both directions), Transform (modifies data as it passes through)
  • Use pipe() to connect streams; use pipeline() in production because it handles errors and cleanup automatically
  • Decision rule: file over 50MB or any real-time data source gets streams. Under 1MB with no concurrency, sync methods are fine.

Quick example

js
// Without streams - entire file in memory const data = fs.readFileSync('4gb-video.mp4'); // Crashes if file > RAM res.end(data); // With streams - 64KB at a time, memory stays ~1-2MB fs.createReadStream('4gb-video.mp4').pipe(res); // Reads 64KB, sends it, reads next 64KB

The second version reads one chunk, sends it, then reads the next. The file can be 50GB and memory use stays flat.

How backpressure works

When a readable stream produces data faster than a writable stream can consume it, Node.js pauses the source automatically. That mechanism is called backpressure. Without it, data piles up in memory until the process crashes. pipe() handles all of this internally. If you wire streams manually with .on('data'), you have to implement backpressure yourself, and most code that does this gets it wrong.

I debugged a production crash once that came from exactly this: a readable stream writing to a slow TCP socket, no backpressure handling, buffer growing until OOM. After that I stopped writing manual .on('data') handlers and switched to pipe() or pipeline() everywhere.

Four stream types

TypeDirectionCommon examples
ReadableData comes outfs.createReadStream(), http.IncomingMessage
WritableData goes infs.createWriteStream(), http.ServerResponse
DuplexBoth directionsnet.Socket
TransformReads, modifies, outputszlib.createGzip(), crypto.createCipheriv()

Transform streams are the most useful day-to-day. They sit in the middle of a pipeline and change each chunk before passing it along.

When to use streams

  • Large files (50MB+): prevents out-of-memory crashes on the server
  • HTTP request and response bodies: req in Express is already a Readable stream
  • Real-time data: WebSockets, database cursors, child process stdout
  • On-the-fly transformation: compress, encrypt, or parse while reading
  • Piping between sources: file to file, network to file, database to HTTP response

For config files and small JSON under 1MB, readFileSync is simpler and fast enough. No need for stream setup there.

How it works internally

Node.js uses libuv for I/O operations. When you create a readable stream, libuv reads data from disk in chunks. The default chunk size is 64KB, controlled by the highWaterMark option. Each chunk triggers a 'data' event on the stream. If the consumer is too slow, the internal buffer fills up and the stream calls pause() on itself automatically. When the consumer catches up and drains the buffer, the stream calls resume(). That is the full backpressure cycle.

Common mistakes

Mistake 1: Assuming pipe() is synchronous

js
// Wrong - "Done!" logs before data transfer finishes fs.createReadStream('file.txt') .pipe(fs.createWriteStream('output.txt')); console.log('Done!'); // Runs immediately // Correct - wait for the finish event fs.createReadStream('file.txt') .pipe(fs.createWriteStream('output.txt')) .on('finish', () => console.log('Done!'));

pipe() returns immediately. The actual data transfer happens asynchronously.

Mistake 2: No error handling on streams

js
// Wrong - errors disappear without notice fs.createReadStream('file.txt') .pipe(fs.createWriteStream('output.txt')); // Correct - pipeline destroys all streams on any error const { pipeline } = require('stream'); pipeline( fs.createReadStream('file.txt'), fs.createWriteStream('output.txt'), (err) => { if (err) console.error('Pipeline failed:', err); else console.log('Done'); } );

If the source file disappears mid-transfer or the disk fills up, an error event fires. Without a listener, you get an unhandled error. pipeline() also destroys every stream in the chain automatically.

Mistake 3: Setting highWaterMark too high

js
// Wrong - buffers 10MB before backpressure kicks in const readable = fs.createReadStream('file.txt', { highWaterMark: 10 * 1024 * 1024 }); // Correct - 64KB default works for most cases const readable = fs.createReadStream('file.txt'); // Exception - increase slightly for slow network streams const readable = fs.createReadStream('file.txt', { highWaterMark: 256 * 1024 // 256KB });

highWaterMark is the buffer threshold before backpressure triggers. A 10MB value means Node.js accumulates 10MB before pausing. That removes most of the benefit of streaming.

Mistake 4: Piping the same readable stream twice

js
// Wrong - second pipe gets nothing const readable = fs.createReadStream('file.txt'); readable.pipe(fs.createWriteStream('copy1.txt')); readable.pipe(fs.createWriteStream('copy2.txt')); // Empty // Correct - create two separate read streams fs.createReadStream('file.txt').pipe(fs.createWriteStream('copy1.txt')); fs.createReadStream('file.txt').pipe(fs.createWriteStream('copy2.txt'));

A readable stream is consumed once. After the first pipe(), the data is gone.

Mistake 5: Passing objects without objectMode

js
// Wrong - objects become "[object Object]" const transform = new Transform({ transform(chunk, encoding, callback) { callback(null, { processed: true }); } }); // Correct const transform = new Transform({ objectMode: true, transform(chunk, encoding, callback) { callback(null, { processed: true }); // Passes through as-is } });

By default, streams work with Buffers and strings. To pass JavaScript objects through, set objectMode: true.

Real-world usage

  • Express: res is a Writable stream; pipe files directly with fs.createReadStream().pipe(res)
  • zlib: createGzip() and createGunzip() compress and decompress on the fly
  • Database drivers: Mongoose .cursor(), MongoDB .find().stream() for large result sets
  • csv-parser: reads a CSV file line by line, emitting one parsed object per row
  • child_process: child.stdout is a Readable stream; child.stdin is Writable
  • HTTP/2 and WebSocket connections use streams internally

Follow-up questions

Q: What is the difference between pipe() and pipeline()?
A: pipe() connects streams but does not clean up if one of them fails. Other streams in the chain keep running and leak memory. pipeline() (Node.js 10+) destroys all streams in the chain automatically when any one fails. In production, always prefer pipeline().

Q: How do you know if backpressure is actually occurring?
A: writable.write(chunk) returns false when the internal buffer is full. When it returns true, the destination is ready for more. That return value is exactly what pipe() checks on every single write internally.

Q: Why does fs.readFileSync() still exist if streams handle large files better?
A: For files under 1MB where you need the content immediately and are not handling concurrent requests, sync is simpler. No event listeners, no callbacks, just a value. Streams have setup overhead that is not worth it at small scales.

Q: (Senior) How do you implement backpressure in a custom Transform stream that calls an async API for each chunk?
A: Call callback() only after the async call resolves. The stream's internal queue fills naturally if the API is slow, triggering backpressure upstream automatically. The pattern is async transform(chunk, encoding, callback) { const result = await apiCall(chunk); callback(null, result); }. Never call callback before the async work finishes, or you will overwhelm the API and skip backpressure entirely.

Examples

Streaming a large file as an HTTP response

js
const http = require('http'); const fs = require('fs'); const server = http.createServer((req, res) => { // 2GB file, memory stays ~1-2MB throughout fs.createReadStream('./large-video.mp4') .on('error', (err) => { res.writeHead(500); res.end('File not found'); }) .pipe(res); }); server.listen(3000);

Without the stream, fs.readFile() would load the entire file into memory before sending a single byte. With the stream, the first chunk reaches the client within milliseconds.

Processing a large CSV with a Transform stream

js
const fs = require('fs'); const { Transform } = require('stream'); const csv = require('csv-parser'); // Process 500MB CSV without loading it into memory fs.createReadStream('users-500mb.csv') .pipe(csv()) .pipe(new Transform({ objectMode: true, transform(row, encoding, callback) { // Modify each row as it passes through row.email = row.email.toUpperCase(); callback(null, JSON.stringify(row) + '\n'); } })) .pipe(fs.createWriteStream('users-processed.txt')) .on('finish', () => console.log('Done')); // Memory stays ~5-10MB throughout, regardless of file size

The Transform stream receives one parsed row at a time, modifies it, and passes it to the next stage. The file is never fully in memory at any point.

Using pipeline with error handling

js
const { pipeline } = require('stream'); const fs = require('fs'); const zlib = require('zlib'); // Compress a log file - all errors handled, all streams cleaned up pipeline( fs.createReadStream('access.log'), zlib.createGzip(), fs.createWriteStream('access.log.gz'), (err) => { if (err) { console.error('Compression failed:', err); } else { console.log('Compressed successfully'); } } );

If any stream in the chain fails (disk full, file deleted, network drop), pipeline() destroys all remaining streams and calls the callback with the error. With pipe() alone, you would need to attach a separate error handler to each stream manually.

Short Answer

Interview ready
Premium

A concise answer to help you respond confidently on this topic during an interview.

Finished reading?