# What are streams in Node.js?

## Short answer

**Streams** in Node.js are objects that read or write data one chunk at a time, so memory use stays roughly constant regardless of file size.

```js
// 4GB file, ~1-2MB memory
fs.createReadStream('video.mp4').pipe(res);
```

**Key point:** `pipe()` connects streams and handles backpressure automatically. For production, prefer `pipeline()` (Node 10+), which also destroys all streams in the chain if any one fails.

## Answer

**Streams** are Node.js objects that read or write data one chunk at a time, instead of loading everything into memory before doing anything with it.

## Theory

### TL;DR

- Think of a stream like a conveyor belt at a factory: items move through one at a time, never piling up in one spot
- `fs.readFileSync('4gb.mp4')` tries to load all 4GB into RAM; `fs.createReadStream('4gb.mp4')` reads 64KB, processes it, discards it, then reads the next 64KB
- Four types: Readable (source), Writable (destination), Duplex (both directions), Transform (modifies data as it passes through)
- Use `pipe()` to connect streams; use `pipeline()` in production because it handles errors and cleanup automatically
- Decision rule: file over 50MB or any real-time data source gets streams. Under 1MB with no concurrency, sync methods are fine.

### Quick example

```js
// Without streams - entire file in memory
const data = fs.readFileSync('4gb-video.mp4'); // Fails if the file does not fit in memory
res.end(data);

// With streams - 64KB at a time, memory stays ~1-2MB
fs.createReadStream('4gb-video.mp4').pipe(res); // Reads 64KB, sends it, reads next 64KB
```

The second version reads one chunk, sends it, then reads the next. The file can be 50GB and memory use stays flat.

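To see the chunking for yourself, here is a minimal sketch that reads a throwaway file and reports how many chunks arrived and how large the biggest one was. The helper names `inspectChunks` and `demo`, the temp-file name, and the 1MB size are all made up for illustration. With the default `highWaterMark`, no chunk should be larger than 64KB.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: streams a file and reports how it was chunked
function inspectChunks(filePath) {
  return new Promise((resolve, reject) => {
    let chunks = 0;
    let bytes = 0;
    let largest = 0;
    fs.createReadStream(filePath)
      .on('data', (chunk) => {
        chunks += 1;
        bytes += chunk.length;
        largest = Math.max(largest, chunk.length);
      })
      .on('error', reject)
      .on('end', () => resolve({ chunks, bytes, largest }));
  });
}

// Demo: create a 1MB file in a fresh temp directory, inspect it, clean up
async function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'stream-demo-'));
  const file = path.join(dir, 'sample.bin');
  fs.writeFileSync(file, Buffer.alloc(1024 * 1024));
  try {
    return await inspectChunks(file);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

demo().then((stats) => console.log(stats));
```

Passing a different `highWaterMark` to `fs.createReadStream()` inside `inspectChunks` should change the chunk count accordingly.
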
### How backpressure works

When a readable stream produces data faster than a writable stream can consume it, Node.js pauses the source automatically. That mechanism is called backpressure. Without it, data piles up in memory until the process crashes.

`pipe()` handles all of this internally. If you wire streams manually with `.on('data')`, you have to implement backpressure yourself, and most code that does this gets it wrong.

I debugged a production crash once that came from exactly this: a readable stream writing to a slow TCP socket, no backpressure handling, buffer growing until OOM. After that I stopped writing manual `.on('data')` handlers and switched to `pipe()` or `pipeline()` everywhere.

### Four stream types

| Type | Direction | Common examples |
|------|-----------|------------------|
| Readable | Data comes out | `fs.createReadStream()`, `http.IncomingMessage` |
| Writable | Data goes in | `fs.createWriteStream()`, `http.ServerResponse` |
| Duplex | Both directions | `net.Socket` |
| Transform | Reads, modifies, outputs | `zlib.createGzip()`, `crypto.createCipheriv()` |

Transform streams are the most useful day-to-day. They sit in the middle of a pipeline and change each chunk before passing it along.

### When to use streams

- Large files (50MB+): prevents out-of-memory crashes on the server
- HTTP request and response bodies: `req` in Express is already a Readable stream
- Real-time data: WebSockets, database cursors, child process stdout
- On-the-fly transformation: compress, encrypt, or parse while reading
- Piping between sources: file to file, network to file, database to HTTP response

For config files and small JSON under 1MB, `readFileSync` is simpler and fast enough. No need for stream setup there.

### How it works internally

Node.js uses libuv for I/O operations. When you create a file read stream, libuv reads data from disk in chunks. The default chunk size for `fs.createReadStream()` is 64KB, controlled by the `highWaterMark` option.
Each chunk triggers a `'data'` event on the stream. In a `pipe()` chain, if the consumer is too slow, the destination's internal buffer fills up, `write()` returns `false`, and the source is paused with `pause()`. When the consumer catches up and drains the buffer, it emits `'drain'` and the source is resumed with `resume()`. That is the full backpressure cycle.

### Common mistakes

**Mistake 1: Assuming `pipe()` is synchronous**

```js
// Wrong - "Done!" logs before data transfer finishes
fs.createReadStream('file.txt')
  .pipe(fs.createWriteStream('output.txt'));
console.log('Done!'); // Runs immediately

// Correct - wait for the finish event
fs.createReadStream('file.txt')
  .pipe(fs.createWriteStream('output.txt'))
  .on('finish', () => console.log('Done!'));
```

`pipe()` returns immediately. The actual data transfer happens asynchronously.

**Mistake 2: No error handling on streams**

```js
// Wrong - no error listeners: a failure is thrown as an uncaught exception
// and the other stream in the chain is left open
fs.createReadStream('file.txt')
  .pipe(fs.createWriteStream('output.txt'));

// Correct - pipeline destroys all streams on any error
const { pipeline } = require('stream');

pipeline(
  fs.createReadStream('file.txt'),
  fs.createWriteStream('output.txt'),
  (err) => {
    if (err) console.error('Pipeline failed:', err);
    else console.log('Done');
  }
);
```

If the source file disappears mid-transfer or the disk fills up, an `'error'` event fires. Without a listener, it is thrown as an uncaught exception. `pipeline()` reports the error through its callback and also destroys every stream in the chain automatically.

**Mistake 3: Setting `highWaterMark` too high**

```js
// Wrong - buffers 10MB before backpressure kicks in
const oversized = fs.createReadStream('file.txt', {
  highWaterMark: 10 * 1024 * 1024
});

// Correct - 64KB default works for most cases
const readable = fs.createReadStream('file.txt');

// Exception - increase slightly for slow network streams
const tuned = fs.createReadStream('file.txt', {
  highWaterMark: 256 * 1024 // 256KB
});
```

`highWaterMark` is the buffer threshold before backpressure triggers. A 10MB value means Node.js accumulates 10MB before pausing.
That removes most of the benefit of streaming.

**Mistake 4: Reusing a readable stream after it has been consumed**

```js
// Wrong - the stream has already ended, so the second pipe gets nothing
const readable = fs.createReadStream('file.txt');
readable.pipe(fs.createWriteStream('copy1.txt'))
  .on('finish', () => {
    readable.pipe(fs.createWriteStream('copy2.txt')); // Empty
  });

// Correct - create a separate read stream for each pass
fs.createReadStream('file.txt').pipe(fs.createWriteStream('copy1.txt'));
fs.createReadStream('file.txt').pipe(fs.createWriteStream('copy2.txt'));
```

A readable stream's data is consumed once. Two `pipe()` calls attached up front both receive every chunk (at the pace of the slower destination), but once the stream has ended, the data is gone. To read it again, create a new stream.

**Mistake 5: Passing objects without `objectMode`**

```js
const { Transform } = require('stream');

// Wrong - without objectMode, pushing an object fails with ERR_INVALID_ARG_TYPE
const broken = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, { processed: true });
  }
});

// Correct
const transform = new Transform({
  objectMode: true,
  transform(chunk, encoding, callback) {
    callback(null, { processed: true }); // Passes through as-is
  }
});
```

By default, streams work with Buffers and strings. To pass JavaScript objects through, set `objectMode: true`.

### Real-world usage

- Express: `res` is a Writable stream; pipe files directly with `fs.createReadStream().pipe(res)`
- `zlib`: `createGzip()` and `createGunzip()` compress and decompress on the fly
- Database drivers: Mongoose `.cursor()`, MongoDB `.find().stream()` for large result sets
- `csv-parser`: reads a CSV file line by line, emitting one parsed object per row
- `child_process`: `child.stdout` is a Readable stream; `child.stdin` is Writable
- HTTP/2 and WebSocket connections use streams internally

### Follow-up questions

**Q:** What is the difference between `pipe()` and `pipeline()`?

**A:** `pipe()` connects streams but does not clean up if one of them fails. Other streams in the chain stay open and can leak memory and file descriptors. `pipeline()` (Node.js 10+) destroys all streams in the chain automatically when any one fails. In production, always prefer `pipeline()`.

**Q:** How do you know if backpressure is actually occurring?

**A:** `writable.write(chunk)` returns `false` when the internal buffer is full. When it returns `true`, the destination is ready for more. After a `false`, wait for the `'drain'` event before writing again. That return value is exactly what `pipe()` checks on every single write internally.

**Q:** Why does `fs.readFileSync()` still exist if streams handle large files better?

**A:** For files under 1MB where you need the content immediately and are not handling concurrent requests, sync is simpler. No event listeners, no callbacks, just a value. Streams have setup overhead that is not worth it at small scales.

**Q:** (Senior) How do you implement backpressure in a custom Transform stream that calls an async API for each chunk?

**A:** Call `callback()` only after the async call resolves. The stream's internal queue fills naturally if the API is slow, triggering backpressure upstream automatically. The pattern is `async transform(chunk, encoding, callback) { let result; try { result = await apiCall(chunk); } catch (err) { return callback(err); } callback(null, result); }`. Pass a failed call's error to `callback(err)`, otherwise the rejection goes unhandled and the stream stalls. Never call `callback` before the async work finishes, or you will overwhelm the API and skip backpressure entirely.

## Examples

### Streaming a large file as an HTTP response

```js
const http = require('http');
const fs = require('fs');

const server = http.createServer((req, res) => {
  // 2GB file, memory stays ~1-2MB throughout
  fs.createReadStream('./large-video.mp4')
    .on('error', (err) => {
      // If part of the body was already sent, all we can do is drop the connection
      if (res.headersSent) return res.destroy(err);
      const notFound = err.code === 'ENOENT';
      res.writeHead(notFound ? 404 : 500);
      res.end(notFound ? 'File not found' : 'Internal server error');
    })
    .pipe(res);
});

server.listen(3000);
```

Without the stream, `fs.readFile()` would load the entire file into memory before sending a single byte. With the stream, the first chunk reaches the client within milliseconds.

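### Observing backpressure with `write()` and `'drain'`

The follow-up answer about `write()` returning `false` can be tried out directly. This is only a sketch of what manual backpressure handling might look like. The helper names `createSlowWritable` and `writeAll`, the 4-byte `highWaterMark`, and the 5ms delay are arbitrary choices for the demo, not anything Node.js prescribes.

```javascript
const { Writable } = require('stream');
const { once } = require('events');

// A deliberately slow destination with a tiny buffer (values are arbitrary)
function createSlowWritable() {
  return new Writable({
    highWaterMark: 4, // bytes buffered before write() starts returning false
    write(chunk, encoding, callback) {
      setTimeout(callback, 5); // pretend each write takes 5ms
    }
  });
}

// Writes every chunk, pausing for 'drain' whenever write() returns false
async function writeAll(writable, chunks) {
  let waits = 0;
  for (const chunk of chunks) {
    if (!writable.write(chunk)) {
      waits += 1;
      await once(writable, 'drain'); // resume only when the buffer is empty
    }
  }
  const finished = once(writable, 'finish'); // listen before calling end()
  writable.end();
  await finished;
  return waits;
}

writeAll(createSlowWritable(), ['aaaa', 'bbbb', 'cccc'])
  .then((waits) => console.log(`waited for drain ${waits} time(s)`));
```

Because each 4-byte chunk fills the buffer, the loop should wait for `'drain'` at least once. `pipe()` and `pipeline()` do the equivalent bookkeeping for you, which is why they are the safer default.
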
### Processing a large CSV with a Transform stream

```js
const fs = require('fs');
const { Transform } = require('stream');
const csv = require('csv-parser');

// Process 500MB CSV without loading it into memory
fs.createReadStream('users-500mb.csv')
  .pipe(csv())
  .pipe(new Transform({
    objectMode: true,
    transform(row, encoding, callback) {
      // Modify each row as it passes through
      row.email = row.email.toUpperCase();
      callback(null, JSON.stringify(row) + '\n');
    }
  }))
  .pipe(fs.createWriteStream('users-processed.txt'))
  .on('finish', () => console.log('Done'));

// Memory stays ~5-10MB throughout, regardless of file size
```

The Transform stream receives one parsed row at a time, modifies it, and passes it to the next stage. The file is never fully in memory at any point.

### Using pipeline with error handling

```js
const { pipeline } = require('stream');
const fs = require('fs');
const zlib = require('zlib');

// Compress a log file - all errors handled, all streams cleaned up
pipeline(
  fs.createReadStream('access.log'),
  zlib.createGzip(),
  fs.createWriteStream('access.log.gz'),
  (err) => {
    if (err) {
      console.error('Compression failed:', err);
    } else {
      console.log('Compressed successfully');
    }
  }
);
```

If any stream in the chain fails (disk full, file deleted, network drop), `pipeline()` destroys all remaining streams and calls the callback with the error. With `pipe()` alone, you would need to attach a separate error handler to each stream manually.
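
### Async Transform that waits for each call

The senior follow-up question describes calling `callback()` only after the async work settles. A minimal sketch of that idea is below. `apiCall` is a hypothetical stand-in for a remote API, `createApiTransform` and `run` are made-up helper names, and the promise-based `pipeline` from `stream/promises` assumes Node.js 15 or newer.

```javascript
const { Readable, Transform, Writable } = require('stream');
const { pipeline } = require('stream/promises'); // assumes Node.js 15+

// Hypothetical stand-in for a slow remote API
function apiCall(value) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(value.toUpperCase()), 10);
  });
}

function createApiTransform() {
  return new Transform({
    objectMode: true,
    async transform(chunk, encoding, callback) {
      let result;
      try {
        result = await apiCall(chunk); // next chunk is not processed until this settles
      } catch (err) {
        callback(err); // surface the failure so pipeline() can clean up
        return;
      }
      callback(null, result);
    }
  });
}

// Runs the inputs through the transform and collects the outputs
async function run(inputs) {
  const results = [];
  await pipeline(
    Readable.from(inputs),
    createApiTransform(),
    new Writable({
      objectMode: true,
      write(chunk, encoding, callback) {
        results.push(chunk);
        callback();
      }
    })
  );
  return results;
}

run(['a', 'b', 'c']).then((results) => console.log(results));
```

The result is assigned outside the `try` block so that an exception thrown by a downstream consumer cannot trigger a second `callback` call.
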