Suggest an edit

Improve this article

Refine the answer for “What are streams in Node.js?”. Your changes go to moderation before they’re published.

Approval required

Content

What you’re changing

Title (EN)

Short answer (EN)

Shown above the full answer for quick recall.

Answer (EN)

**Streams** are Node.js objects that read or write data one chunk at a time, instead of loading everything into memory before doing anything with it.

## Theory

### TL;DR

- Think of a stream like a conveyor belt at a factory: items move through one at a time, never piling up in one spot
- `fs.readFileSync('4gb.mp4')` loads all 4GB into RAM; `fs.createReadStream('4gb.mp4')` reads 64KB, processes it, discards it, then reads the next 64KB
- Four types: Readable (source), Writable (destination), Duplex (both directions), Transform (modifies data as it passes through)
- Use `pipe()` to connect streams; use `pipeline()` in production because it handles errors and cleanup automatically
- Decision rule: file over 50MB or any real-time data source gets streams. Under 1MB with no concurrency, sync methods are fine.

### Quick example

```js
// Without streams - entire file in memory
const data = fs.readFileSync('4gb-video.mp4'); // Crashes if file > RAM
res.end(data);

// With streams - 64KB at a time, memory stays ~1-2MB
fs.createReadStream('4gb-video.mp4').pipe(res);
// Reads 64KB, sends it, reads next 64KB
```

The second version reads one chunk, sends it, then reads the next. The file can be 50GB and memory use stays flat.

### How backpressure works

When a readable stream produces data faster than a writable stream can consume it, Node.js pauses the source automatically. That mechanism is called backpressure. Without it, data piles up in memory until the process crashes. `pipe()` handles all of this internally. If you wire streams manually with `.on('data')`, you have to implement backpressure yourself, and most code that does this gets it wrong.

I debugged a production crash once that came from exactly this: a readable stream writing to a slow TCP socket, no backpressure handling, buffer growing until OOM. After that I stopped writing manual `.on('data')` handlers and switched to `pipe()` or `pipeline()` everywhere.

### Four stream types

| Type | Direction | Common examples |
|------|-----------|------------------|
| Readable | Data comes out | `fs.createReadStream()`, `http.IncomingMessage` |
| Writable | Data goes in | `fs.createWriteStream()`, `http.ServerResponse` |
| Duplex | Both directions | `net.Socket` |
| Transform | Reads, modifies, outputs | `zlib.createGzip()`, `crypto.createCipheriv()` |

Transform streams are the most useful day-to-day. They sit in the middle of a pipeline and change each chunk before passing it along.

### When to use streams

- Large files (50MB+): prevents out-of-memory crashes on the server
- HTTP request and response bodies: `req` in Express is already a Readable stream
- Real-time data: WebSockets, database cursors, child process stdout
- On-the-fly transformation: compress, encrypt, or parse while reading
- Piping between sources: file to file, network to file, database to HTTP response

For config files and small JSON under 1MB, `readFileSync` is simpler and fast enough. No need for stream setup there.

### How it works internally

Node.js uses libuv for I/O operations. When you create a readable stream, libuv reads data from disk in chunks. The default chunk size is 64KB, controlled by the `highWaterMark` option. Each chunk triggers a `'data'` event on the stream. If the consumer is too slow, the internal buffer fills up and the stream calls `pause()` on itself automatically. When the consumer catches up and drains the buffer, the stream calls `resume()`. That is the full backpressure cycle.

### Common mistakes

**Mistake 1: Assuming `pipe()` is synchronous**

```js
// Wrong - "Done!" logs before data transfer finishes
fs.createReadStream('file.txt')
  .pipe(fs.createWriteStream('output.txt'));
console.log('Done!'); // Runs immediately

// Correct - wait for the finish event
fs.createReadStream('file.txt')
  .pipe(fs.createWriteStream('output.txt'))
  .on('finish', () => console.log('Done!'));
```

`pipe()` returns immediately. The actual data transfer happens asynchronously.

**Mistake 2: No error handling on streams**

```js
// Wrong - errors disappear without notice
fs.createReadStream('file.txt')
  .pipe(fs.createWriteStream('output.txt'));

// Correct - pipeline destroys all streams on any error
const { pipeline } = require('stream');
pipeline(
  fs.createReadStream('file.txt'),
  fs.createWriteStream('output.txt'),
  (err) => {
    if (err) console.error('Pipeline failed:', err);
    else console.log('Done');
  }
);
```

If the source file disappears mid-transfer or the disk fills up, an error event fires. Without a listener, you get an unhandled error. `pipeline()` also destroys every stream in the chain automatically.

**Mistake 3: Setting `highWaterMark` too high**

```js
// Wrong - buffers 10MB before backpressure kicks in
const readable = fs.createReadStream('file.txt', {
  highWaterMark: 10 * 1024 * 1024
});

// Correct - 64KB default works for most cases
const readable = fs.createReadStream('file.txt');

// Exception - increase slightly for slow network streams
const readable = fs.createReadStream('file.txt', {
  highWaterMark: 256 * 1024 // 256KB
});
```

`highWaterMark` is the buffer threshold before backpressure triggers. A 10MB value means Node.js accumulates 10MB before pausing. That removes most of the benefit of streaming.

**Mistake 4: Piping the same readable stream twice**

```js
// Wrong - second pipe gets nothing
const readable = fs.createReadStream('file.txt');
readable.pipe(fs.createWriteStream('copy1.txt'));
readable.pipe(fs.createWriteStream('copy2.txt')); // Empty

// Correct - create two separate read streams
fs.createReadStream('file.txt').pipe(fs.createWriteStream('copy1.txt'));
fs.createReadStream('file.txt').pipe(fs.createWriteStream('copy2.txt'));
```

A readable stream is consumed once. After the first `pipe()`, the data is gone.

**Mistake 5: Passing objects without `objectMode`**

```js
// Wrong - objects become "[object Object]"
const transform = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, { processed: true });
  }
});

// Correct
const transform = new Transform({
  objectMode: true,
  transform(chunk, encoding, callback) {
    callback(null, { processed: true }); // Passes through as-is
  }
});
```

By default, streams work with Buffers and strings. To pass JavaScript objects through, set `objectMode: true`.

### Real-world usage

- Express: `res` is a Writable stream; pipe files directly with `fs.createReadStream().pipe(res)`
- `zlib`: `createGzip()` and `createGunzip()` compress and decompress on the fly
- Database drivers: Mongoose `.cursor()`, MongoDB `.find().stream()` for large result sets
- `csv-parser`: reads a CSV file line by line, emitting one parsed object per row
- `child_process`: `child.stdout` is a Readable stream; `child.stdin` is Writable
- HTTP/2 and WebSocket connections use streams internally

### Follow-up questions

**Q:** What is the difference between `pipe()` and `pipeline()`?
**A:** `pipe()` connects streams but does not clean up if one of them fails. Other streams in the chain keep running and leak memory. `pipeline()` (Node.js 10+) destroys all streams in the chain automatically when any one fails. In production, always prefer `pipeline()`.

**Q:** How do you know if backpressure is actually occurring?
**A:** `writable.write(chunk)` returns `false` when the internal buffer is full. When it returns `true`, the destination is ready for more. That return value is exactly what `pipe()` checks on every single write internally.

**Q:** Why does `fs.readFileSync()` still exist if streams handle large files better?
**A:** For files under 1MB where you need the content immediately and are not handling concurrent requests, sync is simpler. No event listeners, no callbacks, just a value. Streams have setup overhead that is not worth it at small scales.

**Q:** (Senior) How do you implement backpressure in a custom Transform stream that calls an async API for each chunk?
**A:** Call `callback()` only after the async call resolves. The stream's internal queue fills naturally if the API is slow, triggering backpressure upstream automatically. The pattern is `async transform(chunk, encoding, callback) { const result = await apiCall(chunk); callback(null, result); }`. Never call `callback` before the async work finishes, or you will overwhelm the API and skip backpressure entirely.

## Examples

### Streaming a large file as an HTTP response

```js
const http = require('http');
const fs = require('fs');

const server = http.createServer((req, res) => {
  // 2GB file, memory stays ~1-2MB throughout
  fs.createReadStream('./large-video.mp4')
    .on('error', (err) => {
      res.writeHead(500);
      res.end('File not found');
    })
    .pipe(res);
});

server.listen(3000);
```

Without the stream, `fs.readFile()` would load the entire file into memory before sending a single byte. With the stream, the first chunk reaches the client within milliseconds.

### Processing a large CSV with a Transform stream

```js
const fs = require('fs');
const { Transform } = require('stream');
const csv = require('csv-parser');

// Process 500MB CSV without loading it into memory
fs.createReadStream('users-500mb.csv')
  .pipe(csv())
  .pipe(new Transform({
    objectMode: true,
    transform(row, encoding, callback) {
      // Modify each row as it passes through
      row.email = row.email.toUpperCase();
      callback(null, JSON.stringify(row) + '\n');
    }
  }))
  .pipe(fs.createWriteStream('users-processed.txt'))
  .on('finish', () => console.log('Done'));

// Memory stays ~5-10MB throughout, regardless of file size
```

The Transform stream receives one parsed row at a time, modifies it, and passes it to the next stage. The file is never fully in memory at any point.

### Using pipeline with error handling

```js
const { pipeline } = require('stream');
const fs = require('fs');
const zlib = require('zlib');

// Compress a log file - all errors handled, all streams cleaned up
pipeline(
  fs.createReadStream('access.log'),
  zlib.createGzip(),
  fs.createWriteStream('access.log.gz'),
  (err) => {
    if (err) {
      console.error('Compression failed:', err);
    } else {
      console.log('Compressed successfully');
    }
  }
);
```

If any stream in the chain fails (disk full, file deleted, network drop), `pipeline()` destroys all remaining streams and calls the callback with the error. With `pipe()` alone, you would need to attach a separate error handler to each stream manually.

Markdown · drag & drop images · ⌘B / ⌘I shortcuts1453 words

For the reviewer

Note to the moderator (optional)

Visible only to the moderator. Helps review go faster.