# V8 architecture: from code to machine instructions

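**V8 architecture** describes the tiered pipeline that turns JavaScript source code into native machine instructions: Parser (AST), Ignition (bytecode), Sparkplug (baseline machine code), and TurboFan (optimized machine code), with deoptimization as the safety net when runtime type assumptions fail. Recent V8 versions (Chrome 117 and later) also include Maglev, a mid-tier optimizing compiler between Sparkplug and TurboFan; this answer focuses on the four classic stages.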

## Theory

### TL;DR

- V8 is like a car's gearbox: starts in low gear (Ignition interpreter) for instant execution, shifts up to TurboFan as the code runs hotter
- Main flow: source -> AST -> bytecode -> baseline machine code -> optimized machine code, with deoptimization when type assumptions fail
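- TurboFan kicks in once a function is hot and its type feedback is stable. "Roughly 1000+ calls" is only a mental model: V8 tiers up on an internal budget driven by executed bytecode, and the thresholds change between versions
- **Decision rule:** keep function arguments the same type across calls - consistent types at each operation mean TurboFan can specialize the code
- Profile deopts with `node --trace-deopt`; look for reasons such as "not a Smi" and "wrong map"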

### Quick Example

```javascript
function add(a, b) {
  return a + b;
}

for (let i = 0; i < 10000; i++) {
  add(1, 2); // early calls: Ignition bytecode
             // once warm: Sparkplug baseline
             // once hot with stable feedback: TurboFan optimized
}

add("x", "y"); // DEOPT - optimized code assumed number + number
```

V8 runs the loop slowly at first, then progressively compiles `add` to faster machine code. The final call with strings breaks the type assumption and triggers deoptimization - back to bytecode.
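
The tier-up flow can be pictured with a toy model. The sketch below is not V8's real heuristic: the `TIER_THRESHOLDS` numbers and the `ToyTieringManager` class are hypothetical, chosen only to show a function moving up tiers as it accumulates executed work and dropping back to bytecode on a deopt.

```javascript
// Toy model of tiered execution. Thresholds are made-up illustration
// values, not V8's real (version-dependent) budgets.
const TIER_THRESHOLDS = [
  { tier: "turbofan", work: 1000 },
  { tier: "sparkplug", work: 100 },
  { tier: "ignition", work: 0 },
];

class ToyTieringManager {
  constructor() {
    this.work = 0; // accumulated units of "executed bytecode"
  }

  // Record one unit of work and return the tier the function would run in.
  recordCall(cost = 1) {
    this.work += cost;
    return this.currentTier();
  }

  currentTier() {
    return TIER_THRESHOLDS.find((t) => this.work >= t.work).tier;
  }

  // A failed assumption throws away optimized code; warm-up starts again.
  deoptimize() {
    this.work = 0;
  }
}

const manager = new ToyTieringManager();
const seen = [];
for (let i = 0; i < 1500; i++) {
  const tier = manager.recordCall();
  if (seen[seen.length - 1] !== tier) seen.push(tier);
}
console.log(seen); // [ 'ignition', 'sparkplug', 'turbofan' ]

manager.deoptimize();
console.log(manager.currentTier()); // ignition
```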

### The Pipeline

The **Parser** reads JS source text and builds an AST (Abstract Syntax Tree). The tree captures program structure without any runtime information about types or values. It is shown here in a simplified, ESTree-like form; V8's internal AST uses its own node types:

```json
{
  "type": "FunctionDeclaration",
  "id": { "name": "add" },
  "params": [{ "name": "a" }, { "name": "b" }],
  "body": {
    "type": "ReturnStatement",
    "argument": {
      "type": "BinaryExpression",
      "operator": "+",
      "left": { "name": "a" },
      "right": { "name": "b" }
    }
  }
}
```

**Ignition** compiles that AST to compact bytecode and starts executing it right away. The bytecode for `add(a, b)`:

```
Ldar a1        // load argument b (a1) into the accumulator
Add a0, [0]    // accumulator = a (a0) + accumulator; [0] is the feedback slot
Return         // return the accumulator
```

For each operation that carries a feedback slot (arithmetic, property access, calls), Ignition records type feedback in the function's feedback vector: "this Add saw two Smi integers." That feedback becomes TurboFan's input later.
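
To make the feedback idea concrete, here is a toy model of a single feedback slot for `+`. The `FeedbackSlot` class and the three-step `SignedSmall` / `Number` / `Any` ladder are simplifications assumed for illustration; V8's real feedback has more states and lives in an internal feedback vector.

```javascript
// Toy feedback slot: remembers the most general operand kind seen so far.
const LEVELS = ["None", "SignedSmall", "Number", "Any"];

function classify(value) {
  if (typeof value !== "number") return "Any";
  // Smi-like: an integer in a 31-bit signed range (an approximation;
  // -0 is treated as a plain Number).
  const isSmallInt =
    Number.isInteger(value) &&
    !Object.is(value, -0) &&
    value >= -(2 ** 30) &&
    value < 2 ** 30;
  return isSmallInt ? "SignedSmall" : "Number";
}

class FeedbackSlot {
  constructor() {
    this.state = "None";
  }

  // Feedback only moves toward more general states, never back.
  record(...operands) {
    for (const operand of operands) {
      const seen = classify(operand);
      if (LEVELS.indexOf(seen) > LEVELS.indexOf(this.state)) {
        this.state = seen;
      }
    }
    return this.state;
  }
}

const slot = new FeedbackSlot();

function add(a, b) {
  slot.record(a, b); // what the interpreter does implicitly at the Add bytecode
  return a + b;
}

add(1, 2);
console.log(slot.state); // SignedSmall
add(1.5, 2);
console.log(slot.state); // Number
add("x", "y");
console.log(slot.state); // Any
add(1, 2);
console.log(slot.state); // Any (feedback does not narrow again)
```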

**Sparkplug** (added in V8 9.1) compiles bytecode to machine code in a single linear pass with a near 1:1 mapping - no optimization passes, no speculation. It takes almost no time to compile and mainly removes interpreter dispatch overhead, so the speedup over Ignition is modest (the V8 team reported single- to low-double-digit percentage gains on real-world benchmarks). A cheap middle layer between interpreter and full optimizer.

**TurboFan** reads the type feedback collected so far, speculates that types stay consistent, and generates highly optimized machine code: type specialization, function inlining, loop unrolling, dead code elimination. On hot numeric code the speedup over pure interpretation can reach an order of magnitude, though it depends heavily on the workload. But TurboFan is a bet. If the types change, V8 has to pay the deopt penalty.
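
The "bet" can be sketched in plain JavaScript. This is a conceptual model only: `specializeAdd`, the `stats` counters, and the guard check are hypothetical stand-ins for what TurboFan emits as machine code, where a failed guard jumps to the deoptimizer instead of calling another function.

```javascript
// Conceptual model of speculative optimization with a guard.
function genericAdd(a, b) {
  return a + b; // full JavaScript `+` semantics
}

function specializeAdd() {
  const stats = { fastPathHits: 0, bailouts: 0 };

  function optimizedAdd(a, b) {
    // Guard: the speculation derived from type feedback.
    if (Number.isInteger(a) && Number.isInteger(b)) {
      stats.fastPathHits++;
      return a + b; // stands in for a single integer add instruction
    }
    // Guard failed: "deoptimize" and let the generic path handle it.
    stats.bailouts++;
    return genericAdd(a, b);
  }

  return { optimizedAdd, stats };
}

const { optimizedAdd, stats } = specializeAdd();
for (let i = 0; i < 1000; i++) optimizedAdd(i, 1);
console.log(optimizedAdd("x", "y")); // xy (correct, but via the slow path)
console.log(stats); // { fastPathHits: 1000, bailouts: 1 }
```

The result is always correct either way; the guard only decides which path computes it.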

### Hidden Classes and Inline Caching

V8 doesn't store property metadata on objects directly. Instead it uses **Hidden Classes** (called Maps inside V8; other engines use the term Shapes). Two objects created with the same properties in the same order share a Hidden Class, and V8 can cache property offsets for that class.

```javascript
class Point {
  constructor(x, y) {
    this.x = x; // transitions: C0 -> C1
    this.y = y; // transitions: C1 -> C2
  }
}

const p1 = new Point(1, 2); // Hidden Class C2
const p2 = new Point(3, 4); // same Hidden Class C2 - fast path
```
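
The transition chain can be modelled as a tree keyed by property name. This is a simplified sketch: `ToyShape` and `shapeFor` are hypothetical names, and real V8 Maps also track property attributes, value representations, and prototypes.

```javascript
// Toy hidden-class tree: adding the same properties in the same order
// always ends at the same shape object.
class ToyShape {
  constructor(keys = []) {
    this.keys = keys;             // property names in insertion order
    this.transitions = new Map(); // property name -> next ToyShape
  }

  // Follow (or create) the transition for adding `key`.
  add(key) {
    if (!this.transitions.has(key)) {
      this.transitions.set(key, new ToyShape([...this.keys, key]));
    }
    return this.transitions.get(key);
  }

  // Offset of a property within objects of this shape, or -1 if absent.
  offsetOf(key) {
    return this.keys.indexOf(key);
  }
}

const rootShape = new ToyShape(); // plays the role of C0

function shapeFor(propertyOrder) {
  return propertyOrder.reduce((shape, key) => shape.add(key), rootShape);
}

const p1Shape = shapeFor(["x", "y"]); // C0 -> C1 -> C2
const p2Shape = shapeFor(["x", "y"]); // reuses the same chain
const swapped = shapeFor(["y", "x"]); // different order, different shape

console.log(p1Shape === p2Shape);   // true
console.log(p1Shape === swapped);   // false
console.log(p1Shape.offsetOf("y")); // 1
```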

**Inline Caching (IC)** builds on this. When `getX(point)` runs the first time, V8 records: "for this Hidden Class, property `x` is at offset 0." The next call with an object of the same class skips the lookup entirely.

```javascript
function getX(point) {
  return point.x;
}

getX({ x: 1, y: 2 }); // V8: x is at offset 0 for this object's Hidden Class
getX({ x: 5, y: 9 }); // same Hidden Class: cache hit - no lookup
```

IC has three states, and they matter a lot for performance:

- **Monomorphic:** one Hidden Class per call site. TurboFan can inline property access fully. This is what you want.
- **Polymorphic:** 2-4 classes. V8 keeps a small dispatch table. Slower, but recoverable.
- **Megamorphic:** 5+ classes. V8 stops caching per call site and falls back to a shared global cache or a fully generic property lookup. Hard to recover from.

In practice, I've seen megamorphic call sites appear in generic utility functions that get called with every kind of object in a codebase. Once a site goes megamorphic, refactoring callers to a consistent shape is the only real fix.
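
The three states can be illustrated with a toy inline cache. The sketch approximates a Hidden Class by an object's key order and uses the 4-entry polymorphic limit described above; `ToyInlineCache` is a hypothetical name, not a V8 API.

```javascript
// Toy inline cache for one property-access site.
const MAX_POLYMORPHIC = 4;

class ToyInlineCache {
  constructor(property) {
    this.property = property;
    this.offsets = new Map(); // shape key -> cached property offset
    this.megamorphic = false;
  }

  get state() {
    if (this.megamorphic) return "megamorphic";
    if (this.offsets.size === 0) return "uninitialized";
    return this.offsets.size === 1 ? "monomorphic" : "polymorphic";
  }

  load(obj) {
    const keys = Object.keys(obj);
    const shapeKey = keys.join(","); // stand-in for the Hidden Class
    if (!this.megamorphic && !this.offsets.has(shapeKey)) {
      if (this.offsets.size === MAX_POLYMORPHIC) {
        this.megamorphic = true; // too many shapes: stop caching per site
        this.offsets.clear();
      } else {
        this.offsets.set(shapeKey, keys.indexOf(this.property));
      }
    }
    return obj[this.property]; // the value is the same in every state
  }
}

const site = new ToyInlineCache("x");
site.load({ x: 1, y: 2 });
site.load({ x: 5, y: 9 });
console.log(site.state); // monomorphic
site.load({ x: 1 });
console.log(site.state); // polymorphic
site.load({ y: 0, x: 1 });
site.load({ x: 1, z: 3 });
site.load({ w: 0, x: 1 }); // fifth distinct shape
console.log(site.state); // megamorphic
```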

### Deoptimization

TurboFan compiles with assumptions. When those assumptions break at runtime, V8 deoptimizes: it rebuilds an unoptimized interpreter frame from the optimized one at the point of failure and continues in bytecode. This is the reverse direction of OSR (On-Stack Replacement), which V8 uses to switch a long-running loop into optimized code mid-execution.

The deopt cost is real: frame reconstruction, discarded compiled code, and another round of feedback collection before TurboFan can try again. Worse, the effect persists. Type feedback generally only widens, so after a mismatch the re-optimized code has to handle every type the function has seen and is more generic than the first version.

```javascript
function counter(x) {
  return x + 1;
}

for (let i = 0; i < 1e6; i++) counter(i);  // TurboFan: optimized for int
counter("a");                                // DEOPT
for (let i = 0; i < 1e6; i++) counter(i);  // may be slower - feedback now includes string
```

Run `node --trace-deopt app.js` to catch these. The output shows the function name, the deopt reason, and the bytecode position.
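
A rough way to look for the effect is to time the hot loop before and after the mismatching call. This is a minimal sketch, assuming Node.js with the global `performance` object; `timeLoop` is a hypothetical helper. The numbers vary by machine and V8 version, and inlining can hide the difference entirely, so treat the timings as a hint and confirm with `--trace-deopt`.

```javascript
// Rough micro-benchmark: compare the hot loop before and after a
// mismatching call. Timings are machine- and version-dependent.
function counter(x) {
  return x + 1;
}

function timeLoop(fn, iterations) {
  let sink = 0; // keep results live so the loop is not optimized away
  const start = performance.now();
  for (let i = 0; i < iterations; i++) sink += fn(i);
  return { ms: performance.now() - start, sink };
}

timeLoop(counter, 1e6); // warm-up with integers only
const before = timeLoop(counter, 1e6);

counter("a"); // mismatching call: string instead of number

const after = timeLoop(counter, 1e6);

console.log(`before: ${before.ms.toFixed(2)} ms`);
console.log(`after:  ${after.ms.toFixed(2)} ms`);
```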

### Generational Garbage Collection

V8 splits the heap into a small Young Generation (a few MB up to tens of MB, depending on configuration) and a much larger Old Generation (hundreds of MB or more). The core idea is the generational hypothesis: most objects are short-lived.

**Scavenge GC** runs on Young Generation. It copies live objects from From-Space to To-Space and drops everything else. Fast - typically on the order of a millisecond. Objects that survive two Scavenge cycles get promoted to Old Generation.

**Mark-Sweep-Compact** runs on Old Generation. It marks live objects, sweeps dead ones, and compacts fragmented pages. V8 does much of this work incrementally and concurrently, but a major GC on a large heap can still pause the main thread for tens of milliseconds, which is why keeping short-lived objects short-lived matters in latency-sensitive code.

```javascript
function processItem(data) {
  const temp = { result: transform(data) }; // lives and dies here
  return temp.result;
}
// temp is Scavenge-collected

const appCache = new Map(); // survives, promoted to Old Generation
```
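
The promotion rule can be sketched with a toy heap. This is an illustration only: `ToyHeap`, its `roots` parameter, and the survival counter are assumptions made for the example, and real V8 traces object graphs rather than checking a flat set of reachable ids.

```javascript
// Toy generational heap: objects that survive two scavenges are promoted.
const PROMOTION_AGE = 2;

class ToyHeap {
  constructor() {
    this.young = new Map(); // object id -> scavenges survived
    this.old = new Set();   // promoted object ids
  }

  allocate(id) {
    this.young.set(id, 0);
  }

  // `roots` is the set of ids still reachable from the program.
  scavenge(roots) {
    const survivors = new Map();
    for (const [id, age] of this.young) {
      if (!roots.has(id)) continue;   // dead: simply not copied
      if (age + 1 >= PROMOTION_AGE) {
        this.old.add(id);             // survived twice: promote
      } else {
        survivors.set(id, age + 1);   // copied to To-Space
      }
    }
    this.young = survivors;           // To-Space becomes the new From-Space
  }
}

const heap = new ToyHeap();
heap.allocate("appCache");
heap.allocate("temp1");
heap.scavenge(new Set(["appCache"])); // temp1 dies, appCache survives once
heap.allocate("temp2");
heap.scavenge(new Set(["appCache"])); // appCache survives twice

console.log([...heap.young.keys()]); // []
console.log([...heap.old]);          // [ 'appCache' ]
```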

### Common Mistakes

**Mixing argument types in hot functions:**

```javascript
function serialize(val) {
  return val + "";
}

for (let i = 0; i < 1e6; i++) serialize(i);  // TurboFan: number path
serialize(true);                               // deopt - feedback widens past number
serialize({ x: 1 });                          // feedback for `+` is now fully generic
```

Fix: separate functions per type, or validate input before the hot path.
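
One way to apply that fix is to dispatch once at the boundary and keep each hot helper single-typed. The helper names below are hypothetical, and the object case uses `JSON.stringify`, which intentionally differs from what `val + ""` would return for an object.

```javascript
// Each helper only ever sees one input type, so its feedback stays narrow.
function serializeNumber(n) {
  return n + "";
}

function serializeBoolean(b) {
  return b ? "true" : "false";
}

function serializeObject(obj) {
  return JSON.stringify(obj);
}

// Dispatch once, outside the hot helpers.
function serialize(val) {
  switch (typeof val) {
    case "number":
      return serializeNumber(val);
    case "boolean":
      return serializeBoolean(val);
    case "object":
      return val === null ? "null" : serializeObject(val);
    default:
      throw new TypeError(`serialize: unsupported type ${typeof val}`);
  }
}

console.log(serialize(42));       // 42
console.log(serialize(true));     // true
console.log(serialize({ x: 1 })); // {"x":1}
```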

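**Assuming try/catch blocks optimization (outdated advice):**

```javascript
// Old advice: a try/catch here prevents optimization.
// That was true for Crankshaft; TurboFan (default since V8 5.9)
// optimizes functions that contain try/catch.
try {
  for (let i = 0; i < 1e6; i++) compute(i);
} catch (e) { handleError(e); }

// Keeping the try narrow is still good for clarity: wrap only what can throw
for (let i = 0; i < 1e6; i++) {
  compute(i); // no try needed if compute cannot throw
}
```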

**`delete` on hot objects:**

```javascript
const obj = { x: 1, y: 2 };
delete obj.y; // changes the object's shape and can push it into slow dictionary mode - breaks IC

// Instead:
obj.y = undefined; // same Hidden Class, property still there
```

**Dynamic property addition after construction:**

```javascript
// Bad - each assignment transitions to a new class
const configSlow = {};
configSlow.host = "localhost";
configSlow.port = 3000;

// Good - one class from the start
const configFast = { host: "localhost", port: 3000 };
```

### Real-world Usage

- **React:** reconciliation loops run hot in TurboFan. Props objects with inconsistent shapes for the same component type push ICs toward polymorphic or megamorphic states. Building props and state objects with the same keys in the same order keeps shapes stable; note that `useMemo` preserves object identity between renders, not shape.
- **Node.js / Express:** request handlers warm up quickly under load - baseline code on early requests, optimized code once a handler is hot. Profile with `node --trace-opt --trace-deopt` on staging before prod.
- **TensorFlow.js:** numeric kernels use `Float32Array` and `Int32Array` to stay in typed-array fast paths, bypassing generic object overhead entirely.
- **Chrome DevTools:** the CPU profiler shows where time is spent per function. For tier-level detail (which functions were optimized or deoptimized, and why), V8's `--trace-opt` and `--trace-deopt` output is the more reliable source.

### Follow-up Questions

**Q:** Walk me through what happens from JS source to execution.
**A:** Parser reads source text and builds an AST. Ignition compiles the AST to bytecode and executes it right away, collecting type feedback as it goes. Once the function is warm, Sparkplug compiles the bytecode to baseline machine code. Once it is hot and the type feedback is consistent, TurboFan produces fully optimized machine code. If an assumption baked into that code fails at runtime, V8 deoptimizes back to bytecode.

**Q:** What exactly triggers TurboFan?
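**A:** V8 does not use a fixed call count. Each function has an interrupt budget that is spent as its bytecode executes (calls and loop iterations both count), and when the budget runs out V8 considers tiering up. "Roughly 1000-2000 calls" is only a rule of thumb for small functions, and the thresholds change between versions. TurboFan also needs enough type feedback to make speculative optimization worth the compile cost.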

**Q:** What is deoptimization and what causes it?
**A:** TurboFan compiles code under assumptions about types. When an assumption fails at runtime - for example, an argument was always an integer, then a string arrives - V8 deoptimizes. It replaces the optimized frame with an unoptimized one and falls back to bytecode. The function may be re-optimized later, but from widened feedback, so the new code is more generic than the first version.

**Q:** Monomorphic vs polymorphic vs megamorphic - what is the real performance difference?
**A:** Monomorphic sites allow TurboFan to fully inline and specialize property access. Polymorphic sites (2-4 classes) check a short list of cached classes - slower but still cached. Megamorphic sites (5+ classes) fall back to a shared global cache or a generic property lookup on every access. In tight loops, the gap between monomorphic and megamorphic can reach an order of magnitude, depending on the workload.

**Q (senior):** How do OSR and deoptimization relate, and what does that mean for async code?
**A:** On-Stack Replacement swaps a frame while it is still on the stack. V8 uses OSR to enter optimized code in the middle of a long-running loop instead of waiting for the next call. Deoptimization is the reverse: using metadata recorded at compile time, V8 rebuilds the interpreter frame (registers, accumulator, bytecode offset) from the optimized frame and resumes in bytecode, without unwinding the callers. A suspended generator or async function has no frame on the stack; its state is saved in the generator object, so there is nothing to replace while it is suspended, and it picks up again when resumed.

## Examples

### Basic: The optimization phases in a loop

```javascript
function square(n) {
  return n * n;
}

// Phase 1: Ignition bytecode (first calls)
square(2);
square(3);

// Phase 2: Sparkplug baseline (once the function is warm)
for (let i = 0; i < 500; i++) square(i);

// Phase 3: TurboFan optimized (hot, all numbers)
for (let i = 0; i < 10000; i++) square(i);

console.log(square(9)); // 81 - likely running optimized machine code by now
```

Every call returns the correct result for its input. What changes is the execution speed. At call 1 the function is interpreted. By call 10000 it is likely running optimized native code with integer arithmetic; the exact tier-up points depend on the V8 version. No code change required - V8 does this automatically based on how the function is called.

### Intermediate: Keeping Hidden Class stable in a server handler

```javascript
// Bad: shape depends on runtime condition
function formatUserUnstable(user, isAdmin) {
  const result = { id: user.id, name: user.name };
  if (isAdmin) result.role = "admin"; // sometimes adds a property
  return result;
}
// Two Hidden Classes: one with 'role', one without
// Code iterating these results gets polymorphic IC

// Good: always same shape
function formatUserStable(user, isAdmin) {
  return {
    id: user.id,
    name: user.name,
    role: isAdmin ? "admin" : null // always present
  };
}
// One Hidden Class always
// IC stays monomorphic under load
```

In a server handling 10k requests per second, the bad version creates two Hidden Classes depending on the `isAdmin` branch. Any code reading these objects switches to polymorphic IC. The good version produces one class every time, IC stays monomorphic, and TurboFan can fully optimize callers.
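
During development, a small check can catch shape drift before it reaches production. The sketch below approximates a Hidden Class by key order, which is an assumption (V8's real Maps also depend on property attributes and on how the object was created); `shapeKey`, `countShapes`, and the sample `users` data are hypothetical.

```javascript
// Approximate an object's shape by its own keys in insertion order.
function shapeKey(obj) {
  return Object.keys(obj).join(",");
}

// Count how many distinct shapes a list of objects has.
function countShapes(objects) {
  return new Set(objects.map(shapeKey)).size;
}

function formatUnstable(user, isAdmin) {
  const result = { id: user.id, name: user.name };
  if (isAdmin) result.role = "admin"; // property only sometimes present
  return result;
}

function formatStable(user, isAdmin) {
  return { id: user.id, name: user.name, role: isAdmin ? "admin" : null };
}

const users = [
  { id: 1, name: "Ada", admin: true },
  { id: 2, name: "Linus", admin: false },
];

const unstable = users.map((u) => formatUnstable(u, u.admin));
const stable = users.map((u) => formatStable(u, u.admin));

console.log(countShapes(unstable)); // 2
console.log(countShapes(stable));   // 1
```

A check like this fits in a unit test for any function whose results are consumed in a hot path.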

### Advanced: Widened feedback after a deopt

```javascript
function process(val) {
  return val * 2;
}

// TurboFan optimizes for Smi (small integer)
for (let i = 0; i < 1e6; i++) process(i);

// One call with a float fails the Smi check and triggers a deopt
process(1.5); // 3 - correct result, but the optimized code is thrown away

// Feedback for `val * 2` now says "number", not "Smi"
// Re-optimized code must handle doubles too, even if only integers follow
for (let i = 0; i < 1e6; i++) process(i); // still correct, more generic code

// Diagnosis:
// node --trace-deopt script.js
// Look for an entry naming `process` with a reason such as "not a Smi"

// Fix: separate functions per type, each with its own feedback
function processInt(val)   { return val * 2; } // stays int-optimized
function processFloat(val) { return val * 2; } // separate feedback
```

The return values are always correct. What changes is which machine code path executes them. This pattern shows up in numeric processing code where a single edge case - a float from user input, a null from an API - deoptimizes a function that runs millions of times per second. The fix is not about correctness; it is about keeping each hot function type-pure.
