
Parsing pipeline - from bytes to DOM and CSSOM

The Parsing Pipeline is the process by which the browser transforms HTML and CSS bytes into the data structures it works with: the DOM and the CSSOM. Understanding this process is important for optimizing page load performance.

HTML Parsing: from Bytes to DOM

Stage 1: Byte Stream → Character Stream

Bytes:      3C 68 31 3E ...
            ↓ (Character Encoding)
Characters: <h1>Hello</h1>

Character Encoding Detection (sources are checked in this order of priority):

  1. BOM (Byte Order Mark)
  2. HTTP Content-Type header: charset=utf-8
  3. Meta tag: <meta charset="utf-8">
  4. Fallback: auto-detection
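The priority order above can be sketched in a few lines of Python. This is a simplified illustration, not the browser's actual sniffing algorithm: `sniff_encoding` is a hypothetical helper, the `windows-1252` fallback is an assumption (the real default depends on the browser and locale), and the meta-tag check is a rough regex over the first 1024 bytes.

```python
import codecs
import re
from typing import Optional

# BOM signatures, checked first. The names are Python codec labels.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(body: bytes, content_type: Optional[str] = None,
                   fallback: str = "windows-1252") -> str:
    """Hypothetical helper: choose an encoding using the priority order above."""
    # 1. A byte order mark wins over every other signal.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 2. The charset parameter of the HTTP Content-Type header.
    if content_type:
        m = re.search(r"charset\s*=\s*[\"']?([\w.:-]+)", content_type, re.I)
        if m:
            return m.group(1).lower()
    # 3. A <meta charset="..."> declaration near the start of the document.
    m = re.search(rb"<meta[^>]*charset\s*=\s*[\"']?([\w.:-]+)", body[:1024], re.I)
    if m:
        return m.group(1).decode("ascii").lower()
    # 4. Nothing declared: use the fallback (assumed value, see above).
    return fallback

# Decoding the bytes from Stage 1 once the encoding is known.
raw = bytes.fromhex("3C 68 31 3E")
text = raw.decode(sniff_encoding(raw, "text/html; charset=utf-8"))
```

Here `text` is the string `<h1>`, the first four characters of the example above.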

Stage 2: Tokenization (Lexical Analysis)

The HTML parser converts the characters into tokens:

html
<div class="container">
  <h1>Title</h1>
  <p>Text</p>
</div>

Tokens:

StartTag: div (attributes: class="container")
StartTag: h1
Character: Title
EndTag: h1
StartTag: p
Character: Text
EndTag: p
EndTag: div
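To see a token stream like this one produced by real code, Python's standard `html.parser` module can serve as a stand-in. It is not a spec-compliant HTML5 tokenizer, and the `TokenCollector` class and `tokenize` function below are hypothetical names for this sketch, which also drops whitespace-only text for readability.

```python
from html.parser import HTMLParser

class TokenCollector(HTMLParser):
    """Records start tags, end tags and text as simple tuples."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("StartTag", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("EndTag", tag))

    def handle_data(self, data):
        # Skip whitespace-only text between tags to keep the output short.
        if data.strip():
            self.tokens.append(("Character", data.strip()))

def tokenize(markup):
    collector = TokenCollector()
    collector.feed(markup)
    collector.close()
    return collector.tokens

tokens = tokenize('<div class="container"> <h1>Title</h1> <p>Text</p> </div>')
```

The resulting list has eight entries, in the same order as the tokens listed above.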

Stage 3: Tree Construction

The tokens are turned into DOM nodes, which are assembled into the DOM tree:

Document
  html
    head
    body
      div.container
        h1
          Title
        p
          Text
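Tree construction can be pictured as a stack of open elements: a start tag pushes a node, an end tag pops it. The sketch below assumes tokens shaped like `("StartTag", "div")`. `Node`, `build_tree` and `outline` are hypothetical names. Unlike a real browser, it does not insert the implicit `html`, `head` and `body` elements or apply any error-recovery rules.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)

def build_tree(tokens):
    """Build a tree from (kind, value, ...) tokens using a stack of open elements."""
    root = Node("Document")
    stack = [root]
    for token in tokens:
        kind, value = token[0], token[1]
        if kind == "StartTag":
            node = Node(value)
            stack[-1].children.append(node)  # attach to the current open element
            stack.append(node)               # the new element is now open
        elif kind == "Character":
            stack[-1].children.append(Node(f'"{value}"'))
        elif kind == "EndTag":
            # Close elements until the matching one has been popped.
            while len(stack) > 1:
                if stack.pop().name == value:
                    break
    return root

def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines

tokens = [
    ("StartTag", "div"), ("StartTag", "h1"), ("Character", "Title"), ("EndTag", "h1"),
    ("StartTag", "p"), ("Character", "Text"), ("EndTag", "p"), ("EndTag", "div"),
]
print("\n".join(outline(build_tree(tokens))))
```

The printed outline nests `h1` and `p` under `div`, with the text nodes as their children.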

Preload Scanner: Critical Optimization

The Preload Scanner runs alongside the main HTML parser, scans ahead in the markup, and starts downloading the resources it finds:

html
<html>
<head>
  <!-- Parser here -->
  <link rel="stylesheet" href="style.css">
  <script src="app.js"></script>
</head>
<body>
  <img src="hero.jpg"> <!-- Preload Scanner already found this! -->

What the Preload Scanner finds:

  • <link rel="stylesheet">
  • <script src>
  • <img src>
  • <link rel="preload">
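A rough way to picture the Preload Scanner is a lightweight pass over the raw markup that only looks for fetchable URLs and ignores everything else. The sketch below uses regular expressions for that. `preload_scan` is a hypothetical function, and real scanners are considerably more careful (they also handle attributes such as `srcset`, for example).

```python
import re

# Only the tags from the list above are of interest.
_TAG = re.compile(r"<(link|script|img)\b([^>]*)>", re.I)
_ATTR = re.compile(r"""([\w-]+)\s*=\s*["']([^"']*)["']""")

def preload_scan(markup):
    """Return (tag, url) pairs for resources worth fetching early."""
    found = []
    for match in _TAG.finditer(markup):
        tag = match.group(1).lower()
        attrs = {k.lower(): v for k, v in _ATTR.findall(match.group(2))}
        if tag == "link":
            # Only stylesheet and preload links point at fetchable resources here.
            wanted = attrs.get("rel", "").lower() in ("stylesheet", "preload")
            url = attrs.get("href") if wanted else None
        else:
            url = attrs.get("src")
        if url:
            found.append((tag, url))
    return found

page = """
<html> <head>
<link rel="stylesheet" href="style.css">
<script src="app.js"></script>
</head> <body>
<img src="hero.jpg">
"""
resources = preload_scan(page)
```

For the page above, `resources` lists `style.css`, `app.js` and `hero.jpg` in document order, even though a real parser would still be waiting on `app.js` before it reached the image.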

CSS Parsing: CSSOM Construction

Stage 1: CSS Tokenization

css
body { color: blue; font-size: 16px; }

Tokens:

Selector: body
Property: color
Value: blue
Property: font-size
Value: 16px
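The simplified token list above can be reproduced with a short regex-based sketch. Real CSS tokenizers work at a finer granularity (identifiers, colons, braces and so on), so `tokenize_css` is only a hypothetical illustration of the Selector / Property / Value view used here. It ignores comments, at-rules and nested blocks.

```python
import re

# One rule: everything before "{" is the selector, everything inside is the body.
_RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")

def tokenize_css(css):
    tokens = []
    for selector, body in _RULE.findall(css):
        tokens.append(("Selector", selector.strip()))
        for declaration in body.split(";"):
            if ":" not in declaration:
                continue  # empty piece after the last semicolon
            prop, value = declaration.split(":", 1)
            tokens.append(("Property", prop.strip()))
            tokens.append(("Value", value.strip()))
    return tokens

css_tokens = tokenize_css("body { color: blue; font-size: 16px; }")
```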

Stage 2: CSSOM Construction

StyleSheetList
  body
    color: blue
    font-size: 16px
  .container
    width: 1200px
  h1
    font-weight: bold
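As a minimal sketch, the structure above can be modelled as a mapping from selector to declarations. `build_cssom` is a hypothetical function. A real CSSOM is a tree of rule objects and also tracks specificity, source order and inheritance, none of which is modelled here.

```python
import re

def build_cssom(css):
    """Map each selector to a dict of its declarations (later values win)."""
    rules = {}
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        declarations = rules.setdefault(selector.strip(), {})
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip()] = value.strip()
    return rules

cssom = build_cssom("""
body { color: blue; font-size: 16px; }
.container { width: 1200px; }
h1 { font-weight: bold; }
""")
```

Looking up `cssom["body"]["color"]` then gives `"blue"`, matching the first entry in the structure above.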

CSS Blocking

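CSS is render-blocking: the browser does not render page content until the stylesheets it needs have been downloaded and the CSSOM has been built.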

html
<head>
  <link rel="stylesheet" href="style.css"> <!-- Blocks! -->
</head>
<body>
  <!-- Content won't render until CSS loads -->

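Solution: Media Queries

A stylesheet whose media attribute does not match the current environment is still downloaded, but it no longer blocks rendering:

html
<link rel="stylesheet" href="print.css" media="print"> <!-- Doesn't block screen rendering -->
<link rel="stylesheet" href="mobile.css" media="(max-width: 600px)"> <!-- Blocks only when the query matches -->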
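The decision the browser makes for each stylesheet can be sketched as a small predicate. This is a deliberately narrow illustration: `is_render_blocking` is a hypothetical function that understands only the media types `all`, `screen` and `print` plus single `min-width` / `max-width` queries in pixels, and treats anything else as blocking.

```python
import re

def is_render_blocking(media, viewport_width, medium="screen"):
    """Return True if a stylesheet with this media attribute blocks rendering."""
    query = (media or "all").strip().lower()
    if query in ("all", medium):
        return True
    if query in ("print", "screen"):
        return False  # a media type other than the current one
    m = re.fullmatch(r"\((max|min)-width:\s*(\d+)px\)", query)
    if m:
        limit = int(m.group(2))
        if m.group(1) == "max":
            return viewport_width <= limit
        return viewport_width >= limit
    return True  # unknown query: conservatively treat it as blocking

# The two stylesheets from the example, on a 1280px-wide desktop screen.
desktop = [is_render_blocking(m, 1280) for m in ("print", "(max-width: 600px)")]
# The same stylesheets on a 375px-wide phone.
phone = [is_render_blocking(m, 375) for m in ("print", "(max-width: 600px)")]
```

On the desktop neither stylesheet blocks rendering, while on the phone `mobile.css` does, because its query matches.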

Render Tree Construction

DOM + CSSOM = Render Tree

html
<div style="display:none">Hidden</div>
<div class="visible">Visible</div>

The Render Tree contains only the elements that generate boxes on the page:

  • display: none is not in the Render Tree
  • visibility: hidden is in the Render Tree (it is invisible but still takes up space)
  • <head>, <script>, <meta> are not in the Render Tree
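The filtering rules above amount to a recursive walk that drops some subtrees and keeps the rest. In the sketch below, `Element` and `build_render_tree` are hypothetical names, and each element's computed style is assumed to be available already as a plain dict.

```python
from dataclasses import dataclass, field

# Tags from the list above that never produce boxes.
NON_RENDERED_TAGS = {"head", "script", "meta"}

@dataclass
class Element:
    tag: str
    style: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def build_render_tree(element):
    """Return a copy of the subtree without non-rendered elements, or None."""
    if element.tag in NON_RENDERED_TAGS or element.style.get("display") == "none":
        return None  # the whole subtree is dropped
    kept = [r for r in map(build_render_tree, element.children) if r is not None]
    return Element(element.tag, dict(element.style), kept)

dom = Element("html", children=[
    Element("head", children=[Element("meta")]),
    Element("body", children=[
        Element("div", {"display": "none"}),
        Element("div", {"visibility": "hidden"}),  # kept: it still occupies space
        Element("div"),
        Element("script"),
    ]),
])
render_tree = build_render_tree(dom)
```

Here `render_tree` keeps `html`, `body` and two of the three `div` elements: the hidden-but-present one and the visible one.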

DOM Tree + CSSOM → Render Tree → Layout Tree → Paint

Speculative Parsing

Modern browsers use speculative parsing to keep discovering resources while the main parser is blocked:

html
<script src="slow.js"></script> <!-- Blocks parsing -->
<img src="image1.jpg">
<img src="image2.jpg">

Without Speculative Parsing:

  1. Parsing stops at <script>
  2. Wait for loading and execution
  3. Continue parsing

With Speculative Parsing:

  1. Parsing stops at <script>
  2. A speculative parser keeps scanning the markup that follows
  3. It finds image1.jpg and image2.jpg and starts loading them right away
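The benefit can be put into numbers with a toy timing model. The figures below are made-up assumptions (300 ms for slow.js, 200 ms and 250 ms for the images), the hypothetical `time_until_images_loaded` function ignores parse time and bandwidth limits, and all downloads are assumed to run fully in parallel.

```python
def time_until_images_loaded(script_ms, image_ms, speculative):
    """Milliseconds from page start until the script has run and all images are loaded."""
    if speculative:
        # Image requests start at t=0, in parallel with the script.
        return max(script_ms, max(image_ms))
    # Images are discovered only after the script has loaded and executed.
    return script_ms + max(image_ms)

without = time_until_images_loaded(300, [200, 250], speculative=False)
with_speculation = time_until_images_loaded(300, [200, 250], speculative=True)
```

Under these assumptions the images are ready after 550 ms without speculative parsing and after 300 ms with it.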

Performance Best Practices

Minimize CSS

CSS blocks rendering, so keep render-blocking stylesheets small and inline the critical CSS needed for above-the-fold content.

Use async/defer for scripts

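A plain <script src> blocks parsing while it downloads and executes. <script defer> downloads in parallel and runs only after parsing has finished; <script async> also downloads in parallel but runs as soon as it arrives.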

Help Preload Scanner

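Use <link rel="preload"> for critical resources, especially ones the Preload Scanner cannot discover from the markup, such as fonts or images referenced only from CSS.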

Avoid document.write()

document.write() can inject markup that invalidates what the speculative parser has already scanned, so the speculative work is thrown away.

Summary:

The Parsing Pipeline is a multi-stage process with several built-in optimizations (Preload Scanner, Speculative Parsing). Understanding it helps you write HTML and CSS that load faster.
