Skip to content

refactor(format/html): overhaul HTML formatter#8806

Merged
dyc3 merged 61 commits intonextfrom
dyc3/html-big-refactor
Jan 21, 2026
Merged

refactor(format/html): overhaul HTML formatter#8806
dyc3 merged 61 commits intonextfrom
dyc3/html-big-refactor

Conversation

@dyc3
Copy link
Copy Markdown
Contributor

@dyc3 dyc3 commented Jan 19, 2026

Summary

This completely overhauls the HTML formatter to address whitespace sensitivity and other various hard-to-debug mismatches.

There are 2 major aspects:

  • It no longer uses best_fitting like the JsxChildList formatter does, which is more closely aligned to Prettier's IR.
  • There is now a concept of "internal" and "external" whitespace sensitivity. Some elements are sensitive to siblings, but not to their children.

AI was used pretty heavily to analyze prettier's IR output, and diagnose where biome's output differs.

There are some regressions, but mostly for snippets that use legacy/deprecated elements like <marquee>.

TODO:

  • changeset
  • rebase this to next
  • fix html full support

Test Plan

Prettier snapshots show meaningful improvements to matching prettier's output.
Some of our snapshots have changed too, but only to match what prettier would output.

Docs

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Jan 19, 2026

🦋 Changeset detected

Latest commit: a8a7cdf

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 14 packages
Name Type
@biomejs/biome Patch
@biomejs/cli-win32-x64 Patch
@biomejs/cli-win32-arm64 Patch
@biomejs/cli-darwin-x64 Patch
@biomejs/cli-darwin-arm64 Patch
@biomejs/cli-linux-x64 Patch
@biomejs/cli-linux-arm64 Patch
@biomejs/cli-linux-x64-musl Patch
@biomejs/cli-linux-arm64-musl Patch
@biomejs/wasm-web Patch
@biomejs/wasm-bundler Patch
@biomejs/wasm-nodejs Patch
@biomejs/backend-jsonrpc Patch
@biomejs/prettier-compare Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Copy Markdown
Contributor Author

dyc3 commented Jan 19, 2026

@dyc3 dyc3 mentioned this pull request Jan 19, 2026
1 task
@github-actions github-actions Bot added A-Formatter Area: formatter L-HTML Language: HTML and super languages labels Jan 19, 2026
@dyc3 dyc3 changed the title WIP refactor(format/html): HTML formatter overhaul Jan 19, 2026
@dyc3 dyc3 requested review from a team January 19, 2026 13:33
@dyc3 dyc3 marked this pull request as ready for review January 19, 2026 13:33
@dyc3 dyc3 changed the title refactor(format/html): HTML formatter overhaul refactor(format/html): overhaul HTML formatter Jan 19, 2026
Comment thread crates/biome_html_formatter/src/html/auxiliary/element.rs
Base automatically changed from dyc3/prettier-compare-tool to main January 20, 2026 11:14
@dyc3 dyc3 added this to the Biome v2.4 milestone Jan 20, 2026
@dyc3 dyc3 force-pushed the dyc3/html-big-refactor branch from a4a37ea to 5f36696 Compare January 20, 2026 11:49
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Jan 20, 2026

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing dyc3/html-big-refactor (a8a7cdf) with next (62dc6d7)

Summary

✅ 58 untouched benchmarks
⏩ 95 skipped benchmarks1

Footnotes

  1. 95 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 20, 2026

Walkthrough

This PR overhauls the HTML formatter: adds a CSS-display module and whitespace-sensitivity APIs; introduces token-borrowing for adjacent inline elements (sharing/suppressing the closing >), threads borrowed state through opening/closing element formatters and element lists; rewrites element-list layout to BestFitting/Multiline with ChildrenMeta and borrowing-aware child handling; refactors comment placement and verbatim leading-comment formatting (suppression-aware); and updates Svelte auxiliary formatters to use the new element-list API and inline child formatting.

Possibly related PRs

  • biomejs/biome PR 8804 — Adds HTML syntax helpers and token accessors (tag_name, closing-angle tokens, AnyHtmlElement) used by the borrowing and closing-angle logic.
  • biomejs/biome PR 8595 — Modifies html children splitting and trailing-whitespace handling that overlap the new html_split_children and whitespace-preservation changes.
  • biomejs/biome PR 7467 — Introduces embedded-content formatting and propagate_expand plumbing that interacts with element-list and embedded formatting paths.
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'refactor(format/html): overhaul HTML formatter' directly and clearly summarises the main change: a comprehensive refactoring of the HTML formatter.
Description check ✅ Passed The description explains the motivation (whitespace sensitivity issues), the two major aspects of the overhaul, AI assistance used, and the test plan with Prettier snapshots. It's fully related to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/biome_html_formatter/src/comments.rs (1)

83-112: Guard preceding_node before unwrap()—line 97 will panic. When a comment appears immediately before an unmatched closing tag, preceding_node() can be None, causing a panic. The inner guard at line 87 only checks whether the node is an HtmlOpeningElement, not whether preceding_node itself exists.

Suggested fix
         if let Some(_closing_tag) = comment
             .following_node()
             .and_then(|node| node.clone().cast::<HtmlClosingElement>())
         {
-            if let Some(_preceding_opening_tag) = comment
-                .preceding_node()
-                .and_then(|node| node.clone().cast::<HtmlOpeningElement>())
-            {
-                return CommentPlacement::dangling(
-                    comment.preceding_node().unwrap().clone(),
-                    comment,
-                );
-            } else {
-                return CommentPlacement::trailing(
-                    comment.preceding_node().unwrap().clone(),
-                    comment,
-                );
-            }
+            if let Some(preceding) = comment.preceding_node() {
+                if preceding.clone().cast::<HtmlOpeningElement>().is_some() {
+                    return CommentPlacement::dangling(preceding, comment);
+                }
+                return CommentPlacement::trailing(preceding, comment);
+            }
         }
🤖 Fix all issues with AI agents
In `@crates/biome_html_formatter/src/utils/children.rs`:
- Around line 401-430: Fix the typo in the explanatory comment inside
children.rs: change "attacked" to "attached" in the sentence explaining that
trivia adjacent to the closing tag is attached to the closing tag's leading
trivia; this is purely a comment change near the block that checks
closing_element and l_angle_token (the code that inspects leading_trivia and
then uses builder.entry(HtmlChild::EmptyLine/Newline/Whitespace)).

In `@crates/biome_html_formatter/src/utils/metadata.rs`:
- Around line 701-760: The SVG filter tag lookup fails because SVG_ALL_TAGS
contains mixed/camelCase entries (e.g., feBlend) while the lookup in
css_display.rs lowercases the tag before binary_search; fix by making
SVG_ALL_TAGS entries all-lowercase (e.g., feBlend -> feblend), ensure the array
remains sorted for binary_search, and update any related comments/tests if they
assume original casing; alternatively, change the lookup code to compare against
the original non-lowercased tag name instead of lowercasing — reference
SVG_ALL_TAGS and the binary_search-based SVG/MathML lookup in css_display.rs.
🧹 Nitpick comments (7)
crates/biome_html_formatter/src/utils/css_display.rs (1)

231-239: Binary search assumes sorted tag lists. If SVG_ALL_TAGS or MATHML_ALL_TAGS drift out of order, membership checks silently fail. Consider contains to remove the ordering dependency.

♻️ Possible simplification
-            if SVG_ALL_TAGS.binary_search(&other).is_ok()
-                || MATHML_ALL_TAGS.binary_search(&other).is_ok()
+            if SVG_ALL_TAGS.contains(&other) || MATHML_ALL_TAGS.contains(&other)
             {
                 CssDisplay::Block
             } else {
                 CssDisplay::Inline
             }
crates/biome_html_formatter/src/utils/children.rs (1)

62-66: Consider removing dead code or adding a TODO.

The text() method is marked with #[expect(dead_code)]. If it's intended for future use, a brief comment explaining the plan would be helpful. Otherwise, consider removing it until needed.

crates/biome_html_formatter/src/verbatim.rs (2)

241-241: Unused peekable() call.

The iterator is converted to Peekable but peek() is never called. This can be simplified.

♻️ Simplify iterator
-    let leading_comments_iter = leading_comments.iter().peekable();
-    for comment in leading_comments_iter {
+    for comment in leading_comments {

276-289: TODO for line comment logic review.

The TODO at line 277 indicates this logic needs review. The current implementation handles spacing based on lines_before and lines_after, but the interaction may need further verification against Prettier's behaviour.

Would you like me to help investigate how Prettier handles line comments in HTML to verify this logic?

crates/biome_html_formatter/src/html/auxiliary/element.rs (2)

123-130: Add a comment explaining the nth(2) magic number.

A past review requested clarification. The inline comment explains what nth(2) does, but a brief note on why index 2 specifically would help future readers.

📝 Enhance comment
         let is_root_element_list = node
             .syntax()
             .ancestors()
-            // get the third element (index 2)
-            // because the first one is Self (`HtmlElement`), 2nd one is the `HtmlElementList`
-            // third one is either `HtmlRoot` or another `HtmlElement`
+            // Get the grandparent (index 2 in ancestors):
+            // - Index 0: Self (HtmlElement)
+            // - Index 1: Parent HtmlElementList
+            // - Index 2: Grandparent (HtmlRoot if at root level, otherwise another HtmlElement)
             .nth(2)
             .is_some_and(|ancestor| HtmlRoot::can_cast(ancestor.kind()));

303-313: Consider clarifying operator precedence in condition.

The condition spans multiple lines with && and || operators. While correct, the precedence could be clearer with explicit parentheses or restructuring.

♻️ Clarify precedence
-    if !node.children().is_empty() && tag_name.eq_ignore_ascii_case("body")
-        || node.children().iter().any(|child| {
+    let is_non_empty_body =
+        !node.children().is_empty() && tag_name.eq_ignore_ascii_case("body");
+    let grandchildren_have_elements = node.children().iter().any(|child| {
             if let Some(element) = child.as_html_element() {
                 has_non_text_child(&element.children())
             } else {
                 false
             }
-        })
-    {
+        });
+
+    if is_non_empty_body || grandchildren_have_elements {
         return true;
     }
crates/biome_html_formatter/src/html/lists/element_list.rs (1)

391-394: Commented-out code should be removed or documented.

The // || children_meta.has_block_element appears to be leftover from development. If it's intentionally disabled, please add a TODO comment explaining the rationale. Otherwise, remove it to keep the codebase clean.

Suggested fix
             let mut force_multiline = matches!(self.layout, HtmlChildListLayout::Multiline)
-                // || children_meta.has_block_element
                 || children_meta.multiple_block_elements;

Comment thread crates/biome_html_formatter/src/utils/children.rs
Comment thread crates/biome_html_formatter/src/utils/metadata.rs
@github-actions github-actions Bot added the A-Project Area: project label Jan 20, 2026
Copy link
Copy Markdown
Member

@ematipico ematipico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately there's a lot of code that I can hardly follow, so take my review with a pinch of salt. Looking at the snapshots though, I think we made some great improvements!

Changeset is missing, maybe a patch with some like this

**BREAKING CHANGES the HTML formatter only.**

Refactor the internal logic of the HTML formatter to be closer to Prettier's formatting.... etc.

//! This implementation is very heavily inspired by the JSX formatter implementation for JsxChildList.

use std::cell::RefCell;
//! # Caution
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this comment

@dyc3 dyc3 force-pushed the dyc3/html-big-refactor branch from bf4ba55 to 24321e3 Compare January 21, 2026 16:15
@dyc3 dyc3 changed the base branch from main to next January 21, 2026 16:15
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/biome_html_formatter/src/html/lists/element_list.rs`:
- Around line 1-182: Fix the minor spelling/grammar mistakes in the top-of-file
documentation in element_list.rs: update misspellings such as "acomplishes" to
"accomplishes", "wierd" to "weird", and any other small typos or punctuation
issues in the HtmlElementList/HTML formatter doc comments so the documentation
reads cleanly and consistently.
🧹 Nitpick comments (1)
.changeset/ready-taxes-build.md (1)

5-9: Add a tiny before/after diff example.
For formatter overhauls, a short diff helps users gauge impact.

Proposed addition
-<div><span>foo</span><span>bar</span></div>
+<div><span>foo</span> <span>bar</span></div>

Based on learnings, please include a diff snippet for formatter changes.

Comment thread crates/biome_html_formatter/src/html/lists/element_list.rs
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/biome_html_formatter/src/comments.rs (1)

95-112: Avoid unwrap() here to prevent a formatter crash on malformed HTML.
If a comment precedes a stray closing tag, preceding_node() can be None, and this will panic.

🛠️ Proposed guard
-            if let Some(_closing_tag) = comment
+            if let Some(closing_tag) = comment
                 .following_node()
                 .and_then(|node| node.clone().cast::<HtmlClosingElement>())
             {
-                if let Some(_preceding_opening_tag) = comment
-                    .preceding_node()
-                    .and_then(|node| node.clone().cast::<HtmlOpeningElement>())
-                {
-                    return CommentPlacement::dangling(
-                        comment.preceding_node().unwrap().clone(),
-                        comment,
-                    );
-                } else {
-                    return CommentPlacement::trailing(
-                        comment.preceding_node().unwrap().clone(),
-                        comment,
-                    );
-                }
+                if let Some(preceding) = comment.preceding_node() {
+                    if preceding.clone().cast::<HtmlOpeningElement>().is_some() {
+                        return CommentPlacement::dangling(preceding.clone(), comment);
+                    }
+                    return CommentPlacement::trailing(preceding.clone(), comment);
+                }
+                return CommentPlacement::leading(closing_tag.syntax().clone(), comment);
             }

@github-actions github-actions Bot added the A-CLI Area: CLI label Jan 21, 2026
dyc3 added 5 commits January 21, 2026 12:14
add get_css_whitespace

mostly AI slop but cleaned up a little

progress? i think?

fix breaking when attributes break

fix spacing between comments and text

fix something
@dyc3 dyc3 force-pushed the dyc3/html-big-refactor branch from 574976c to a8a7cdf Compare January 21, 2026 17:14
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/biome_html_formatter/src/comments.rs (1)

95-113: Potential panic when preceding_node() is None.

Line 109 calls unwrap() on comment.preceding_node() in the else branch, but the outer condition only checks that following_node() is a HtmlClosingElement. If there's no preceding node (e.g., comment is the first child), this will panic.

🐛 Proposed fix
             if let Some(_preceding_opening_tag) = comment
                 .preceding_node()
                 .and_then(|node| node.clone().cast::<HtmlOpeningElement>())
             {
                 return CommentPlacement::dangling(
                     comment.preceding_node().unwrap().clone(),
                     comment,
                 );
-            } else {
+            } else if let Some(preceding) = comment.preceding_node() {
                 return CommentPlacement::trailing(
-                    comment.preceding_node().unwrap().clone(),
+                    preceding.clone(),
                     comment,
                 );
             }
crates/biome_html_formatter/src/html/auxiliary/element.rs (1)

196-207: Borrowing still enabled for force‑break elements.

The comment says force‑break elements should not borrow tokens, but the guards don’t check forces_break_children. If that’s unintended, add the guard so the implementation matches the rule.

💡 Suggested patch
-        let should_borrow_opening_r_angle = is_element_internally_whitespace_sensitive
+        let should_borrow_opening_r_angle = is_element_internally_whitespace_sensitive
+            && !forces_break_children
             && !children.is_empty()
             && !content_has_leading_whitespace
             && !should_be_verbatim
             && !should_format_embedded_nodes;
-        let should_borrow_closing_tag = is_element_internally_whitespace_sensitive
+        let should_borrow_closing_tag = is_element_internally_whitespace_sensitive
+            && !forces_break_children
             && !children.is_empty()
             && !content_has_trailing_whitespace
             && !should_be_verbatim
             && !should_format_embedded_nodes;
🤖 Fix all issues with AI agents
In `@crates/biome_html_formatter/src/html/lists/element_list.rs`:
- Around line 375-413: The code computes force_multiline for children but
ignores the stored opening_tag_group option; update the multiline decision to
consult opening_tag_group (or remove the unused field). Specifically, in the
block that initializes force_multiline (currently using matches!(self.layout,
HtmlChildListLayout::Multiline) || children_meta.multiple_block_elements), also
OR in a check of self.opening_tag_group indicating a broken/multiline opening
tag (e.g., self.opening_tag_group.is_multiline() or
matches!(self.opening_tag_group, OpeningTagGroup::Multiline)). Ensure you
reference the same boolean (force_multiline) so that if the opening tag broke
onto multiple lines the children are forced multiline; alternatively, if you
decide not to use it, remove the opening_tag_group field and any code that sets
it to avoid dead state.
🧹 Nitpick comments (1)
crates/biome_html_formatter/src/verbatim.rs (1)

276-289: TODO comment needs attention.

The CommentKind::Line branch has a TODO indicating the logic needs review. Since HTML comments are now classified as Block (per get_comment_kind), this branch may be unreachable for standard HTML. However, if this code path is exercised by other syntaxes or edge cases, the lines_after == 0 case doing nothing could be intentional or a gap.

Would you like me to verify whether this code path is reachable, or help clarify the intended behaviour?

Comment on lines +375 to +413
let formatted_children = format_with(|f| {
// Print borrowed opening `>` token
if let Some(ref r_angle_token) = self.borrowed_tokens.borrowed_opening_r_angle {
write!(f, [r_angle_token.format()])?;
}

None => None,
};
// Determine if we should force multiline
// Force multiline if:
// - Layout is explicitly set to Multiline
// - Children contain block elements
// - There are multiple block elements
let mut force_multiline = matches!(self.layout, HtmlChildListLayout::Multiline)
|| children_meta.multiple_block_elements;
let mut children_iter = HtmlChildrenIterator::new(children.iter());

// Trim leading new lines, but track if there were any
let mut had_leading_newline = false;
if !self.is_container_whitespace_sensitive {
while matches!(
children_iter.peek(),
Some(HtmlChild::Newline | HtmlChild::EmptyLine)
) {
had_leading_newline = true;
children_iter.next();
}
}

child_breaks = separator.is_some_and(|separator| separator.will_break());
// Force multiline if there was a leading newline AND there are block elements
// This respects the user's intent to break when they have:
// <div>
// <div>...</div>
// </div>
if had_leading_newline && children_meta.has_block_element {
force_multiline = true;
}

flat.write(&format_args![word, separator], f);
if force_multiline {
write!(f, [expand_parent()])?;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's explore the file structure
find . -name "element_list.rs" -type f

Repository: biomejs/biome

Length of output: 117


🏁 Script executed:

# Read the file to understand its context and find the relevant lines
wc -l crates/biome_html_formatter/src/html/lists/element_list.rs

Repository: biomejs/biome

Length of output: 119


🏁 Script executed:

# Read around lines 375-413 to see the actual code
sed -n '350,450p' crates/biome_html_formatter/src/html/lists/element_list.rs | cat -n

Repository: biomejs/biome

Length of output: 5671


🏁 Script executed:

# Search for opening_tag_group usage in this file
rg "opening_tag_group" crates/biome_html_formatter/src/html/lists/element_list.rs -n

Repository: biomejs/biome

Length of output: 212


🏁 Script executed:

# Search for where opening_tag_group is defined/stored (likely in struct initialization)
rg "opening_tag_group" crates/biome_html_formatter/src/html/lists/ -n -B 2 -A 2

Repository: biomejs/biome

Length of output: 1380


🏁 Script executed:

# Look for the struct definition that contains opening_tag_group
rg "struct.*{" crates/biome_html_formatter/src/html/lists/element_list.rs -A 20 | head -60

Repository: biomejs/biome

Length of output: 138


🏁 Script executed:

# Search for all uses of opening_tag_group - specifically where it's READ (not just assigned)
rg "self\.opening_tag_group" crates/biome_html_formatter/src/html/lists/element_list.rs -n

Repository: biomejs/biome

Length of output: 120


🏁 Script executed:

# Also check for any usage of opening_tag_group without self
rg "opening_tag_group" crates/biome_html_formatter/src/html/lists/element_list.rs -n

Repository: biomejs/biome

Length of output: 212


🏁 Script executed:

# Let's look at the whole fmt_children method and surrounding context
sed -n '200,420p' crates/biome_html_formatter/src/html/lists/element_list.rs | cat -n

Repository: biomejs/biome

Length of output: 10333


Wire opening_tag_group into the multiline decision or remove it.

opening_tag_group is stored via options but never consulted when determining if children should be multiline. This means children can stay flat even when the opening tag's attributes break onto multiple lines. Wire it into the decision logic if you want them in sync, otherwise drop the unused field.

🤖 Prompt for AI Agents
In `@crates/biome_html_formatter/src/html/lists/element_list.rs` around lines 375
- 413, The code computes force_multiline for children but ignores the stored
opening_tag_group option; update the multiline decision to consult
opening_tag_group (or remove the unused field). Specifically, in the block that
initializes force_multiline (currently using matches!(self.layout,
HtmlChildListLayout::Multiline) || children_meta.multiple_block_elements), also
OR in a check of self.opening_tag_group indicating a broken/multiline opening
tag (e.g., self.opening_tag_group.is_multiline() or
matches!(self.opening_tag_group, OpeningTagGroup::Multiline)). Ensure you
reference the same boolean (force_multiline) so that if the opening tag broke
onto multiple lines the children are forced multiline; alternatively, if you
decide not to use it, remove the opening_tag_group field and any code that sets
it to avoid dead state.

@dyc3 dyc3 merged commit cb112ce into next Jan 21, 2026
32 checks passed
@dyc3 dyc3 deleted the dyc3/html-big-refactor branch January 21, 2026 17:46
@github-actions github-actions Bot mentioned this pull request Feb 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-CLI Area: CLI A-Formatter Area: formatter A-Project Area: project L-HTML Language: HTML and super languages

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants