refactor(format/html): overhaul HTML formatter#8806
Conversation
🦋 Changeset detectedLatest commit: a8a7cdf The changes in this PR will be included in the next version bump. This PR includes changesets to release 14 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
This stack of pull requests is managed by Graphite. Learn more about stacking. |
a4a37ea to
5f36696
Compare
CodSpeed Performance ReportMerging this PR will not alter performanceComparing Summary
Footnotes
|
WalkthroughThis PR overhauls the HTML formatter: adds a CSS-display module and whitespace-sensitivity APIs; introduces token-borrowing for adjacent inline elements (sharing/suppressing the closing Possibly related PRs
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/biome_html_formatter/src/comments.rs (1)
83-112: Guardpreceding_nodebeforeunwrap()—line 97 will panic. When a comment appears immediately before an unmatched closing tag,preceding_node()can beNone, causing a panic. The inner guard at line 87 only checks whether the node is anHtmlOpeningElement, not whetherpreceding_nodeitself exists.Suggested fix
if let Some(_closing_tag) = comment .following_node() .and_then(|node| node.clone().cast::<HtmlClosingElement>()) { - if let Some(_preceding_opening_tag) = comment - .preceding_node() - .and_then(|node| node.clone().cast::<HtmlOpeningElement>()) - { - return CommentPlacement::dangling( - comment.preceding_node().unwrap().clone(), - comment, - ); - } else { - return CommentPlacement::trailing( - comment.preceding_node().unwrap().clone(), - comment, - ); - } + if let Some(preceding) = comment.preceding_node() { + if preceding.clone().cast::<HtmlOpeningElement>().is_some() { + return CommentPlacement::dangling(preceding, comment); + } + return CommentPlacement::trailing(preceding, comment); + } }
🤖 Fix all issues with AI agents
In `@crates/biome_html_formatter/src/utils/children.rs`:
- Around line 401-430: Fix the typo in the explanatory comment inside
children.rs: change "attacked" to "attached" in the sentence explaining that
trivia adjacent to the closing tag is attached to the closing tag's leading
trivia; this is purely a comment change near the block that checks
closing_element and l_angle_token (the code that inspects leading_trivia and
then uses builder.entry(HtmlChild::EmptyLine/Newline/Whitespace)).
In `@crates/biome_html_formatter/src/utils/metadata.rs`:
- Around line 701-760: The SVG filter tag lookup fails because SVG_ALL_TAGS
contains mixed/camelCase entries (e.g., feBlend) while the lookup in
css_display.rs lowercases the tag before binary_search; fix by making
SVG_ALL_TAGS entries all-lowercase (e.g., feBlend -> feblend), ensure the array
remains sorted for binary_search, and update any related comments/tests if they
assume original casing; alternatively, change the lookup code to compare against
the original non-lowercased tag name instead of lowercasing — reference
SVG_ALL_TAGS and the binary_search-based SVG/MathML lookup in css_display.rs.
🧹 Nitpick comments (7)
crates/biome_html_formatter/src/utils/css_display.rs (1)
231-239: Binary search assumes sorted tag lists. IfSVG_ALL_TAGSorMATHML_ALL_TAGSdrift out of order, membership checks silently fail. Considercontainsto remove the ordering dependency.♻️ Possible simplification
- if SVG_ALL_TAGS.binary_search(&other).is_ok() - || MATHML_ALL_TAGS.binary_search(&other).is_ok() + if SVG_ALL_TAGS.contains(&other) || MATHML_ALL_TAGS.contains(&other) { CssDisplay::Block } else { CssDisplay::Inline }crates/biome_html_formatter/src/utils/children.rs (1)
62-66: Consider removing dead code or adding a TODO.The
text()method is marked with#[expect(dead_code)]. If it's intended for future use, a brief comment explaining the plan would be helpful. Otherwise, consider removing it until needed.crates/biome_html_formatter/src/verbatim.rs (2)
241-241: Unusedpeekable()call.The iterator is converted to
Peekablebutpeek()is never called. This can be simplified.♻️ Simplify iterator
- let leading_comments_iter = leading_comments.iter().peekable(); - for comment in leading_comments_iter { + for comment in leading_comments {
276-289: TODO for line comment logic review.The TODO at line 277 indicates this logic needs review. The current implementation handles spacing based on
lines_beforeandlines_after, but the interaction may need further verification against Prettier's behaviour.Would you like me to help investigate how Prettier handles line comments in HTML to verify this logic?
crates/biome_html_formatter/src/html/auxiliary/element.rs (2)
123-130: Add a comment explaining thenth(2)magic number.A past review requested clarification. The inline comment explains what nth(2) does, but a brief note on why index 2 specifically would help future readers.
📝 Enhance comment
let is_root_element_list = node .syntax() .ancestors() - // get the third element (index 2) - // because the first one is Self (`HtmlElement`), 2nd one is the `HtmlElementList` - // third one is either `HtmlRoot` or another `HtmlElement` + // Get the grandparent (index 2 in ancestors): + // - Index 0: Self (HtmlElement) + // - Index 1: Parent HtmlElementList + // - Index 2: Grandparent (HtmlRoot if at root level, otherwise another HtmlElement) .nth(2) .is_some_and(|ancestor| HtmlRoot::can_cast(ancestor.kind()));
303-313: Consider clarifying operator precedence in condition.The condition spans multiple lines with
&&and||operators. While correct, the precedence could be clearer with explicit parentheses or restructuring.♻️ Clarify precedence
- if !node.children().is_empty() && tag_name.eq_ignore_ascii_case("body") - || node.children().iter().any(|child| { + let is_non_empty_body = + !node.children().is_empty() && tag_name.eq_ignore_ascii_case("body"); + let grandchildren_have_elements = node.children().iter().any(|child| { if let Some(element) = child.as_html_element() { has_non_text_child(&element.children()) } else { false } - }) - { + }); + + if is_non_empty_body || grandchildren_have_elements { return true; }crates/biome_html_formatter/src/html/lists/element_list.rs (1)
391-394: Commented-out code should be removed or documented.The
// || children_meta.has_block_elementappears to be leftover from development. If it's intentionally disabled, please add a TODO comment explaining the rationale. Otherwise, remove it to keep the codebase clean.Suggested fix
let mut force_multiline = matches!(self.layout, HtmlChildListLayout::Multiline) - // || children_meta.has_block_element || children_meta.multiple_block_elements;
ematipico
left a comment
There was a problem hiding this comment.
Unfortunately there's a lot of code that I can hardly follow, so take my review with a pinch of salt. Looking at the snapshots though, I think we made some great improvements!
Changeset is missing, maybe a patch with some like this
**BREAKING CHANGES the HTML formatter only.**
Refactor the internal logic of the HTML formatter to be closer to Prettier's formatting.... etc.| //! This implementation is very heavily inspired by the JSX formatter implementation for JsxChildList. | ||
|
|
||
| use std::cell::RefCell; | ||
| //! # Caution |
bf4ba55 to
24321e3
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/biome_html_formatter/src/html/lists/element_list.rs`:
- Around line 1-182: Fix the minor spelling/grammar mistakes in the top-of-file
documentation in element_list.rs: update misspellings such as "acomplishes" to
"accomplishes", "wierd" to "weird", and any other small typos or punctuation
issues in the HtmlElementList/HTML formatter doc comments so the documentation
reads cleanly and consistently.
🧹 Nitpick comments (1)
.changeset/ready-taxes-build.md (1)
5-9: Add a tiny before/after diff example.
For formatter overhauls, a short diff helps users gauge impact.Proposed addition
-<div><span>foo</span><span>bar</span></div> +<div><span>foo</span> <span>bar</span></div>Based on learnings, please include a diff snippet for formatter changes.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/biome_html_formatter/src/comments.rs (1)
95-112: Avoidunwrap()here to prevent a formatter crash on malformed HTML.
If a comment precedes a stray closing tag,preceding_node()can beNone, and this will panic.🛠️ Proposed guard
- if let Some(_closing_tag) = comment + if let Some(closing_tag) = comment .following_node() .and_then(|node| node.clone().cast::<HtmlClosingElement>()) { - if let Some(_preceding_opening_tag) = comment - .preceding_node() - .and_then(|node| node.clone().cast::<HtmlOpeningElement>()) - { - return CommentPlacement::dangling( - comment.preceding_node().unwrap().clone(), - comment, - ); - } else { - return CommentPlacement::trailing( - comment.preceding_node().unwrap().clone(), - comment, - ); - } + if let Some(preceding) = comment.preceding_node() { + if preceding.clone().cast::<HtmlOpeningElement>().is_some() { + return CommentPlacement::dangling(preceding.clone(), comment); + } + return CommentPlacement::trailing(preceding.clone(), comment); + } + return CommentPlacement::leading(closing_tag.syntax().clone(), comment); }
… container is not internally sensitive
specifically for html/bracket-same-line/void-elements.html
574976c to
a8a7cdf
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/biome_html_formatter/src/comments.rs (1)
95-113: Potential panic whenpreceding_node()isNone.Line 109 calls
unwrap()oncomment.preceding_node()in the else branch, but the outer condition only checks thatfollowing_node()is aHtmlClosingElement. If there's no preceding node (e.g., comment is the first child), this will panic.🐛 Proposed fix
if let Some(_preceding_opening_tag) = comment .preceding_node() .and_then(|node| node.clone().cast::<HtmlOpeningElement>()) { return CommentPlacement::dangling( comment.preceding_node().unwrap().clone(), comment, ); - } else { + } else if let Some(preceding) = comment.preceding_node() { return CommentPlacement::trailing( - comment.preceding_node().unwrap().clone(), + preceding.clone(), comment, ); }crates/biome_html_formatter/src/html/auxiliary/element.rs (1)
196-207: Borrowing still enabled for force‑break elements.The comment says force‑break elements should not borrow tokens, but the guards don’t check
forces_break_children. If that’s unintended, add the guard so the implementation matches the rule.💡 Suggested patch
- let should_borrow_opening_r_angle = is_element_internally_whitespace_sensitive + let should_borrow_opening_r_angle = is_element_internally_whitespace_sensitive + && !forces_break_children && !children.is_empty() && !content_has_leading_whitespace && !should_be_verbatim && !should_format_embedded_nodes; - let should_borrow_closing_tag = is_element_internally_whitespace_sensitive + let should_borrow_closing_tag = is_element_internally_whitespace_sensitive + && !forces_break_children && !children.is_empty() && !content_has_trailing_whitespace && !should_be_verbatim && !should_format_embedded_nodes;
🤖 Fix all issues with AI agents
In `@crates/biome_html_formatter/src/html/lists/element_list.rs`:
- Around line 375-413: The code computes force_multiline for children but
ignores the stored opening_tag_group option; update the multiline decision to
consult opening_tag_group (or remove the unused field). Specifically, in the
block that initializes force_multiline (currently using matches!(self.layout,
HtmlChildListLayout::Multiline) || children_meta.multiple_block_elements), also
OR in a check of self.opening_tag_group indicating a broken/multiline opening
tag (e.g., self.opening_tag_group.is_multiline() or
matches!(self.opening_tag_group, OpeningTagGroup::Multiline)). Ensure you
reference the same boolean (force_multiline) so that if the opening tag broke
onto multiple lines the children are forced multiline; alternatively, if you
decide not to use it, remove the opening_tag_group field and any code that sets
it to avoid dead state.
🧹 Nitpick comments (1)
crates/biome_html_formatter/src/verbatim.rs (1)
276-289: TODO comment needs attention.The
CommentKind::Linebranch has a TODO indicating the logic needs review. Since HTML comments are now classified asBlock(perget_comment_kind), this branch may be unreachable for standard HTML. However, if this code path is exercised by other syntaxes or edge cases, thelines_after == 0case doing nothing could be intentional or a gap.Would you like me to verify whether this code path is reachable, or help clarify the intended behaviour?
| let formatted_children = format_with(|f| { | ||
| // Print borrowed opening `>` token | ||
| if let Some(ref r_angle_token) = self.borrowed_tokens.borrowed_opening_r_angle { | ||
| write!(f, [r_angle_token.format()])?; | ||
| } | ||
|
|
||
| None => None, | ||
| }; | ||
| // Determine if we should force multiline | ||
| // Force multiline if: | ||
| // - Layout is explicitly set to Multiline | ||
| // - Children contain block elements | ||
| // - There are multiple block elements | ||
| let mut force_multiline = matches!(self.layout, HtmlChildListLayout::Multiline) | ||
| || children_meta.multiple_block_elements; | ||
| let mut children_iter = HtmlChildrenIterator::new(children.iter()); | ||
|
|
||
| // Trim leading new lines, but track if there were any | ||
| let mut had_leading_newline = false; | ||
| if !self.is_container_whitespace_sensitive { | ||
| while matches!( | ||
| children_iter.peek(), | ||
| Some(HtmlChild::Newline | HtmlChild::EmptyLine) | ||
| ) { | ||
| had_leading_newline = true; | ||
| children_iter.next(); | ||
| } | ||
| } | ||
|
|
||
| child_breaks = separator.is_some_and(|separator| separator.will_break()); | ||
| // Force multiline if there was a leading newline AND there are block elements | ||
| // This respects the user's intent to break when they have: | ||
| // <div> | ||
| // <div>...</div> | ||
| // </div> | ||
| if had_leading_newline && children_meta.has_block_element { | ||
| force_multiline = true; | ||
| } | ||
|
|
||
| flat.write(&format_args![word, separator], f); | ||
| if force_multiline { | ||
| write!(f, [expand_parent()])?; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's explore the file structure
find . -name "element_list.rs" -type fRepository: biomejs/biome
Length of output: 117
🏁 Script executed:
# Read the file to understand its context and find the relevant lines
wc -l crates/biome_html_formatter/src/html/lists/element_list.rsRepository: biomejs/biome
Length of output: 119
🏁 Script executed:
# Read around lines 375-413 to see the actual code
sed -n '350,450p' crates/biome_html_formatter/src/html/lists/element_list.rs | cat -nRepository: biomejs/biome
Length of output: 5671
🏁 Script executed:
# Search for opening_tag_group usage in this file
rg "opening_tag_group" crates/biome_html_formatter/src/html/lists/element_list.rs -nRepository: biomejs/biome
Length of output: 212
🏁 Script executed:
# Search for where opening_tag_group is defined/stored (likely in struct initialization)
rg "opening_tag_group" crates/biome_html_formatter/src/html/lists/ -n -B 2 -A 2Repository: biomejs/biome
Length of output: 1380
🏁 Script executed:
# Look for the struct definition that contains opening_tag_group
rg "struct.*{" crates/biome_html_formatter/src/html/lists/element_list.rs -A 20 | head -60Repository: biomejs/biome
Length of output: 138
🏁 Script executed:
# Search for all uses of opening_tag_group - specifically where it's READ (not just assigned)
rg "self\.opening_tag_group" crates/biome_html_formatter/src/html/lists/element_list.rs -nRepository: biomejs/biome
Length of output: 120
🏁 Script executed:
# Also check for any usage of opening_tag_group without self
rg "opening_tag_group" crates/biome_html_formatter/src/html/lists/element_list.rs -nRepository: biomejs/biome
Length of output: 212
🏁 Script executed:
# Let's look at the whole fmt_children method and surrounding context
sed -n '200,420p' crates/biome_html_formatter/src/html/lists/element_list.rs | cat -nRepository: biomejs/biome
Length of output: 10333
Wire opening_tag_group into the multiline decision or remove it.
opening_tag_group is stored via options but never consulted when determining if children should be multiline. This means children can stay flat even when the opening tag's attributes break onto multiple lines. Wire it into the decision logic if you want them in sync, otherwise drop the unused field.
🤖 Prompt for AI Agents
In `@crates/biome_html_formatter/src/html/lists/element_list.rs` around lines 375
- 413, The code computes force_multiline for children but ignores the stored
opening_tag_group option; update the multiline decision to consult
opening_tag_group (or remove the unused field). Specifically, in the block that
initializes force_multiline (currently using matches!(self.layout,
HtmlChildListLayout::Multiline) || children_meta.multiple_block_elements), also
OR in a check of self.opening_tag_group indicating a broken/multiline opening
tag (e.g., self.opening_tag_group.is_multiline() or
matches!(self.opening_tag_group, OpeningTagGroup::Multiline)). Ensure you
reference the same boolean (force_multiline) so that if the opening tag broke
onto multiple lines the children are forced multiline; alternatively, if you
decide not to use it, remove the opening_tag_group field and any code that sets
it to avoid dead state.
Summary
This completely overhauls the HTML formatter to address whitespace sensitivity and other various hard-to-debug mismatches.
There are 2 major aspects:
best_fittinglike theJsxChildListformatter does, which is more closely aligned to Prettier's IR.AI was used pretty heavily to analyze prettier's IR output, and diagnose where biome's output differs.
There are some regressions, but mostly for snippets that use legacy/deprecated elements like
<marquee>.TODO:
nextTest Plan
Prettier snapshots show meaningful improvements to matching prettier's output.
Some of our snapshots have changed too, but only to match what prettier would output.
Docs