Skip to content

#881: allow <br> tags#940

Merged
garronej merged 5 commits intokeycloakify:mainfrom
rgrosjean:bugfix/allow_br_tags
Oct 21, 2025
Merged

#881: allow <br> tags#940
garronej merged 5 commits intokeycloakify:mainfrom
rgrosjean:bugfix/allow_br_tags

Conversation

@rgrosjean
Copy link
Copy Markdown
Contributor

@rgrosjean rgrosjean commented Oct 21, 2025

Summary by CodeRabbit

  • Bug Fixes

    • HTML sanitization now treats tag names case-insensitively and restricts automatic removal to a defined set of simple formatting tags, preventing unintended stripping and better preserving surrounding content.
  • Tests

    • Added tests to verify preservation of break tags, horizontal rules, and adjacent text, and to ensure non-whitelisted tags without attributes are removed while keeping their inner text.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 21, 2025

Walkthrough

afterSanitizeAttributes now lowercases tag names and only removes empty nodes when the lowercased tag is one of ["a","font","img","input","span"] and not present in tagsAllowedWithNoAttribute. Tests were added to verify
and


handling and removal of non-whitelisted tags without attributes.

Changes

Cohort / File(s) Summary
Sanitization Logic
src/lib/kcSanitize/HtmlPolicyBuilder.ts
allowWithoutAttributes stores lowercased tag names via toLowerCase(). afterSanitizeAttributes lowercases currentNode.tagName and only removes nodes with no attributes and no children when the lowercased tag is in the fixed set ["a","font","img","input","span"] and not present in tagsAllowedWithNoAttribute. Added an OWASP Java HTML Sanitizer reference comment.
Test Coverage
test/kcSanitize/KcSanitizer.spec.ts
Added tests: preserve <br> with adjacent text (<br>Line 1<br>Line 2), preserve text around <hr> (Line before horizontal rule<hr>Line after horizontal rule), and remove non-whitelisted tag without attributes while preserving inner text.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Sanitizer
    participant Policy as HtmlPolicyBuilder.afterSanitizeAttributes
    participant Allowed as tagsAllowedWithNoAttribute
    participant Fixed as FixedTagSet

    Note over Fixed: FixedTagSet = ["a","font","img","input","span"]
    Caller->>Policy: afterSanitizeAttributes(node, tagName, attrs)
    Policy->>Policy: lowerTag = tagName.toLowerCase()
    Policy->>Fixed: is lowerTag in FixedTagSet?
    alt lowerTag in FixedTagSet
        Policy->>Allowed: is lowerTag in tagsAllowedWithNoAttribute?
        alt NOT in Allowed
            Policy-->>Caller: remove node (no attrs & no children)
        else in Allowed
            Policy-->>Caller: keep node
        end
    else not in FixedTagSet
        Policy-->>Caller: keep node
    end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 I lower each tag with quiet delight,
I trim the empty ones that match my sight,
<br> hops in place, <hr> stays on cue,
Tests nibble through to prove what is true,
A tidy change — a rabbit's small bite.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "#881: allow
tags" is directly aligned with the primary objective of the changeset. The PR objectives confirm that the main goal is to "modify kcSanitize behavior to allow
tags to pass the sanitizer," which is exactly what the title conveys. The title is concise, specific, and clear—it uses a concrete technical term (
tags) rather than vague language, making it easy for teammates scanning the repository history to understand the primary change at a glance.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a4179b and 91d1900.

📒 Files selected for processing (1)
  • src/lib/kcSanitize/HtmlPolicyBuilder.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/kcSanitize/HtmlPolicyBuilder.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test with Node v18 on ubuntu-latest

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@rgrosjean
Copy link
Copy Markdown
Contributor Author

hi @garronej I based this change on the OWASP Java HTML Sanitizer documentation:
https://github.com/OWASP/java-html-sanitizer?tab=readme-ov-file#custom-policies

Please note that the elements "a", "font", "img", "input" and "span" need to be explicitly whitelisted using the allowWithoutAttributes() method if you want them to be allowed through the filter when these elements do not include any attributes.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/lib/kcSanitize/HtmlPolicyBuilder.ts (1)

198-206: Document the rationale for the hardcoded tag list.

The change correctly allows <br> tags by restricting removal checks to a specific set of tags. However, the hardcoded list ["A", "FONT", "IMG", "INPUT", "SPAN"] lacks explanation for why these particular tags require special handling when empty.

Consider adding a comment explaining why these five tags specifically need to be removed when they have no attributes and no children, while other tags (like <br>) should be preserved. This will help future maintainers understand the business logic.

             if (
                 currentNode.attributes.length == 0 &&
                 currentNode.childNodes.length == 0
             ) {
+                // Only remove specific tags when empty and not explicitly allowed.
+                // A, IMG: meaningless without href/src attributes
+                // FONT, SPAN: styling tags with no effect when empty
+                // INPUT: form element that requires attributes to be functional
                 if (["A", "FONT", "IMG", "INPUT", "SPAN"].includes(currentNode.tagName)) {
                     if (
                         !this.tagsAllowedWithNoAttribute.has(
                             currentNode.tagName.toLowerCase()
                         )
                     ) {
                         currentNode.remove();
                     }
                 }
             }
test/kcSanitize/KcSanitizer.spec.ts (1)

121-122: Good test coverage for the basic case.

The test correctly verifies that <br> tags are preserved through sanitization along with adjacent text.

For more comprehensive coverage, consider adding tests for:

  • <br> tag alone: assertResult("<br>", "<br>");
  • Self-closing syntax: assertResult("<br>", "<br/>");
  • Multiple breaks: assertResult("Line1<br><br>Line2", "Line1<br><br>Line2");

However, the current test is sufficient to validate the PR objective.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4af1c3e and b081412.

📒 Files selected for processing (2)
  • src/lib/kcSanitize/HtmlPolicyBuilder.ts (1 hunks)
  • test/kcSanitize/KcSanitizer.spec.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test with Node v18 on ubuntu-latest

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/lib/kcSanitize/HtmlPolicyBuilder.ts (2)

200-200: Consider extracting the hardcoded list as a constant.

The array ["a", "font", "img", "input", "span"] represents tags requiring explicit whitelisting per OWASP. Extracting this as a named constant (e.g., TAGS_REQUIRING_EXPLICIT_WHITELIST) would improve maintainability and make the intent clearer.

Apply this diff:

+    // Tags that require explicit whitelisting via allowWithoutAttributes() per OWASP
+    private static readonly TAGS_REQUIRING_EXPLICIT_WHITELIST = ["a", "font", "img", "input", "span"];
+
     private globalAttributesAllowed: Set<AttributeType> = new Set();

Then update line 200:

-                if (
-                    ["a", "font", "img", "input", "span"].includes(tag) &&
-                    !this.tagsAllowedWithNoAttribute.has(tag)
-                ) {
+                if (
+                    HtmlPolicyBuilder.TAGS_REQUIRING_EXPLICIT_WHITELIST.includes(tag) &&
+                    !this.tagsAllowedWithNoAttribute.has(tag)
+                ) {

198-202: Good fix; recommend refactoring tag comparisons for consistency.

The logic at lines 198-202 correctly restricts removal to the five tags requiring explicit whitelisting per OWASP documentation, allowing <br> and other tags to pass through. The code properly converts tag names to lowercase before comparison, which aligns with the pattern at line 137 where tagName.toLowerCase() is used to query the tagsAllowed map.

However, lines 207 and 221 use uppercase comparisons (currentNode.tagName === "A"), creating inconsistency. For consistency across the hook, refactor lines 207 and 221 to also use lowercase comparisons:

if (currentNode.tagName.toLowerCase() === "a" || currentNode.tagName.toLowerCase() === "img") {

and

if (currentNode.tagName.toLowerCase() === "a") {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b081412 and 8d3a9a5.

📒 Files selected for processing (1)
  • src/lib/kcSanitize/HtmlPolicyBuilder.ts (1 hunks)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/lib/kcSanitize/HtmlPolicyBuilder.ts (2)

198-203: Good fix—consider extracting the tag list to a constant.

The logic correctly restricts removal to the specific tags mentioned in the OWASP documentation, allowing <br> and other tags to pass through. Lowercasing the tag name is essential since currentNode.tagName returns uppercase values in the DOM.

Consider extracting the hardcoded array to a named constant for better maintainability:

+    // Tags that require explicit whitelisting per OWASP Java HTML Sanitizer
+    private static readonly TAGS_REQUIRING_EXPLICIT_WHITELIST = ["a", "font", "img", "input", "span"];
+
     private setupHooks(): void {

Then update line 201:

-                ["a", "font", "img", "input", "span"].includes(tag) &&
+                HtmlPolicyBuilder.TAGS_REQUIRING_EXPLICIT_WHITELIST.includes(tag) &&

207-220: Consider standardizing tag name comparisons for consistency.

Lines 208 and 222 use uppercase comparisons (currentNode.tagName === "A" and currentNode.tagName === "IMG"), while line 198 now uses lowercase. This inconsistency can make the code harder to maintain.

Consider standardizing on lowercase comparisons throughout:

             } else {
                 //in case of <a> or <img> if we have no attribute we need to remove them even if they have child
-                if (currentNode.tagName === "A" || currentNode.tagName === "IMG") {
+                const tag = currentNode.tagName.toLowerCase();
+                if (tag === "a" || tag === "img") {
                     if (currentNode.attributes.length == 0) {
                         //add currentNode children to parent node
                         while (currentNode.firstChild) {
                             currentNode?.parentNode?.insertBefore(
                                 currentNode.firstChild,
                                 currentNode
                             );
                         }
                         // Remove the currentNode itself
                         currentNode.remove();
                     }
                 }
                 //
-                if (currentNode.tagName === "A") {
+                if (tag === "a") {
                     if (this.enforceRelNofollow) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d3a9a5 and 2c82ebf.

📒 Files selected for processing (2)
  • src/lib/kcSanitize/HtmlPolicyBuilder.ts (1 hunks)
  • test/kcSanitize/KcSanitizer.spec.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/kcSanitize/KcSanitizer.spec.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test with Node v18 on ubuntu-latest
🔇 Additional comments (1)
src/lib/kcSanitize/HtmlPolicyBuilder.ts (1)

198-203: No action required—
is already properly configured.

The verification confirms that <br> is already included in the allowElements() call in KcSanitizerPolicy.ts (line 268), and there are existing tests that verify it's handled correctly (KcSanitizer.spec.ts lines 121-122). The code change doesn't introduce any issues with <br> handling, as the tag's restriction only applies to the specific tags in the list ("a", "font", "img", "input", "span"), which excludes <br>.

@garronej
Copy link
Copy Markdown
Collaborator

@coderabbitai This seems like a faily straight forward and wellcome addition.
Make the argument for and against merging it. Conclude with your recomendation.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 21, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the diff for the PR to see the actual changes
gh pr diff 940

# Get the full content of the modified files to understand context
cat src/lib/kcSanitize/HtmlPolicyBuilder.ts

# Get the test file to see the test coverage
cat test/kcSanitize/KcSanitizer.spec.ts

Length of output: 16784


@garronej, here's my analysis of PR #940:

Arguments FOR merging:

  1. Follows OWASP best practices: The change correctly implements OWASP Java HTML Sanitizer guidance, which specifically identifies "a", "font", "img", "input", and "span" as tags requiring explicit whitelisting via allowWithoutAttributes() when they lack attributes.

  2. Fixes legitimate bug: The <br> tag is a standard, harmless HTML element that shouldn't require attributes. The previous logic incorrectly removed it.

  3. More secure architecture: By limiting removal checks to only the problematic tags identified by OWASP, the code follows a principle of "restrict only what's dangerous" rather than "allow only what's explicitly permitted" - this is safer for tags already filtered by DOMPurify's ALLOWED_TAGS.

  4. Better defense-in-depth: The case-insensitive handling (lowercasing tag names) prevents bypass attempts using mixed-case tags.

  5. Test coverage included: The PR adds verification that <br> tags are preserved.

Arguments AGAINST merging:

  1. Case-sensitivity bug: There's an inconsistency at Line 202-207. The code lowercases currentNode.tagName for the array check but tagsAllowedWithNoAttribute.has(tag) checks against a set that may contain mixed-case entries since allowWithoutAttributes() (Line 39) doesn't lowercase. This could cause allowWithoutAttributes("BR") to fail matching against lowercase "br".

  2. Limited test coverage: No tests verify other self-closing tags (like <hr>, <wbr>) or confirm the five problematic tags are still properly handled.

  3. Minor breaking change: If any code relied on the old "remove by default" behavior for empty tags, this changes that. However, this seems more like a bug fix than a breaking change.

Recommendation:

✅ Merge with one fix: Add .toLowerCase() to the allowWithoutAttributes() method:

allowWithoutAttributes(tag: string): this {
    this.tagsAllowedWithNoAttribute.add(tag.toLowerCase());
    return this;
}

This ensures consistency between how tags are stored and how they're checked. The change is well-motivated, security-sound, and addresses a real issue. The OWASP documentation provides strong justification for this approach.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/lib/kcSanitize/HtmlPolicyBuilder.ts (1)

36-39: Use toLowerCase() for consistency.

Line 37 uses toLocaleLowerCase(), but line 198 uses toLowerCase(). For HTML tag names (ASCII-only), both produce identical results, but toLowerCase() is more commonly used and creates consistency with the rest of the file.

Apply this diff:

 allowWithoutAttributes(tag: string): this {
-    this.tagsAllowedWithNoAttribute.add(tag.toLocaleLowerCase());
+    this.tagsAllowedWithNoAttribute.add(tag.toLowerCase());
     return this;
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c82ebf and 9a4179b.

📒 Files selected for processing (2)
  • src/lib/kcSanitize/HtmlPolicyBuilder.ts (2 hunks)
  • test/kcSanitize/KcSanitizer.spec.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/kcSanitize/KcSanitizer.spec.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test with Node v18 on ubuntu-latest
🔇 Additional comments (1)
src/lib/kcSanitize/HtmlPolicyBuilder.ts (1)

192-205: LGTM! The OWASP-based filtering logic is correctly implemented.

The logic correctly implements the OWASP guidance: tags in the dangerous-without-attributes list are removed unless explicitly whitelisted via allowWithoutAttributes(), while safe self-closing tags like <br> and <hr> are preserved by default.

The case-sensitivity issue identified in the PR discussion has been properly addressed by lowercasing the tag name before comparison.

@garronej garronej merged commit f4a1013 into keycloakify:main Oct 21, 2025
7 checks passed
@garronej
Copy link
Copy Markdown
Collaborator

Than you @rgrosjean!

@allcontributors add @rgrosjean for code and test

@allcontributors
Copy link
Copy Markdown
Contributor

@garronej

I've put up a pull request to add @rgrosjean! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants