Conversation
WalkthroughafterSanitizeAttributes now lowercases tag names and only removes empty nodes when the lowercased tag is one of ["a","font","img","input","span"] and not present in tagsAllowedWithNoAttribute. Tests were added to verify handling and removal of non-whitelisted tags without attributes. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Sanitizer
participant Policy as HtmlPolicyBuilder.afterSanitizeAttributes
participant Allowed as tagsAllowedWithNoAttribute
participant Fixed as FixedTagSet
Note over Fixed: FixedTagSet = ["a","font","img","input","span"]
Caller->>Policy: afterSanitizeAttributes(node, tagName, attrs)
Policy->>Policy: lowerTag = tagName.toLowerCase()
Policy->>Fixed: is lowerTag in FixedTagSet?
alt lowerTag in FixedTagSet
Policy->>Allowed: is lowerTag in tagsAllowedWithNoAttribute?
alt NOT in Allowed
Policy-->>Caller: remove node (no attrs & no children)
else in Allowed
Policy-->>Caller: keep node
end
else not in FixedTagSet
Policy-->>Caller: keep node
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
hi @garronej I based this change on the OWASP Java HTML Sanitizer documentation:
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/lib/kcSanitize/HtmlPolicyBuilder.ts (1)
198-206: Document the rationale for the hardcoded tag list.The change correctly allows
<br>tags by restricting removal checks to a specific set of tags. However, the hardcoded list["A", "FONT", "IMG", "INPUT", "SPAN"]lacks explanation for why these particular tags require special handling when empty.Consider adding a comment explaining why these five tags specifically need to be removed when they have no attributes and no children, while other tags (like
<br>) should be preserved. This will help future maintainers understand the business logic.if ( currentNode.attributes.length == 0 && currentNode.childNodes.length == 0 ) { + // Only remove specific tags when empty and not explicitly allowed. + // A, IMG: meaningless without href/src attributes + // FONT, SPAN: styling tags with no effect when empty + // INPUT: form element that requires attributes to be functional if (["A", "FONT", "IMG", "INPUT", "SPAN"].includes(currentNode.tagName)) { if ( !this.tagsAllowedWithNoAttribute.has( currentNode.tagName.toLowerCase() ) ) { currentNode.remove(); } } }test/kcSanitize/KcSanitizer.spec.ts (1)
121-122: Good test coverage for the basic case.The test correctly verifies that
<br>tags are preserved through sanitization along with adjacent text.For more comprehensive coverage, consider adding tests for:
<br>tag alone:assertResult("<br>", "<br>");- Self-closing syntax:
assertResult("<br>", "<br/>");- Multiple breaks:
assertResult("Line1<br><br>Line2", "Line1<br><br>Line2");However, the current test is sufficient to validate the PR objective.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/lib/kcSanitize/HtmlPolicyBuilder.ts(1 hunks)test/kcSanitize/KcSanitizer.spec.ts(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test with Node v18 on ubuntu-latest
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/lib/kcSanitize/HtmlPolicyBuilder.ts (2)
200-200: Consider extracting the hardcoded list as a constant.The array
["a", "font", "img", "input", "span"]represents tags requiring explicit whitelisting per OWASP. Extracting this as a named constant (e.g.,TAGS_REQUIRING_EXPLICIT_WHITELIST) would improve maintainability and make the intent clearer.Apply this diff:
+ // Tags that require explicit whitelisting via allowWithoutAttributes() per OWASP + private static readonly TAGS_REQUIRING_EXPLICIT_WHITELIST = ["a", "font", "img", "input", "span"]; + private globalAttributesAllowed: Set<AttributeType> = new Set();Then update line 200:
- if ( - ["a", "font", "img", "input", "span"].includes(tag) && - !this.tagsAllowedWithNoAttribute.has(tag) - ) { + if ( + HtmlPolicyBuilder.TAGS_REQUIRING_EXPLICIT_WHITELIST.includes(tag) && + !this.tagsAllowedWithNoAttribute.has(tag) + ) {
198-202: Good fix; recommend refactoring tag comparisons for consistency.The logic at lines 198-202 correctly restricts removal to the five tags requiring explicit whitelisting per OWASP documentation, allowing
<br>and other tags to pass through. The code properly converts tag names to lowercase before comparison, which aligns with the pattern at line 137 wheretagName.toLowerCase()is used to query thetagsAllowedmap.However, lines 207 and 221 use uppercase comparisons (
currentNode.tagName === "A"), creating inconsistency. For consistency across the hook, refactor lines 207 and 221 to also use lowercase comparisons:if (currentNode.tagName.toLowerCase() === "a" || currentNode.tagName.toLowerCase() === "img") {and
if (currentNode.tagName.toLowerCase() === "a") {
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/lib/kcSanitize/HtmlPolicyBuilder.ts (2)
198-203: Good fix—consider extracting the tag list to a constant.The logic correctly restricts removal to the specific tags mentioned in the OWASP documentation, allowing
<br>and other tags to pass through. Lowercasing the tag name is essential sincecurrentNode.tagNamereturns uppercase values in the DOM.Consider extracting the hardcoded array to a named constant for better maintainability:
+ // Tags that require explicit whitelisting per OWASP Java HTML Sanitizer + private static readonly TAGS_REQUIRING_EXPLICIT_WHITELIST = ["a", "font", "img", "input", "span"]; + private setupHooks(): void {Then update line 201:
- ["a", "font", "img", "input", "span"].includes(tag) && + HtmlPolicyBuilder.TAGS_REQUIRING_EXPLICIT_WHITELIST.includes(tag) &&
207-220: Consider standardizing tag name comparisons for consistency.Lines 208 and 222 use uppercase comparisons (
currentNode.tagName === "A"andcurrentNode.tagName === "IMG"), while line 198 now uses lowercase. This inconsistency can make the code harder to maintain.Consider standardizing on lowercase comparisons throughout:
} else { //in case of <a> or <img> if we have no attribute we need to remove them even if they have child - if (currentNode.tagName === "A" || currentNode.tagName === "IMG") { + const tag = currentNode.tagName.toLowerCase(); + if (tag === "a" || tag === "img") { if (currentNode.attributes.length == 0) { //add currentNode children to parent node while (currentNode.firstChild) { currentNode?.parentNode?.insertBefore( currentNode.firstChild, currentNode ); } // Remove the currentNode itself currentNode.remove(); } } // - if (currentNode.tagName === "A") { + if (tag === "a") { if (this.enforceRelNofollow) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/lib/kcSanitize/HtmlPolicyBuilder.ts(1 hunks)test/kcSanitize/KcSanitizer.spec.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- test/kcSanitize/KcSanitizer.spec.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test with Node v18 on ubuntu-latest
🔇 Additional comments (1)
src/lib/kcSanitize/HtmlPolicyBuilder.ts (1)
198-203: No action required—
is already properly configured.The verification confirms that
<br>is already included in theallowElements()call inKcSanitizerPolicy.ts(line 268), and there are existing tests that verify it's handled correctly (KcSanitizer.spec.tslines 121-122). The code change doesn't introduce any issues with<br>handling, as the tag's restriction only applies to the specific tags in the list ("a", "font", "img", "input", "span"), which excludes<br>.
|
@coderabbitai This seems like a faily straight forward and wellcome addition. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 16784 @garronej, here's my analysis of PR #940: Arguments FOR merging:
Arguments AGAINST merging:
Recommendation:✅ Merge with one fix: Add allowWithoutAttributes(tag: string): this {
this.tagsAllowedWithNoAttribute.add(tag.toLowerCase());
return this;
}This ensures consistency between how tags are stored and how they're checked. The change is well-motivated, security-sound, and addresses a real issue. The OWASP documentation provides strong justification for this approach. |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/lib/kcSanitize/HtmlPolicyBuilder.ts (1)
36-39: UsetoLowerCase()for consistency.Line 37 uses
toLocaleLowerCase(), but line 198 usestoLowerCase(). For HTML tag names (ASCII-only), both produce identical results, buttoLowerCase()is more commonly used and creates consistency with the rest of the file.Apply this diff:
allowWithoutAttributes(tag: string): this { - this.tagsAllowedWithNoAttribute.add(tag.toLocaleLowerCase()); + this.tagsAllowedWithNoAttribute.add(tag.toLowerCase()); return this; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/lib/kcSanitize/HtmlPolicyBuilder.ts(2 hunks)test/kcSanitize/KcSanitizer.spec.ts(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- test/kcSanitize/KcSanitizer.spec.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test with Node v18 on ubuntu-latest
🔇 Additional comments (1)
src/lib/kcSanitize/HtmlPolicyBuilder.ts (1)
192-205: LGTM! The OWASP-based filtering logic is correctly implemented.The logic correctly implements the OWASP guidance: tags in the dangerous-without-attributes list are removed unless explicitly whitelisted via
allowWithoutAttributes(), while safe self-closing tags like<br>and<hr>are preserved by default.The case-sensitivity issue identified in the PR discussion has been properly addressed by lowercasing the tag name before comparison.
|
Than you @rgrosjean! @allcontributors add @rgrosjean for code and test |
|
I've put up a pull request to add @rgrosjean! 🎉 |
Summary by CodeRabbit
Bug Fixes
Tests