Skip to content

Improve docs SEO metadata, crawler routes, and freshness signals#2397

Closed
omarrrns wants to merge 5 commits intomainfrom
docs/seo-improvements
Closed

Improve docs SEO metadata, crawler routes, and freshness signals#2397
omarrrns wants to merge 5 commits intomainfrom
docs/seo-improvements

Conversation

@omarrrns
Copy link
Copy Markdown
Contributor

@omarrrns omarrrns commented Feb 26, 2026

Summary

  • Add structured JSON-LD schema markup for docs pages (Article, BreadcrumbList, WebSite)
  • Improve LLM-readable routes (llms.txt, llms.mdx, llms-full.txt) with richer metadata
  • Add robots.ts, manifest.ts, and sitemap.ts improvements for better crawlability
  • Add freshness tracking lib (freshness.ts) and LLM metadata helpers (llm-metadata.ts)
  • Add SEO validation script (validate-seo.ts) and smoke test (smoke-seo.ts) for build-time checks
  • Add OG image prewarm script (prewarm-og.ts)

Test plan

  • Run pnpm build in agents-docs to verify no build errors
  • Run pnpm validate-seo to confirm SEO validation passes
  • Verify structured data via Google Rich Results Test on a deployed preview

🤖 Generated with Claude Code

Standardize freshness parsing and propagate canonical date metadata through sitemap, JSON-LD, and LLM routes so crawlers and AI indexers receive consistent signals. Add deterministic schema policy selection and smoke assertions while keeping freshness pair validation non-blocking.

Made-with: Cursor
@vercel
Copy link
Copy Markdown

vercel Bot commented Feb 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Feb 27, 2026 10:51pm
agents-docs Ready Ready Preview, Comment Feb 27, 2026 10:51pm
agents-manage-ui Ready Ready Preview, Comment Feb 27, 2026 10:51pm

Request Review

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Feb 26, 2026

⚠️ No Changeset found

Latest commit: fe3a2a8

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@pullfrog
Copy link
Copy Markdown
Contributor

pullfrog Bot commented Feb 26, 2026

Enhances documentation site SEO through comprehensive metadata improvements, structured data markup, and crawler-optimized routes.

Key changes:

  • JSON-LD structured data (src/components/seo/page-json-ld.tsx) — Auto-generates breadcrumbs, technical article schema, and HowTo markup from page TOC
  • Freshness signals (src/lib/freshness.ts, src/lib/metadata.ts) — Extracts and validates datePublished/dateModified frontmatter for better indexing
  • LLM crawler routes (src/app/llms-full.txt/route.ts, src/app/llms.mdx/[[...slug]]/route.ts, src/app/llms.txt/route.ts) — Enriched with metadata headers, section maps, and freshness dates
  • SEO validation tooling (scripts/validate-seo.ts, scripts/smoke-seo.ts) — Enforces description length, image alt text, duplicate detection, and canonical URL requirements
  • OG image prewarming (scripts/prewarm-og.ts) — Post-deploy script to populate CDN cache for social sharing
  • Robots/sitemap/manifest (src/app/robots.ts, src/app/sitemap.ts, src/app/manifest.ts) — Configures crawler access, URL discovery, and PWA metadata
  • Schema policy engine (src/lib/schema-policy.ts) — Detects intent and selects appropriate structured data schema per page

Pullfrog  | View workflow run | Using Claude Code | Triggered by Pullfrogpullfrog.com𝕏

Copy link
Copy Markdown
Contributor

@pullfrog pullfrog Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid SEO foundation — structured data, freshness tracking, LLM routes, and validation scripts are well-architected. Found a few bugs that should be addressed before merge: an XSS vector in JSON-LD rendering, incorrect boolean logic in freshness.ts, a robots.txt pattern issue, and sitemap.ts emitting volatile lastmod values.

Pullfrog  | Fix all ➔Fix 👍s ➔View workflow runpullfrog.com𝕏

}
}

return <JsonLd json={jsonLdPayload} />;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XSS risk in JsonLd component: The json-ld.tsx component called here uses dangerouslySetInnerHTML={{ __html: JSON.stringify(json) }}. JSON.stringify does not escape </script> sequences. If any frontmatter field (title, description) contains the literal string </script>, the browser will close the script tag early and interpret the remainder as HTML.

In json-ld.tsx, post-process the output:

JSON.stringify(json).replace(/</g, '\\u003c')

This is the standard OWASP mitigation for inline JSON in <script> tags. Low likelihood with MDX-authored content, but defense-in-depth matters since frontmatter strings flow directly into the output.

Comment thread agents-docs/src/lib/freshness.ts Outdated
const hasDatePublished = Boolean(datePublished && datePublished.trim());
const hasDateModified = Boolean(dateModified && dateModified.trim());

const hasDatePair = hasDatePublished === hasDateModified;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: hasDatePair is hasDatePublished === hasDateModified, which evaluates to true when both are present or both are absent. The field name implies "both dates are present" — this should likely be hasDatePublished && hasDateModified.

Comment thread agents-docs/src/lib/freshness.ts Outdated

const hasDatePair = hasDatePublished === hasDateModified;
const hasDateValues = hasDatePublished || hasDateModified;
const hasInvalidDate = hasDateValues && (!parsedPublished || !parsedModified);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: hasInvalidDate fires a false positive when only one date is provided.

Example: parseFreshnessMetadata('2025-01-01', undefined)hasDateValues is true, parsedModified is undefinedhasInvalidDate is true. But datePublished is perfectly valid — only dateModified was omitted.

The check should be scoped to dates that were actually provided:

const hasInvalidDate =
  (hasDatePublished && !parsedPublished) ||
  (hasDateModified && !parsedModified);

Comment thread agents-docs/src/app/robots.ts Outdated
{
userAgent: '*',
allow: '/',
disallow: ['/api/', '/*.md$', '/*.mdx$'],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The $ in /*.md$ and /*.mdx$ is only recognized as an end-of-pattern anchor by Googlebot (it's a Google extension to the robots.txt spec). Other crawlers treat it as a literal $ character, so these patterns won't block .md/.mdx paths for non-Google bots.

Verify the actual generated /robots.txt output. If the .md/.mdx blocking isn't critical (these paths likely 404 on a Next.js site anyway), simplifying to just '/api/' avoids the ambiguity.

Comment thread agents-docs/src/app/sitemap.ts Outdated
const depth = segments.length;
const isOverviewPage = page.url === '/overview';
const freshness = parseFreshnessMetadata(page.data.datePublished, page.data.dateModified);
const lastModified = freshness.lastModified ? new Date(freshness.lastModified) : new Date();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: When freshness.lastModified is falsy, this falls back to new Date(), so every build produces a different lastmod timestamp for every page without date metadata. This defeats the purpose of lastmod as a freshness signal — search engines will think every page changed on every build, causing unnecessary re-crawling.

Omit lastModified when no date is available:

const lastModified = freshness.lastModified ? new Date(freshness.lastModified) : undefined;

Comment on lines +81 to +98
function matchesRoutePattern(url: string, pattern: string) {
if (pattern === '/**') {
return true;
}

if (pattern.startsWith('/**/')) {
const suffix = pattern.slice(3);
return url.endsWith(suffix);
}

if (pattern.endsWith('/**')) {
const prefix = pattern.slice(0, -3);
const exact = prefix.endsWith('/') ? prefix.slice(0, -1) : prefix;
return url === exact || url.startsWith(prefix);
}

return url === pattern;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latent bug in the endsWith('/**') branch: when pattern is /api-reference/**, prefix is /api-reference and the check is url.startsWith('/api-reference'). This would incorrectly match /api-reference-v2/foo.

Fix:

return url === exact || url.startsWith(`${exact}/`);

Not triggered by current patterns but will bite when a prefix/** rule is added.

Image: (props) => (
<ImageZoom
alt={props.alt ?? 'Image'}
alt={props.alt ?? ''}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the alt fallback from 'Image' to '' tells screen readers these are decorative images to skip entirely. If these are content-relevant images (diagrams, screenshots), empty alt is worse for accessibility than generic alt — screen reader users won't know the image exists.

This is driven by the new image-alt-generic validation rule flagging alt="Image". Consider keeping a descriptive default (e.g. 'Documentation image') and progressively adding proper alt text to individual images in MDX content.

Comment on lines +150 to +154
if (searchRouteResponse.ok) {
assert(
html.includes('SearchAction'),
'SearchAction expected in site JSON-LD when an indexable /search route exists.'
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability: html here refers to the /overview response (fetched on line 33), but the check is gated on searchRouteResponse.ok. If the intent is to verify the homepage JSON-LD includes SearchAction when a /search route exists, add a clarifying comment — it currently reads as if the wrong variable is being used.

Comment thread agents-docs/src/lib/llm-metadata.ts Outdated
Comment on lines +33 to +39
function normalizeAnchor(url: string) {
if (url.includes('#')) {
return url.slice(url.indexOf('#'));
}

return url.startsWith('#') ? url : `#${url}`;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The url.startsWith('#') branch (line 38) is dead code. If url is "#section", url.includes('#') on line 34 is already true and returns from line 35. The fallback on line 38 can only be reached when url does not contain #, so startsWith('#') is always false.

Simplified:

function normalizeAnchor(url: string) {
  if (url.includes('#')) {
    return url.slice(url.indexOf('#'));
  }
  return `#${url}`;
}

Comment thread agents-docs/src/app/llms.txt/route.ts Outdated
dateModified: freshness.dateModified?.value,
});
const freshnessLabel = freshness.lastModified ?? 'missing';
return `- ${metadata.replace('<!-- ', '').replace(' -->', '')}\n - [${page.data.title}](${BASE_URL}${page.url})\n - fresh=${freshnessLabel}\n - sections=${sectionCount}`;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: .replace('<!-- ', '').replace(' -->', '') strips the comment wrapper from buildLLMMetadataHeader output — implicit coupling to the comment format. Consider having buildLLMMetadataHeader accept a wrap option or exporting a separate helper that returns the raw metadata string, instead of building the comment and immediately stripping it.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(8) Total Issues | Risk: Medium

🟠⚠️ Major (3) 🟠⚠️

🟠 1) page.tsx:127 OG image URL malformed for root path

Issue: When params.slug is undefined (catch-all route root), params.slug?.join('/') returns undefined, producing the URL /api/docs-og/undefined/image.png.

Why: The catch-all route [[...slug]] receives undefined for the root path /. This breaks Open Graph image previews when sharing the root URL on social media.

Fix: Add a fallback to 'overview' since root redirects there:

url: `/api/docs-og/${params.slug?.join('/') || 'overview'}/image.png`,

Refs:


🟠 2) freshness.ts, schema-policy.ts Critical utility functions lack unit tests

Issue: The new freshness.ts and schema-policy.ts modules contain date parsing and URL pattern matching logic with multiple edge cases but no unit tests.

Why: These functions drive SEO metadata across sitemap, JSON-LD, and LLM routes. Untested edge cases include:

  • Invalid dates that pass regex but fail Date parsing (e.g., '2024-02-30')
  • URL pattern matching for suffix patterns like /**/overview
  • hasDatePair semantics (true when both present OR both absent)

Fix: Add unit tests covering:

describe('parseFreshnessMetadata', () => {
  it('should detect chronologically invalid dates', () => {
    const result = parseFreshnessMetadata('2024-06-15', '2024-01-01');
    expect(result.isChronologicallyValid).toBe(false);
  });
});

describe('resolveSchemaPolicy', () => {
  it('should match /**/overview suffix pattern', () => {
    expect(resolveSchemaPolicy({ url: '/guides/agent/overview' }).ruleId).toBe('hub-collection-page');
  });
});

Refs:


🟠 3) page-json-ld.tsx:223-242 OfferCatalog references potentially non-existent URLs

Issue: The productLd and offerCatalogLd schemas reference /pricing and /get-started/quick-start URLs, but these pages may not exist in the docs site.

Why: Invalid URLs in structured data can cause Google to ignore the schema or flag validation errors in Search Console.

Fix: Either verify these URLs exist, or conditionally include the OfferCatalog only when a /pricing page is present in the sitemap.

Refs:

Inline Comments:

  • 🟠 Major: page.tsx:127 OG image URL fallback needed

🟡 Minor (5) 🟡

🟡 1) robots.ts:12 robots.txt patterns use unsupported regex syntax

Issue: Patterns /*.md$ and /*.mdx$ use regex $ anchors, but robots.txt uses glob matching.

Why: The $ is treated literally, so this won't block .md/.mdx files as intended.

Fix: Remove the $ anchors: disallow: ['/api/', '/*.md', '/*.mdx']

Refs:


🟡 2) Multiple files BASE_URL constant duplicated across 8 files

Issue: const BASE_URL = 'https://docs.inkeep.com' is defined inline in 8 different files instead of a shared constant.

Why: Increases maintenance burden and risk of drift. The codebase already has constants.ts for shared values.

Fix: Export from src/lib/constants.ts and import across all files.

Refs:


🟡 3) llm-metadata.ts, page-json-ld.tsx Duplicate TocEntry/TocItem interfaces and normalizeTitle functions

Issue: Nearly identical TocEntry/TocItem interfaces and normalizeTitle/normalizeTocTitle functions exist in both files.

Why: Code duplication that could drift over time.

Fix: Export shared types and utilities from llm-metadata.ts and import in page-json-ld.tsx.

Refs:


🟡 4) llms.txt/route.ts:24 Unnecessary Promise.all on synchronous array

Issue: The map() callback is synchronous, so scan is already resolved strings, not Promises.

Why: Adds unnecessary complexity.

Fix: const scanned = scan;


🟡 5) prewarm-og.ts:59-64 Missing fetch timeout could cause indefinite hangs

Issue: No timeout on fetch requests; OG image generation can be slow.

Why: Script could wait indefinitely if routes hang.

Fix: Add AbortController with 30s timeout.

Inline Comments:

  • 🟡 Minor: robots.ts:12 Remove unsupported $ anchors
  • 🟡 Minor: llms.txt/route.ts:24 Remove unnecessary Promise.all
  • 🟡 Minor: prewarm-og.ts:59-64 Add fetch timeout

💭 Consider (4) 💭

💭 1) prewarm-og.ts:54-57 Shared mutable counters in concurrent workers
Issue: successCount and failureCount are incremented after async operations, potentially losing counts if workers complete simultaneously.
Why: Unlikely in practice due to JS single-threading, but violates concurrent code best practices.

💭 2) freshness.ts:51 hasDatePair naming is semantically misleading
Issue: Returns true when both dates are present OR both absent (equality check), not just when both are present.
Why: Could confuse future maintainers expecting it to mean "both dates exist."

💭 3) page-json-ld.tsx:117-282 JSON-LD schemas created unconditionally
Issue: All 5 schema objects are constructed on every render, but only 1-2 are used per page.
Why: Minor memory overhead, low impact for SSG.

💭 4) LLM routes Consider adding content author documentation
Issue: New frontmatter fields (datePublished, dateModified) and LLM routes lack contributor documentation.
Why: Content authors may not know how to use freshness metadata.


💡 APPROVE WITH SUGGESTIONS

Summary: This is a solid SEO infrastructure improvement that adds JSON-LD structured data, freshness signals, and LLM-readable routes. The main concerns are: (1) a potential OG image bug for root paths, (2) missing unit tests for critical date parsing and URL matching logic, and (3) some code duplication that could be consolidated. The robots.txt regex syntax issue should be fixed to ensure proper crawler blocking. None of these are blocking, but addressing the OG image fallback and adding tests would increase confidence in the SEO infrastructure.

Discarded (12)
Location Issue Reason Discarded
smoke-seo.ts:86 URL /overview.mdx route mismatch Addressed by next.config.ts rewrites that map /:path*.mdx/llms.mdx/:path*
scripts/* Script entry point pattern inconsistency Both void main() and .catch() patterns exist in codebase; stylistic preference
llms.mdx/route.ts:8-9 Route handler ordering inconsistency Very minor stylistic difference
llms.mdx/route.ts:29-48 Response header construction differs Justified by conditional header logic
page-json-ld.tsx:77-98 flattenTocItems untested edge cases Low confidence, edge case unlikely with well-formed TOC
freshness.ts:9-19 FreshnessPairResult boolean flags complexity Design observation, not a bug; flags are correctly computed
llm-metadata.ts:17-22 TocEntry uses title?: unknown Intentional for handling untrusted fumadocs data
page-json-ld.tsx:25-30 TocItem duplicates TocEntry Type duplication noted under Minor #3
schema-policy.ts:115-118 Matrix fallback assumes non-empty Matrix is compile-time constant with guaranteed catch-all
smoke-seo/prewarm-og Not integrated into CI/CD Intentionally manual post-deploy tasks per SEO_REMEDIATION_STATUS.md
smoke-seo.ts:3 Network dependency for runtime tests Expected design for post-deploy validation
page.tsx:17-45 getBreadcrumbItems efficiency Acceptable for static generation; O(n) with small n
Reviewers (8)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 3 1 1 0 1 0 0
pr-review-product 5 1 2 0 0 0 2
pr-review-consistency 7 2 0 0 0 0 5
pr-review-docs 0 0 0 0 0 0 0
pr-review-tests 5 1 0 0 0 0 4
pr-review-types 4 0 0 0 0 0 4
pr-review-devops 8 0 0 0 1 0 7
pr-review-frontend 5 0 1 0 2 0 2
Total 37 5 4 0 4 0 24

Comment thread agents-docs/src/app/robots.ts Outdated
{
userAgent: '*',
allow: '/',
disallow: ['/api/', '/*.md$', '/*.mdx$'],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: robots.txt doesn't support regex $ anchor

Issue: The patterns /*.md$ and /*.mdx$ use regex-style $ anchors, but standard robots.txt uses glob-style matching, not regex.

Why: The $ character is treated literally, so this pattern would only match paths literally ending in d$ or x$, not paths ending in .md or .mdx.

Fix:

Suggested change
disallow: ['/api/', '/*.md$', '/*.mdx$'],
disallow: ['/api/', '/*.md', '/*.mdx'],

Refs:

Comment thread agents-docs/src/app/llms.txt/route.ts Outdated
const freshnessLabel = freshness.lastModified ?? 'missing';
return `- ${metadata.replace('<!-- ', '').replace(' -->', '')}\n - [${page.data.title}](${BASE_URL}${page.url})\n - fresh=${freshnessLabel}\n - sections=${sectionCount}`;
});
const scanned = await Promise.all(scan);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Unnecessary Promise.all() on synchronous array

Issue: The map() callback at line 10 is synchronous (returns a string directly), so scan is already an array of strings, not Promises.

Why: Wrapping a non-Promise array in Promise.all() works but adds unnecessary complexity and could confuse future maintainers.

Fix:

Suggested change
const scanned = await Promise.all(scan);
const scanned = scan;

Comment on lines +59 to +64
try {
const response = await fetch(target, {
headers: {
Accept: 'image/png',
},
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Missing fetch timeout could cause script to hang

Issue: The fetch requests have no explicit timeout. OG image generation can be slow, and Node.js fetch has no default timeout.

Why: If the Vercel OG route hangs or is slow, the script could wait indefinitely, blocking post-deploy automation.

Fix: Consider adding an AbortController with a timeout:

const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 30000);
try {
  const response = await fetch(target, {
    signal: controller.signal,
    headers: { Accept: 'image/png' },
  });
  // ... handle response
} finally {
  clearTimeout(timeout);
}

@github-actions github-actions Bot deleted a comment from claude Bot Feb 26, 2026
- Escape </script> in JSON-LD output to prevent XSS
- Fix OG image URL for root path (undefined slug fallback)
- Fix freshness.ts false positive when only one date is provided
- Omit sitemap lastModified instead of using volatile new Date()
- Remove non-standard $ anchors from robots.txt disallow rules
- Fix schema-policy prefix matching (/api-reference-v2 no longer matches /api-reference/**)
- Remove dead FAQ page code and Product/OfferCatalog schemas referencing non-existent /pricing
- Fix normalizeAnchor unreachable branch and unnecessary Promise.all
- Add 30s fetch timeout to prewarm-og.ts
- Consolidate BASE_URL into constants.ts, deduplicate TocEntry/normalizeTitle
- Fix validate-seo.ts broken import for node --experimental-strip-types
- Add unit tests for freshness.ts and schema-policy.ts (23 tests)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

This is a delta review scoped to changes since the last automated review. The commit fb48b473f ("Fix SEO review issues: XSS, dead code, bugs, and code quality") comprehensively addresses all prior review feedback.

✅ Prior Issues Addressed

Prior Issue Fix Applied Status
XSS in JSON-LD .replace(/</g, '\\u003c') in json-ld.tsx ✅ Fixed
hasInvalidDate false positive Changed to scope validation per field ✅ Fixed
hasDatePair naming Renamed to hasSymmetricDates ✅ Fixed
Robots.txt $ anchors Removed non-standard anchors ✅ Fixed
Sitemap volatile lastModified Changed new Date() to undefined ✅ Fixed
Schema-policy prefix matching Added path separator requirement ✅ Fixed
Dead code (Product/FAQ/OfferCatalog) Removed all unused schemas ✅ Fixed
OG image root path Added || 'overview' fallback ✅ Fixed
Prewarm timeout Added 30s AbortController ✅ Fixed
Missing unit tests Added 23 tests for freshness.ts and schema-policy.ts ✅ Fixed

💭 Consider (1) 💭

💭 1) freshness.test.ts Add test coverage for formatFreshnessDate export

Issue: The freshness.ts module exports two public functions: parseFreshnessMetadata (10 tests) and formatFreshnessDate (0 tests). The latter is used in schema validation and JSON-LD generation.

Why: While formatFreshnessDate is a thin wrapper over the tested parseDate internal, direct tests would catch regressions in return type or undefined-handling.

Fix: Optional addition:

describe('formatFreshnessDate', () => {
  it('returns ISO string for valid date', () => {
    expect(formatFreshnessDate('2024-01-15')).toBe(new Date('2024-01-15').toISOString());
  });
  it('returns undefined for invalid date', () => {
    expect(formatFreshnessDate('not-a-date')).toBeUndefined();
  });
});

Refs: freshness.ts:74-77


✅ APPROVE

Summary: All prior review feedback has been comprehensively addressed. The XSS fix is correctly implemented using the OWASP-recommended Unicode escape approach. The freshness.ts logic bugs have been fixed with proper field-scoped validation. The new test suite provides excellent coverage with 23 tests for the critical date parsing and URL pattern matching logic. The only suggestion is an optional test coverage enhancement for the formatFreshnessDate helper.

Discarded (4)
Location Issue Reason Discarded
freshness.test.ts Calendar-invalid dates (e.g., Feb 30) LOW confidence; JavaScript Date auto-corrects these, which is documented behavior
schema-policy.test.ts HowTo step heading pattern variations LOW confidence; regex is simple and unlikely to change
schema-policy.test.ts requiredFields property not tested LOW confidence; metadata for documentation, not runtime-critical
freshness.test.ts Timezone test doesn't verify actual values LOW confidence; standard Date parsing handles timezones correctly
Reviewers (3)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 0 0 0 0 0 0 0
pr-review-tests 5 0 1 0 0 0 4
pr-review-frontend 1 0 0 0 0 0 1
Total 6 0 1 0 0 0 5

Note: pr-review-frontend finding was a positive confirmation of the XSS fix, not an issue.

@github-actions github-actions Bot deleted a comment from claude Bot Feb 26, 2026
@itoqa
Copy link
Copy Markdown

itoqa Bot commented Feb 26, 2026

Ito Test Report ✅

30 test cases ran. 30 passed.

All SEO metadata, crawler routes, and freshness signal features from PR #2397 have been verified successfully. The homepage correctly redirects to /overview, all required SEO metadata (canonical, description, Open Graph, Twitter card, JSON-LD) is present, robots.txt properly configures crawler access, sitemap.xml includes correct priority and change frequency values, and LLM-readable routes (llms.txt, llms-full.txt, .mdx/.md rewrites) serve enhanced metadata with proper headers. Edge cases including breadcrumb deduplication, empty TOC handling, schema policy mapping, and security considerations (XSS prevention, header injection protection) all function as expected.

✅ Passed (30)
Test Case Summary Timestamp Screenshot
ROUTE-1 HTTP 307 redirect from / to /overview confirmed via curl and browser navigation 0:11 ROUTE-1_0-11.png
ROUTE-2 Canonical link, description, og:url, twitter:card=summary_large_image all present. JSON-LD contains BreadcrumbList, WebPage, and SoftwareApplication with applicationCategory=DeveloperApplication 0:27 ROUTE-2_0-27.png
ROUTE-3 robots.txt returns 200 with text/plain. Contains User-Agent * block with Allow:/, Disallow:/api/, Disallow:/.md, Disallow:/.mdx. GPTBot, OAI-SearchBot, ChatGPT-User blocks have Allow for /llms.txt, /llms-full.txt, /llms.mdx/ and Disallow /api/. Sitemap and Host directives present. 1:46 ROUTE-3_1-46.png
ROUTE-4 Manifest returns 200 with application/manifest+json. Contains name Inkeep Open Source Docs, short_name Inkeep Docs, start_url /overview, display standalone, and 3 icons (SVG, ICO, PNG). 3:17 ROUTE-4_3-17.png
ROUTE-5 Sitemap returns 200 with valid XML. /overview has priority 1.0 and changefreq daily. Top-level pages have 0.8/daily, depth-2 pages 0.6/weekly, depth-3+ pages 0.5/monthly. All loc values use absolute https://docs.inkeep.com URLs. 3:18 ROUTE-5_3-18.png
ROUTE-6 200 response with text/plain; charset=utf-8 content-type, x-content-type-options: nosniff header, body starts with # Inkeep, entries contain LLM_METADATA JSON, fresh=missing labels, sections= counts 4:18 ROUTE-6_4-18.png
ROUTE-7 200 response with text/plain; charset=utf-8 content-type, x-content-type-options: nosniff, body contains LLM_PAGE_START markers, LLM_METADATA blocks, and LLM_PAGE_END markers across 163+ pages. No Sections: line maps found (pages use sections=0 in index). 5:27 ROUTE-7_5-27.png
ROUTE-8 200 response with text/markdown; charset=utf-8, x-content-type-options: nosniff, Link canonical header pointing to https://docs.inkeep.com/overview, x-llm-canonical: https://docs.inkeep.com/overview, body starts with LLM_METADATA comment. No date headers present since no datePublished/dateModified frontmatter. 5:29 ROUTE-8_5-29.png
ROUTE-9 200 response with identical headers and body as .mdx version: text/markdown; charset=utf-8, Link canonical to https://docs.inkeep.com/overview, x-llm-canonical header, x-content-type-options: nosniff, body starts with LLM_METADATA comment. 5:29 ROUTE-9_5-29.png
ROUTE-10 OG image returns 200 with Content-Type image/png and Cache-Control: public, max-age=0, s-maxage=2592000, stale-while-revalidate=86400. 3:19 ROUTE-10_3-19.png
ROUTE-11 Navigated to /typescript-sdk/project-management. Page-level JSON-LD contains TechArticle with headline 'Project Management', description, url, publisher Organization, alongside BreadcrumbList and WebPage schemas. 7:21 ROUTE-11_7-21.png
ROUTE-12 Verified BreadcrumbList on nested overview page (/talk-to-your-agents/overview) with 3 items starting from /overview root, and on regular doc page (/typescript-sdk/project-management) with 3 items: Overview(1) -> Typescript Sdk(2) -> Project Management(3). All positions sequential from 1. 7:03 ROUTE-12_7-03.png
ROUTE-13 Layout JSON-LD verified with Organization (name=Inkeep, foundingDate=2023, sameAs links) and WebSite (name=Inkeep Open Source, url=https://docs.inkeep.com) 1:00 ROUTE-13_1-00.png
ROUTE-14 All icon links present: icon.svg (type=image/svg+xml), favicon.ico (sizes=any), apple-touch-icon.png (sizes=180x180), manifest.webmanifest 0:37 ROUTE-14_0-37.png
ROUTE-15 Canonical and og:url both use https://docs.inkeep.com/overview with correct single-slash format, no double-slash detected 0:37 ROUTE-15_0-37.png
EDGE-1 BreadcrumbList.itemListElement has exactly 1 item with name=Overview, position=1, item=https://docs.inkeep.com/overview. No duplicate entries. 1:01 EDGE-1_1-01.png
EDGE-2 Verified overview.mdx response and llms-full.txt overview section contain no Sections: or LLM_SECTIONS markers. Overview page in llms.txt index shows sections=0, confirming proper empty TOC handling. 5:32 EDGE-2_5-32.png
EDGE-3 Verified sitemap.xml contains zero lastmod tags. Since no content files have datePublished/dateModified frontmatter, lastmod is correctly omitted from all entries. Sitemap XML remains valid. 3:20 EDGE-3_3-20.png
EDGE-4 GET /nonexistent-page-xyz.mdx returns HTTP 404 Not Found. No crash, no stack trace. Proper error response. 5:31 EDGE-4_5-31.png
EDGE-5 Navigated to /talk-to-your-agents/overview. Page-level JSON-LD contains CollectionPage as primary schema type alongside BreadcrumbList and WebPage, confirming the /**/overview schema policy rule works correctly. 6:51 EDGE-5_6-51.png
EDGE-6 Verified on /typescript-sdk/project-management that HowTo schema is absent. Page has Step 1/2/3 headings but at h3 level nested under parent TOC entries, not as top-level TOC entries. JSON-LD types are Organization, WebSite, BreadcrumbList, WebPage, TechArticle only. 7:53 EDGE-6_7-53.png
EDGE-7 Verified in mdx-components.tsx that both Image (line 76) and img (line 87) components use alt={props.alt ?? ''}, defaulting to empty string. Checked rendered pages /visual-builder/tools/mcp-servers and /overview - no images have alt='Image'. All content images have explicit alt text. 9:33 EDGE-7_9-33.png
EDGE-8 Verified Zod schema z.union([z.string(), z.array(z.string())]).optional() in source.config.ts correctly accepts string format, array format, and undefined while rejecting invalid types. All 23 unit tests pass. Build succeeds validating all frontmatter. validate-seo passes. Existing content files use string-format keywords successfully. 10:57 EDGE-8_10-57.png
EDGE-9 All entries in llms.txt contain fresh=missing since no content files have datePublished/dateModified frontmatter. Confirmed across all 90+ entries. 5:33 EDGE-9_5-33.png
ADV-1 No literal < characters found in any JSON-LD script tag innerHTML content. Both layout-level and page-level JSON-LD blocks are properly escaped, preventing script injection. 1:02 ADV-1_1-02.png
ADV-2 All four user-agent blocks (*, GPTBot, OAI-SearchBot, ChatGPT-User) contain Disallow: /api/ directive. API routes are restricted from all crawlers. 1:47 ADV-2_1-47.png
ADV-3 Verified all 163 LLM_METADATA comments in llms-full.txt are syntactically valid HTML comments. No premature --> closers found inside JSON string values. Quick Start page with '<5min' in description is properly contained within the comment block. JSON.stringify properly escapes special characters. 5:35 ADV-3_5-35.png
ADV-4 OG image returns identical Cache-Control header (public, max-age=0, s-maxage=2592000, stale-while-revalidate=86400) for Accept: image/png, Accept: /, and with query parameter ?bust=123. 3:21 ADV-4_3-21.png
ADV-5 No X-LLM-Date-Published or X-LLM-Last-Modified headers are present in the overview.mdx response because no datePublished/dateModified frontmatter exists. When these headers are absent, header injection via date values is not possible. The header values (if present) would be validated as ISO 8601 strings only. 5:36 ADV-5_5-36.png
ADV-6 Wildcard user-agent has Disallow: /.md and Disallow: /.mdx. However, GPTBot, OAI-SearchBot, and ChatGPT-User each have their own user-agent blocks with explicit Allow: /llms.txt, Allow: /llms-full.txt, Allow: /llms.mdx/ directives. Per robots.txt spec, bot-specific rules take precedence over wildcard rules, so LLM routes remain accessible to LLM crawlers. 1:48 ADV-6_1-48.png
📋 View Recording

Screen Recording

The /llms.mdx/ per-page route was a custom convention not part of the
llms.txt spec. Remove it and its supporting libraries (llm-metadata.ts,
get-llm-text.ts), simplify /llms.txt and /llms-full.txt to clean
implementations, and update robots.ts, next.config.ts rewrites, smoke
tests, and page-json-ld.tsx accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

This is a delta review scoped to changes since the last automated review (commit fb48b473f). The latest commit 26937f8a9 ("Remove nonstandard /llms.mdx/ route, keep /llms.txt and /llms-full.txt") is a clean simplification with no new issues.

✅ Delta Changes Verified

Change Assessment
Removed /llms.mdx/[[...slug]]/route.ts ✅ Clean deletion — nonstandard route removed
Removed llm-metadata.ts (122 lines) ✅ Unused code removed; TocEntry/normalizeTitle inlined to page-json-ld.tsx
Removed get-llm-text.ts (12 lines) ✅ Logic inlined to consumers
Simplified llms.txt route ✅ Clean basic format; synchronous map correctly avoids Promise.all
Simplified llms-full.txt route ✅ Correctly uses Promise.all for async content fetching
Removed .mdx/.md rewrites from next.config.ts ✅ No longer needed without /llms.mdx/ route
Simplified robots.ts ✅ Removed LLM bot-specific rules; prior $ anchor issue resolved
Updated smoke tests ✅ Assertions simplified to match new route format

🕐 Pending Recommendations (2)

Minor items from prior reviews that remain unaddressed (low priority):

  • 🟡 smoke-seo.ts:117-121 Readability: html variable check is gated on searchRouteResponse.ok — consider adding a clarifying comment
  • 🟡 mdx-components.tsx:76 Empty alt fallback ('') marks images as decorative — consider 'Documentation image' for content-relevant images

✅ APPROVE

Summary: The delta commit is a well-executed simplification that removes the nonstandard /llms.mdx/ route system in favor of the simpler /llms.txt and /llms-full.txt endpoints. All code removals are clean, consumers are properly updated, and the simplified routes are correctly implemented. Combined with the prior review confirming all major issues were addressed, this PR is ready to merge. 🎉

Reviewers (2)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 0 0 0 0 0 0 0
pr-review-consistency 0 0 0 0 0 0 0
Total 0 0 0 0 0 0 0

Note: Delta scope limited to 1 commit. Both reviewers confirmed the simplification is correct with no new issues.

@github-actions github-actions Bot deleted a comment from claude Bot Feb 27, 2026
@itoqa
Copy link
Copy Markdown

itoqa Bot commented Feb 27, 2026

Ito Test Report ❌

31 test cases ran. 29 passed, 2 failed.

This PR improves SEO metadata, crawler routes, and freshness signals for the Inkeep Agents Docs. Verification confirms that JSON-LD structured data (Organization, WebSite, BreadcrumbList, WebPage, TechArticle, CollectionPage, SoftwareApplication) renders correctly across page types. The robots.txt, manifest.webmanifest, sitemap.xml, and LLM-readable routes (/llms.txt, /llms-full.txt) all function properly with correct headers and content. Two issues were identified: the HowTo schema fails to emit on pages with step headings, and the OG image route returns HTTP 500 for non-existent page slugs instead of a graceful 404.

✅ Passed (29)
Test Case Summary Timestamp Screenshot
ROUTE-10 HTTP 307 redirect from / to /overview confirmed 0:44 ROUTE-10_0-44.png
ROUTE-11 All HTML metadata tags verified: canonical, og:url, og:image, twitter:card, manifest, apple-touch-icon present 1:11 ROUTE-11_1-11.png
ROUTE-1 Two JSON-LD blocks confirmed: Layout (Organization+WebSite), Page (BreadcrumbList+WebPage+SoftwareApplication) 1:40 ROUTE-1_1-40.png
ROUTE-12 BreadcrumbList JSON-LD contains exactly 1 ListItem: position=1, name=Overview. No duplication detected. 1:54 ROUTE-12_1-54.png
EDGE-1 Both JSON-LD blocks on /overview contain zero unescaped < characters, preventing XSS injection. 2:08 EDGE-1_2-08.png
EDGE-6 No double-slash (//) found after domain in any URL on /overview page. 2:21 EDGE-6_2-21.png
ADV-4 Two separate JSON-LD script blocks confirmed on /overview: layout-level and page-level coexist without interference. 2:35 ADV-4_2-35.png
ROUTE-2 Confirmed /concepts page renders TechArticle JSON-LD with headline, description, url, mainEntityOfPage, publisher. 4:00 ROUTE-2_4-00.png
ROUTE-13 Confirmed /get-started/quick-start has BreadcrumbList with Overview as root entry. 6:03 ROUTE-13_6-03.png
EDGE-7 Verified content images use proper alt text, not default 'Image'. 6:05 EDGE-7_6-05.png
ROUTE-3 Confirmed /api-reference page loads with CollectionPage JSON-LD schema. 6:56 ROUTE-3_6-56.png
ADV-6 Confirmed schema policy assigns CollectionPage to /api-reference and pages ending in /overview suffix. 7:23 ADV-6_7-23.png
ROUTE-4 robots.txt returns HTTP 200 with User-agent: *, Allow: /, Disallow: /api/, Sitemap and Host directives. 8:35 ROUTE-4_8-35.png
ROUTE-5 manifest.webmanifest returns HTTP 200 with correct name, short_name, start_url, display, and 3 icons. 8:36 ROUTE-5_8-36.png
ROUTE-6 sitemap.xml returns HTTP 200 with valid XML, priority and changefreq based on depth. 8:36 ROUTE-6_8-36.png
EDGE-8 All 3 manifest icon files return HTTP 200 with correct Content-Type. 8:37 EDGE-8_8-37.png
EDGE-2 Confirmed zero lastmod entries in sitemap.xml. Pages without dates correctly omit lastmod. 9:11 EDGE-2_9-11.png
EDGE-3 Verified depth-based sitemap logic: /overview=priority 1, depth-1=0.8, depth-2=0.6, depth-3+=0.5. 9:14 EDGE-3_9-14.png
EDGE-9 URL normalization verified: /overview/, /overview?ref=test all show consistent schema. 9:14 EDGE-9_9-14.png
ADV-1 robots.txt contains Disallow: /api/ directive. 9:15 ADV-1_9-15.png
ROUTE-7 HTTP 200 with Content-Type text/plain, X-Content-Type-Options nosniff, body starts with # Inkeep. 10:40 ROUTE-7_10-40.png
ROUTE-8 HTTP 200 with 811707 bytes of content, proper headers, no truncation. 10:51 ROUTE-8_10-51.png
EDGE-5 All deleted routes (/llms.mdx, /llms.mdx/overview, /overview.mdx) properly return HTTP 404. 10:59 EDGE-5_10-59.png
ADV-2 Large payload (793KB) handled without failure, timeout, or truncation. 11:07 ADV-2_11-07.png
ROUTE-9 OG image route returns HTTP 200 with Cache-Control: public, max-age=0, s-maxage=2592000, stale-while-revalidate=86400. 11:55 ROUTE-9_11-55.png
ADV-5 /api/docs/fragments returns HTTP 200 with text/plain content containing documentation sections. 11:57 ADV-5_11-57.png
MANUAL-5 All 23 unit tests passed across 2 test files (freshness.test.ts and schema-policy.test.ts). 13:13 MANUAL-5_13-13.png
MANUAL-2 validate-seo script exited with code 0. 32 warnings but no errors. 13:29 MANUAL-2_13-29.png
MANUAL-1 Next.js 16.1.6 build compiled successfully in 90s with 173 static pages generated. 18:49 MANUAL-1_18-49.png
❌ Failed (2)
Test Case Summary Timestamp Screenshot
EDGE-4 HowTo JSON-LD is NOT present on /get-started/quick-start despite having 5 Step headings. 6:05 EDGE-4_6-05.png
ADV-3 OG image route for non-existent slug returns HTTP 500 instead of 404. 11:56 ADV-3_11-56.png
EDGE-4: Verify HowTo schema only emits with 3+ step headings on techArticle pages – Failed
  • Where: /get-started/quick-start page

  • Steps to reproduce:

    1. Navigate to /get-started/quick-start
    2. Verify the page has 5 Step headings (Step 1 through Step 5)
    3. Inspect the JSON-LD structured data in the HTML
  • What failed: The page has 5 Step headings and resolves as techArticle (enablesHowTo=true, minHowToSteps=3), but HowTo JSON-LD is NOT present in the rendered output. Only TechArticle, BreadcrumbList, WebPage, Organization, and WebSite schemas are rendered.

  • Code analysis: The HowTo schema emission logic exists in page-json-ld.tsx and relies on resolveSchemaPolicy from schema-policy.ts. The policy correctly enables HowTo for techArticle pages with 3+ step headings. The issue appears to be in how TOC items are extracted and matched against the step heading pattern.

  • Relevant code:

    src/lib/schema-policy.ts (lines 82–97)

    function countStepHeadings(tocTitles: readonly string[]) {
      return tocTitles.filter((title) => stepHeadingPattern.test(title)).length;
    }
    
    export function resolveSchemaPolicy({
      url,
      tocTitles = [],
    }: SchemaPolicyMatchInput): ResolvedSchemaPolicy {
      const normalizedUrl = normalizeUrl(url);
      const matched =
        SEO_SCHEMA_POLICY_MATRIX.find((entry) =>
          entry.routePatterns.some((routePattern) => matchesRoutePattern(normalizedUrl, routePattern))
        ) ?? SEO_SCHEMA_POLICY_MATRIX[SEO_SCHEMA_POLICY_MATRIX.length - 1];
    
      const stepCount = countStepHeadings(tocTitles);
      const includeHowTo = matched.enablesHowTo && stepCount >= (matched.minHowToSteps ?? 3);

    src/components/seo/page-json-ld.tsx (lines 106–111)

    const pageUrl = toAbsoluteUrl(url);
    const flattenedTocItems = flattenTocItems(tocItems);
    const schemaPolicy = resolveSchemaPolicy({
      url,
      tocTitles: flattenedTocItems.map((item) => item.title),
    });

    src/components/seo/page-json-ld.tsx (lines 224–241)

    if (schemaPolicy.includeHowTo) {
      const howToSteps = flattenedTocItems.filter((item) => stepHeadingPattern.test(item.title));
      if (howToSteps.length >= 3) {
        const howToLd = {
          '@context': 'https://schema.org',
          '@type': 'HowTo',
          name: title,
          description,
          url: pageUrl,
          step: howToSteps.map((item) => ({
            '@type': 'HowToStep',
            name: item.title,
            url: toAnchorUrl(pageUrl, item.url),
          })),
        } satisfies WithContext<HowTo>;
        jsonLdPayload.push(howToLd);
      }
    }
  • Why this is likely a bug: The code logic is designed to emit HowTo JSON-LD when a techArticle page has 3+ step headings in its TOC. The test confirms the page has 5 Step headings in content, but the TOC entries passed to resolveSchemaPolicy either don't include them or don't match the stepHeadingPattern regex. This indicates a disconnect between the page's actual content headings and what's extracted into the TOC data structure, causing the HowTo schema to never emit even when intended.

  • Introduced by this PR: Yes – this PR added the HowTo schema logic in page-json-ld.tsx and schema-policy.ts.

  • Timestamp: 6:05

ADV-3: Verify OG image route handles non-existent page slug gracefully – Failed
  • Where: /api/docs-og/non-existent-page-slug/image.png

  • Steps to reproduce:

    1. Request /api/docs-og/non-existent-page-slug/image.png via curl
    2. Check the HTTP response status
  • What failed: The route returns HTTP 500 Internal Server Error. Expected either HTTP 404 or an empty response for missing pages.

  • Code analysis: The OG image route handler checks if the page exists and returns undefined when it doesn't. Returning undefined from a Next.js API route causes a 500 error because no valid Response object is provided.

  • Relevant code:

    src/app/api/docs-og/[...slug]/route.tsx (lines 27–30)

    export const GET = async (_req: NextRequest, ctx: RouteContext<'/api/docs-og/[...slug]'>) => {
      const { slug } = await ctx.params;
      const page = source.getPage(slug.slice(0, -1));
      if (!page) return;
  • Why this is likely a bug: The code uses if (!page) return; which returns undefined instead of a proper Response object. In Next.js App Router API routes, returning undefined causes the server to fail with HTTP 500 because no response is provided. The code should return new Response(null, { status: 404 }) or NextResponse.json({ error: 'Not found' }, { status: 404 }) to gracefully handle missing pages.

  • Introduced by this PR: No – pre-existing bug (this PR only added Cache-Control headers to the route, the undefined return existed before).

  • Timestamp: 11:56

📋 View Recording

Screen Recording

@robert-inkeep robert-inkeep self-requested a review February 27, 2026 22:45
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

This is a delta review scoped to changes since the last automated review (commit 26937f8a9). The latest commit fe3a2a8b3 is a merge commit bringing in changes from main.

✅ Delta Changes Verified

Change Type Files Assessment
SEO infrastructure code (src/, scripts/, source.config.ts) 0 files ✅ No changes — prior approved code unchanged
Documentation content (content/) 8 files ✅ Content updates from main (artifact docs, scheduled triggers, troubleshooting)

📝 Content Updates Merged from Main

The following documentation content was updated via the merge (not SEO-specific):

  • artifact-components.mdx — Added "Passing Artifacts to Tools" section, clarified preview vs. non-preview fields
  • scheduled.mdx (both talk-to-your-agents and typescript-sdk) — Added user-scoped execution documentation
  • data-operations.mdx, chat-api.mdx — Updated artifact event documentation
  • environment-configuration.mdx, troubleshooting.mdx — Added dev logout cookie note

These content changes follow existing patterns and don't affect SEO infrastructure.

🔍 Ito Test Failures Assessment

The Ito report from 2026-02-27 flagged 2 failures:

Failure Root Cause Introduced by Delta?
EDGE-4: HowTo schema not emitting TOC entries for "Step N" headings may be nested at h3 level, not matching top-level TOC filter ❌ No — pre-existing logic
ADV-3: OG image 500 for non-existent slugs if (!page) return; returns undefined → 500 ❌ No — Ito confirms this is a pre-existing bug

Neither failure was introduced by this PR's delta. The ADV-3 issue was explicitly noted by Ito as pre-existing. The EDGE-4 issue relates to how fumadocs extracts TOC entries, not to the delta changes.

🕐 Pending Recommendations (2)

Minor items from prior reviews that remain unaddressed (low priority):

  • 🟡 smoke-seo.ts:117-121 Readability: html variable check is gated on searchRouteResponse.ok — consider adding a clarifying comment
  • 🟡 mdx-components.tsx:76 Empty alt fallback ('') marks images as decorative — consider 'Documentation image' for content-relevant images

✅ APPROVE

Summary: The delta since last review is a merge commit from main with no changes to SEO infrastructure code. The 8 documentation content files updated are standard content additions (artifact docs, scheduled triggers, troubleshooting notes) that don't affect SEO metadata or structure. All prior review feedback has been addressed in earlier commits. The Ito test failures are pre-existing issues not introduced by this PR. Ready to merge. 🎉

Reviewers (0)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded

Note: No reviewers dispatched — delta contained only a merge commit with documentation content updates, no SEO infrastructure code changes.

@itoqa
Copy link
Copy Markdown

itoqa Bot commented Feb 27, 2026

Ito Test Report ❌

35 test cases ran. 34 passed, 1 failed.

This verification run tested SEO metadata, crawler routes, and freshness signals for the Inkeep Agents Docs site (PR #2397). All critical SEO infrastructure is working correctly: homepage redirects, JSON-LD structured data, robots.txt, sitemap.xml, manifest, and LLM-readable routes all passed verification. Unit tests and build validation also succeeded. However, one bug was identified: the HowTo schema emission feature is non-functional due to a data type mismatch in TOC title normalization.

✅ Passed (34)
Test Case Summary Timestamp Screenshot
ROUTE-1 Homepage returns HTTP 307 redirect to /overview. Verified via curl and browser navigation. 2:05 ROUTE-1_2-05.png
ROUTE-2 All 5 required metadata elements verified: canonical link, meta description, og:url, JSON-LD structured data (2 blocks), and twitter:card. 3:03 ROUTE-2_3-03.png
ROUTE-3 robots.txt returns HTTP 200 with correct directives: User-Agent: *, Allow: /, Disallow: /api/, Sitemap, Host. 4:23 ROUTE-3_4-23.png
ROUTE-4 manifest.webmanifest returns HTTP 200 with valid JSON. name=Inkeep Open Source Docs, short_name=Inkeep Docs, start_url=/overview, display=standalone. 3 icons defined. 4:33 ROUTE-4_4-33.png
ROUTE-5 sitemap.xml returns HTTP 200 with valid XML. /overview has priority=1 and changefreq=daily. Nested pages have tiered priority (0.5-0.8) and changefreq values. 4:42 ROUTE-5_4-42.png
ROUTE-6 /llms.txt returns HTTP 200 with body starting with # Inkeep, containing ## Docs section and page entries. Content-Type text/plain; charset=utf-8 confirmed. 5:34 ROUTE-6_5-34.png
ROUTE-7 /llms-full.txt returns HTTP 200 with 22271 lines of processed content. Content-Type text/plain; charset=utf-8 confirmed. 5:39 ROUTE-7_5-39.png
ROUTE-8 OG image at /api/docs-og/overview/image.png returns HTTP 200 with Content-Type: image/png and CDN cache headers (s-maxage=2592000, stale-while-revalidate=86400). 6:56 ROUTE-8_6-56.png
ROUTE-9 /concepts page contains BreadcrumbList (2 items), WebPage, and TechArticle JSON-LD. All URLs absolute. 8:09 ROUTE-9_8-09.png
ROUTE-10 Layout-level JSON-LD contains Organization (name=Inkeep, logo, foundingDate=2023, sameAs with 5 social URLs) and WebSite schemas. 8:52 ROUTE-10_8-52.png
ROUTE-11 Canonical URL points to /overview. OG image URL has dimensions width=1200, height=630. Hreflang en-US alternate link present. 3:09 ROUTE-11_3-09.png
ROUTE-12 /api-reference returns 200, JSON-LD contains CollectionPage with name='Inkeep API'. No TechArticle or SoftwareApplication present. 13:02 ROUTE-12_13-02.png
ROUTE-13 SoftwareApplication JSON-LD verified with name=Inkeep Open Source, applicationCategory=DeveloperApplication, operatingSystem=Web, publisher.name=Inkeep. 3:04 ROUTE-13_3-04.png
ROUTE-14 HTML contains link rel=manifest href=/manifest.webmanifest as expected. 3:09 ROUTE-14_3-09.png
ROUTE-15 Three icon tags verified: SVG icon (/icon.svg), favicon (/favicon.ico), and apple-touch-icon (/apple-touch-icon.png, sizes=180x180). 3:11 ROUTE-15_3-11.png
EDGE-1 BreadcrumbList JSON-LD has exactly 1 ListItem with name=Overview and position=1. No duplicate Overview entry. 3:05 EDGE-1_3-05.png
EDGE-2 BreadcrumbList on /concepts page has first ListItem with name=Overview at position 1, followed by Concepts at position 2. 8:10 EDGE-2_8-10.png
EDGE-3 Sitemap correctly handles pages without datePublished/dateModified by omitting lastmod tags entirely. No Invalid Date or NaN strings found. 4:57 EDGE-3_4-57.png
EDGE-4 All deleted .mdx and .md routes return HTTP 404: /llms.mdx, /llms.mdx/overview, /some-page.mdx, /some-page.md all confirmed 404. 15:37 EDGE-4_15-37.png
EDGE-5 /api-reference CollectionPage JSON-LD does not include ItemList since the page has no TOC items. Code correctly gates ItemList emission. 14:02 EDGE-5_14-02.png
EDGE-6 /typescript-sdk/triggers/overview renders CollectionPage JSON-LD per the /**/overview schema-policy pattern. No TechArticle or SoftwareApplication. 14:12 EDGE-6_14-12.png
EDGE-7 OG image at /api/docs-og/visual-builder/tools/mcp-servers/image.png (3-segment deep slug) returns HTTP 200 with correct headers. 6:57 EDGE-7_6-57.png
EDGE-8 The string 'undefined' does not appear in /llms.txt response body. Pages without descriptions have empty description fields. 5:40 EDGE-8_5-40.png
EDGE-9 No occurrence of docs.inkeep.com// found in overview HTML. All URLs are well-formed without double slashes. 3:07 EDGE-9_3-07.png
EDGE-10 mdx-components.tsx Image and img components use alt={props.alt ?? ''} for empty string fallback. Zero images found with alt='Image'. 19:19 EDGE-10_19-19.png
ADV-1 All JSON-LD blocks parse as valid JSON. No raw < characters appear inside JSON-LD content. XSS prevention is effective. 3:12 ADV-1_3-12.png
ADV-2 Unit tests confirm schema policy correctly rejects partial route prefix matches. All 13 schema-policy tests passed. 24:08 ADV-2_24-08.png
ADV-3 robots.txt contains Disallow: /api/ directive, correctly preventing search engine crawlers from accessing API routes. 5:04 ADV-3_5-04.png
ADV-4 Non-existent URL returns HTTP 404. Page contains only site-wide Organization and WebSite JSON-LD from layout. No per-page schemas emitted. 16:22 ADV-4_16-22.png
ADV-5 Both /llms.txt and /llms-full.txt return Content-Type: text/plain; charset=utf-8 and X-Content-Type-Options: nosniff security header. 5:53 ADV-5_5-53.png
LOGIC-2 validate-seo script exits with code 0. 32 warnings reported but no errors. 20:54 LOGIC-2_20-54.png
LOGIC-3 All 23 unit tests passed (10 freshness + 13 schema-policy) with vitest. Exit code 0. 20:41 LOGIC-3_20-41.png
LOGIC-4 Full build completed successfully with exit code 0. 173 static pages generated. Prebuild (validate-seo + generate-skill-collections) passed. 23:48 LOGIC-4_23-48.png
LOGIC-5 source.config.ts defines keywords as z.union([z.string(), z.array(z.string())]).optional(). Runtime check shows keywords meta tag correctly rendered. 25:23 LOGIC-5_25-23.png
❌ Failed (1)
Test Case Summary Timestamp Screenshot
LOGIC-1 HowTo JSON-LD is NOT emitted on pages with step headings due to TOC title format mismatch. 14:47 LOGIC-1_14-47.png
HowTo schema emitted for pages with 3+ step headings – Failed
  • Where: /typescript-sdk/credentials/nango page (and any page with step headings)

  • Steps to reproduce:

    1. Navigate to a documentation page with 3+ "Step N" headings (e.g., /typescript-sdk/credentials/nango)
    2. View the page source or inspect JSON-LD structured data
    3. Observe that HowTo schema is not emitted despite visible step headings in the TOC
  • What failed: HowTo JSON-LD schema should be emitted for pages with 3+ step headings, but no HowTo schema appears in the page's JSON-LD output. The page /typescript-sdk/credentials/nango has 10+ "Step N" headings visible in the TOC, yet the HowTo schema is absent.

  • Code analysis: The bug is in the normalizeTitle() function in page-json-ld.tsx. This function handles TOC item titles but only processes string and number types. Fumadocs delivers TOC item titles as ReactNode objects (not plain strings), causing normalizeTitle() to return an empty string for all TOC entries. This cascades to flattenTocItems() filtering out all entries, resulting in an empty array that prevents HowTo schema emission.

  • Relevant code:

    agents-docs/src/components/seo/page-json-ld.tsx (lines 24–34)

    function normalizeTitle(value: unknown) {
      if (typeof value === 'string') {
        return value.trim();
      }
    
      if (typeof value === 'number') {
        return `${value}`;
      }
    
      return '';
    }

    agents-docs/src/components/seo/page-json-ld.tsx (lines 74–95)

    function flattenTocItems(tocItems: readonly TocEntry[] = []) {
      const entries: Array<{ title: string; url: string }> = [];
    
      const walk = (items: readonly TocEntry[] = []) => {
        for (const item of items) {
          const title = normalizeTitle(item.title);
          if (title && item.url) {
            entries.push({
              title,
              url: item.url,
            });
          }
          // ...
        }
      };
    
      walk(tocItems);
      return entries;
    }

    agents-docs/src/components/seo/page-json-ld.tsx (lines 224–241)

    if (schemaPolicy.includeHowTo) {
      const howToSteps = flattenedTocItems.filter((item) => stepHeadingPattern.test(item.title));
      if (howToSteps.length >= 3) {
        const howToLd = {
          '@context': 'https://schema.org',
          '@type': 'HowTo',
          name: title,
          // ...
        } satisfies WithContext<HowTo>;
        jsonLdPayload.push(howToLd);
      }
    }
  • Why this is likely a bug: The code logic for HowTo schema emission is correctly implemented (policy enables it for techArticle pages, step detection regex exists, 3+ threshold is checked), but the data normalization layer doesn't handle the actual runtime data format from fumadocs. When normalizeTitle() receives a ReactNode object instead of a string, it returns '', causing all TOC entries to be filtered out. This means the HowTo schema feature is completely non-functional for all pages.

  • Introduced by this PR: Yes – this PR added the page-json-ld.tsx file with the normalizeTitle() function. The file was added in this PR (see agents-docs/src/components/seo/page-json-ld.tsx in the changed files list).

  • Timestamp: 14:47

📋 View Recording

Screen Recording

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 7, 2026

This pull request has been automatically marked as stale because it has not had recent activity.
It will be closed in 7 days if no further activity occurs.

If this PR is still relevant:

  • Rebase it on the latest main branch
  • Add a comment explaining its current status
  • Request a review if it's ready

Thank you for your contributions!

@github-actions github-actions Bot added the stale label Mar 7, 2026
@github-actions
Copy link
Copy Markdown
Contributor

This pull request has been automatically closed due to inactivity.

If you'd like to continue working on this, please:

  1. Create a new branch from the latest main
  2. Cherry-pick your commits or rebase your changes
  3. Open a new pull request

Thank you for your understanding!

@github-actions github-actions Bot closed this Mar 15, 2026
@github-actions github-actions Bot deleted the docs/seo-improvements branch March 15, 2026 00:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants