Skip to content

ci: add skill review CI step#688

Merged
stack72 merged 3 commits intomainfrom
ci/skill-review
Mar 11, 2026
Merged

ci: add skill review CI step#688
stack72 merged 3 commits intomainfrom
ci/skill-review

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented Mar 11, 2026

Summary

  • Add scripts/review_skills.ts — reviews all bundled skills via npx tessl skill review, builds a markdown summary table for GITHUB_STEP_SUMMARY, and exits 1 if any skill's average score drops below 90%
  • Add review-skills task to deno.json
  • Add skill-review CI job (Deno + Node.js) to .github/workflows/ci.yml
  • Update needs arrays for claude-review, claude-adversarial-review, and auto-merge to gate on skill-review
  • Update skill descriptions and add data-ownership.md and expressions.md reference docs to swamp-data and swamp-model skills
  • Register new reference files in skill_assets.ts and update tests

Test plan

  • Ran deno run review-skills locally — all 8 skills pass at 100%
  • CI skill-review job passes on this PR
  • Existing test, deps-audit, and review jobs still pass

🤖 Generated with Claude Code

stack72 and others added 2 commits March 11, 2026 18:31
Add a new CI job that reviews all bundled skills using `tessl skill review`
and fails the build if any skill's average score drops below 90%.

Changes:
- New `scripts/review_skills.ts` that reviews each skill, builds a markdown
  summary table, writes to GITHUB_STEP_SUMMARY, and exits 0/1
- Add `review-skills` task to deno.json
- Add `skill-review` CI job with Deno + Node.js setup
- Update `needs` arrays for claude-review, claude-adversarial-review,
  and auto-merge to include skill-review
- Update skill descriptions and add reference docs for data-ownership
  and expressions to swamp-data and swamp-model skills
- Update skill_assets.ts to register new reference files

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…atting

- Extract JSON from stdout by finding first '{' to skip npx download
  progress that gets written to stdout on first run in CI
- Run deno fmt on swamp-issue/SKILL.md to fix table formatting

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: ci: add skill review CI step

Blocking Issues

None found.

Review Summary

This PR adds a CI step to review bundled skills using tessl skill review, improving quality assurance for skill documentation. The implementation is clean and follows project conventions.

TypeScript & Code Style

  • All new TypeScript files include the required AGPLv3 license header
  • No any types used; proper interfaces defined (ReviewResult, SkillScore)
  • Follows TypeScript strict mode requirements

Test Coverage

  • Tests added to skill_assets_test.ts for the new reference files (data-ownership.md, expressions.md)
  • The review_skills.ts script is in scripts/ which is excluded from tests per deno.json, consistent with other scripts like audit_deps.ts and compile.ts

Domain-Driven Design

  • Changes are infrastructure/CI-focused rather than domain logic
  • The SkillAssets class properly abstracts file access as a Repository-like pattern
  • No domain model changes required review

Security

  • The script uses hardcoded skill paths from BUNDLED_SKILLS preventing path injection
  • No user input flows into shell commands

CI Integration

  • skill-review job correctly configured with Deno + Node.js setup
  • Proper permissions for GITHUB_STEP_SUMMARY
  • Correctly gates claude-review, claude-adversarial-review, and auto-merge on the new job

Suggestions (non-blocking)

  1. JSON parsing validation: The JSON parsing in review_skills.ts (lines 67-73) uses a type assertion (as ReviewResult) without runtime validation. If the external tessl tool changes its output format, this could cause unclear runtime errors. Consider adding Zod validation for robustness. However, this is a nice-to-have since the script would fail early with clear context.

  2. Interface completeness: The ReviewResult interface assumes specific fields exist. Adding optional fields or using a more defensive access pattern could handle variations in tessl output versions.

Verdict

APPROVE - Clean implementation following project conventions with appropriate test coverage.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

  1. scripts/review_skills.ts:118-131 — NaN scores silently pass threshold check

    What's wrong: The code extracts normalizedScore from the JSON without validating it's actually a number. If tessl returns malformed JSON where normalizedScore is missing or non-numeric, the score becomes undefined or coerces to NaN. The threshold check averageScore < THRESHOLD evaluates to false for NaN (since all NaN comparisons return false), causing the skill to PASS when it should FAIL.

    Breaking example: If tessl returns:

    {
      "validation": {"overallPassed": true},
      "descriptionJudge": {},
      "contentJudge": {"normalizedScore": 1.0}
    }

    Then descriptionScore = undefined, averageScore = NaN, NaN < 0.9 → false, and the skill passes despite having no description score. The output would show "NaN%" in the table but the CI job exits 0.

    Suggested fix: Validate scores are finite numbers before use:

    const descriptionScore = result.descriptionJudge?.normalizedScore;
    const contentScore = result.contentJudge?.normalizedScore;
    
    if (typeof descriptionScore !== 'number' || !Number.isFinite(descriptionScore) ||
        typeof contentScore !== 'number' || !Number.isFinite(contentScore)) {
      throw new Error(`Invalid scores from tessl for ${skillDir}: desc=${descriptionScore}, content=${contentScore}`);
    }

Medium

  1. scripts/review_skills.ts:67-73 — JSON extraction fails with trailing stdout content

    What's wrong: The code finds the first { and parses from there to end of stdout. If tessl writes ANY non-whitespace characters after the JSON object, JSON.parse throws "Unexpected token". The comment acknowledges npx may write LEADING content but doesn't handle TRAILING content.

    Breaking example: If stdout is:

    {"validation":...}
    Review complete.
    

    The parse fails with "Unexpected token 'R'". This causes the skill to fail (not silently pass), so it's not a false-positive risk, but it could cause spurious CI failures.

    Suggested fix: Extract just the JSON object by finding matching braces, or use a regex like /{[\s\S]*}/ to extract the JSON portion.

  2. scripts/review_skills.ts:73 — Type assertion provides no runtime validation

    The as ReviewResult assertion is compile-time only. If the JSON schema changes or tessl returns different fields, the code will fail with cryptic errors or (as noted above) produce NaN scores.

    Suggested fix: Use Zod or similar for runtime validation of the ReviewResult structure.

Low

  1. scripts/review_skills.ts:139 — Error stringification may lose details

    ${error} on a non-Error object (e.g., if tessl throws a plain object) produces [object Object]. Consider error instanceof Error ? error.message : String(error) for better diagnostics.

  2. No timeout on subprocess execution

    If tessl hangs, the CI job relies entirely on GitHub Actions' job timeout (6 hours by default). Consider adding a timeout to Deno.Command for faster feedback.

Verdict

FAIL — The NaN score bypass (item #1 under Critical/High) allows skills to pass CI validation despite missing or invalid scores from the tessl tool. This defeats the purpose of the skill review gate.

Address adversarial review findings:
- Validate normalizedScore fields are finite numbers, throwing on NaN/missing
  instead of silently passing the threshold check
- Extract JSON by matching braces to handle trailing stdout content
- Validate validation.overallPassed is a boolean
- Use proper error stringification for non-Error thrown values

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

No blocking issues found. This PR is well-structured and ready to merge.

Review Checklist

Code Quality & TypeScript Strict Mode

  • scripts/review_skills.ts uses proper TypeScript interfaces (ReviewResult, SkillScore)
  • Uses Record<string, unknown> for safe JSON parsing with explicit type guards
  • No any types used

CLAUDE.md Compliance

  • AGPLv3 copyright headers present in all .ts files
  • Named exports used (SkillAssets, SkillInfo)
  • Unit tests in skill_assets_test.ts cover the new reference files

Domain-Driven Design

  • SkillAssets appropriately implements the Repository pattern for skill file access
  • SkillInfo is a proper Value Object (identity by value, immutable structure)
  • The review script is infrastructure/automation code - DDD patterns don't apply directly

Test Coverage

  • Tests added for new swamp-data reference files (data-ownership.md, expressions.md)
  • Tests added for new swamp-model reference files (data-ownership.md, expressions.md)
  • CI skill-review job validates the script works correctly

Security

  • Command execution uses Deno.Command with array arguments, preventing command injection
  • skillDir is derived from hardcoded BUNDLED_SKILLS, not user input
  • No path traversal vulnerabilities

Suggestions (non-blocking)

  1. JSON extraction robustness: The brace-matching logic to extract JSON from npx stdout is pragmatic for handling noisy output. If npx behavior changes significantly, this could need adjustment, but it's fine for now.

  2. Parallelization opportunity: Skills are reviewed sequentially. For faster execution with more skills, Promise.all could parallelize the reviews. Not necessary at current scale.

The CI checks (lint, test, format, skill-review) all pass. Good work on improving skill quality assurance!

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

None.

Medium

  1. scripts/review_skills.ts:74-85 - JSON extraction algorithm doesn't handle braces inside string literals

    What's wrong: The brace-matching algorithm naively counts { and } characters to find JSON boundaries, but doesn't account for braces inside JSON string values.

    Breaking example: If tessl outputs JSON like:

    {"feedback": "Consider using {substitution} syntax", "normalizedScore": 0.9}
    

    The algorithm traces:

    • Position 0: { → depth=1
    • Position 23: { (inside string) → depth=2
    • Position 36: } (inside string) → depth=1
    • Position 46: } (end of object) → depth=0, extract here

    But with balanced braces inside the string, it exits at the wrong position. For example, {"key": "has { and } in it"} would extract only {"key": "has { and } which is malformed JSON.

    Impact: JSON.parse throws, caught at line 192, skill gets 0 score, CI fails incorrectly. This is a false negative that could block valid PRs.

    Likelihood: Depends on tessl's output format. If tessl includes any string fields with braces in feedback/error messages, this triggers. The fields we read (normalizedScore, overallPassed) are numbers/booleans, but if the JSON object contains other string fields with braces, the extraction breaks.

    Suggested fix: Either:

    • Trust that tessl's --json output is valid JSON and just find the last line that starts with { or use JSON.parse(stdout.trim()) and catch errors
    • Implement proper JSON boundary detection that tracks quote state and escape sequences
    • Use a regex that finds the outermost object: /^\{[\s\S]*\}$/m

Low

  1. scripts/review_skills.ts:210 - GITHUB_STEP_SUMMARY write error not caught

    If the GITHUB_STEP_SUMMARY file can't be written (permission denied, disk full), the script crashes with an unhandled promise rejection. The error message would come from Deno's filesystem layer which is reasonably clear, but wrapping in try/catch with a clearer message would improve debuggability.

  2. .github/workflows/ci.yml:785 - Broader permissions than necessary

    --allow-write grants unrestricted write access. Could be tightened to --allow-write=$GITHUB_STEP_SUMMARY for defense in depth. Not exploitable in practice since the env var is controlled by GitHub Actions, but follows least-privilege principle.

Verdict

PASS - The JSON extraction bug is a real issue but rated MEDIUM because:

  • Failure mode is explicit (parse error caught, skill marked as failed)
  • No silent data corruption or security impact
  • Depends on tessl output format which may not include problematic strings
  • The PR author notes they tested locally with all 8 skills passing at 100%

The skill refactoring (moving content to reference files) and CI integration are straightforward and correct. Tests cover the new reference file registration.

@stack72 stack72 merged commit 980e703 into main Mar 11, 2026
7 checks passed
@stack72 stack72 deleted the ci/skill-review branch March 11, 2026 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant