ci: add skill review CI step by stack72 · Pull Request #688 · systeminit/swamp

stack72 · 2026-03-11T18:33:15Z

Summary

Add scripts/review_skills.ts — reviews all bundled skills via npx tessl skill review, builds a markdown summary table for GITHUB_STEP_SUMMARY, and exits 1 if any skill's average score drops below 90%
Add review-skills task to deno.json
Add skill-review CI job (Deno + Node.js) to .github/workflows/ci.yml
Update needs arrays for claude-review, claude-adversarial-review, and auto-merge to gate on skill-review
Update skill descriptions and add data-ownership.md and expressions.md reference docs to swamp-data and swamp-model skills
Register new reference files in skill_assets.ts and update tests

Test plan

Ran deno run review-skills locally — all 8 skills pass at 100%
CI skill-review job passes on this PR
Existing test, deps-audit, and review jobs still pass

🤖 Generated with Claude Code

Add a new CI job that reviews all bundled skills using `tessl skill review` and fails the build if any skill's average score drops below 90%. Changes: - New `scripts/review_skills.ts` that reviews each skill, builds a markdown summary table, writes to GITHUB_STEP_SUMMARY, and exits 0/1 - Add `review-skills` task to deno.json - Add `skill-review` CI job with Deno + Node.js setup - Update `needs` arrays for claude-review, claude-adversarial-review, and auto-merge to include skill-review - Update skill descriptions and add reference docs for data-ownership and expressions to swamp-data and swamp-model skills - Update skill_assets.ts to register new reference files Co-Authored-By: Claude Opus 4.6 <[email protected]>

…atting - Extract JSON from stdout by finding first '{' to skip npx download progress that gets written to stdout on first run in CI - Run deno fmt on swamp-issue/SKILL.md to fix table formatting Co-Authored-By: Claude Opus 4.6 <[email protected]>

github-actions

Code Review: ci: add skill review CI step

Blocking Issues

None found.

Review Summary

This PR adds a CI step to review bundled skills using tessl skill review, improving quality assurance for skill documentation. The implementation is clean and follows project conventions.

TypeScript & Code Style ✓

All new TypeScript files include the required AGPLv3 license header
No any types used; proper interfaces defined (ReviewResult, SkillScore)
Follows TypeScript strict mode requirements

Test Coverage ✓

Tests added to skill_assets_test.ts for the new reference files (data-ownership.md, expressions.md)
The review_skills.ts script is in scripts/ which is excluded from tests per deno.json, consistent with other scripts like audit_deps.ts and compile.ts

Domain-Driven Design ✓

Changes are infrastructure/CI-focused rather than domain logic
The SkillAssets class properly abstracts file access as a Repository-like pattern
No domain model changes required review

Security ✓

The script uses hardcoded skill paths from BUNDLED_SKILLS preventing path injection
No user input flows into shell commands

CI Integration ✓

skill-review job correctly configured with Deno + Node.js setup
Proper permissions for GITHUB_STEP_SUMMARY
Correctly gates claude-review, claude-adversarial-review, and auto-merge on the new job

Suggestions (non-blocking)

JSON parsing validation: The JSON parsing in review_skills.ts (lines 67-73) uses a type assertion (as ReviewResult) without runtime validation. If the external tessl tool changes its output format, this could cause unclear runtime errors. Consider adding Zod validation for robustness. However, this is a nice-to-have since the script would fail early with clear context.
Interface completeness: The ReviewResult interface assumes specific fields exist. Adding optional fields or using a more defensive access pattern could handle variations in tessl output versions.

Verdict

APPROVE - Clean implementation following project conventions with appropriate test coverage.

github-actions

Adversarial Review

Critical / High

scripts/review_skills.ts:118-131 — NaN scores silently pass threshold check

What's wrong: The code extracts normalizedScore from the JSON without validating it's actually a number. If tessl returns malformed JSON where normalizedScore is missing or non-numeric, the score becomes undefined or coerces to NaN. The threshold check averageScore < THRESHOLD evaluates to false for NaN (since all NaN comparisons return false), causing the skill to PASS when it should FAIL.

Breaking example: If tessl returns:
```
{
  "validation": {"overallPassed": true},
  "descriptionJudge": {},
  "contentJudge": {"normalizedScore": 1.0}
}
```
Then descriptionScore = undefined, averageScore = NaN, NaN < 0.9 → false, and the skill passes despite having no description score. The output would show "NaN%" in the table but the CI job exits 0.

Suggested fix: Validate scores are finite numbers before use:
```
const descriptionScore = result.descriptionJudge?.normalizedScore;
const contentScore = result.contentJudge?.normalizedScore;

if (typeof descriptionScore !== 'number' || !Number.isFinite(descriptionScore) ||
    typeof contentScore !== 'number' || !Number.isFinite(contentScore)) {
  throw new Error(`Invalid scores from tessl for ${skillDir}: desc=${descriptionScore}, content=${contentScore}`);
}
```

Medium

scripts/review_skills.ts:67-73 — JSON extraction fails with trailing stdout content

What's wrong: The code finds the first { and parses from there to end of stdout. If tessl writes ANY non-whitespace characters after the JSON object, JSON.parse throws "Unexpected token". The comment acknowledges npx may write LEADING content but doesn't handle TRAILING content.

Breaking example: If stdout is:
```
{"validation":...}
Review complete.
```
The parse fails with "Unexpected token 'R'". This causes the skill to fail (not silently pass), so it's not a false-positive risk, but it could cause spurious CI failures.

Suggested fix: Extract just the JSON object by finding matching braces, or use a regex like /{[\s\S]*}/ to extract the JSON portion.
scripts/review_skills.ts:73 — Type assertion provides no runtime validation

The as ReviewResult assertion is compile-time only. If the JSON schema changes or tessl returns different fields, the code will fail with cryptic errors or (as noted above) produce NaN scores.

Suggested fix: Use Zod or similar for runtime validation of the ReviewResult structure.

Low

scripts/review_skills.ts:139 — Error stringification may lose details

${error} on a non-Error object (e.g., if tessl throws a plain object) produces [object Object]. Consider error instanceof Error ? error.message : String(error) for better diagnostics.
No timeout on subprocess execution

If tessl hangs, the CI job relies entirely on GitHub Actions' job timeout (6 hours by default). Consider adding a timeout to Deno.Command for faster feedback.

Verdict

FAIL — The NaN score bypass (item #1 under Critical/High) allows skills to pass CI validation despite missing or invalid scores from the tessl tool. This defeats the purpose of the skill review gate.

Address adversarial review findings: - Validate normalizedScore fields are finite numbers, throwing on NaN/missing instead of silently passing the threshold check - Extract JSON by matching braces to handle trailing stdout content - Validate validation.overallPassed is a boolean - Use proper error stringification for non-Error thrown values Co-Authored-By: Claude Opus 4.6 <[email protected]>

github-actions

Code Review Summary

No blocking issues found. This PR is well-structured and ready to merge.

Review Checklist

Code Quality & TypeScript Strict Mode

scripts/review_skills.ts uses proper TypeScript interfaces (ReviewResult, SkillScore)
Uses Record<string, unknown> for safe JSON parsing with explicit type guards
No any types used

CLAUDE.md Compliance

AGPLv3 copyright headers present in all .ts files
Named exports used (SkillAssets, SkillInfo)
Unit tests in skill_assets_test.ts cover the new reference files

Domain-Driven Design

SkillAssets appropriately implements the Repository pattern for skill file access
SkillInfo is a proper Value Object (identity by value, immutable structure)
The review script is infrastructure/automation code - DDD patterns don't apply directly

Test Coverage

Tests added for new swamp-data reference files (data-ownership.md, expressions.md)
Tests added for new swamp-model reference files (data-ownership.md, expressions.md)
CI skill-review job validates the script works correctly

Security

Command execution uses Deno.Command with array arguments, preventing command injection
skillDir is derived from hardcoded BUNDLED_SKILLS, not user input
No path traversal vulnerabilities

Suggestions (non-blocking)

JSON extraction robustness: The brace-matching logic to extract JSON from npx stdout is pragmatic for handling noisy output. If npx behavior changes significantly, this could need adjustment, but it's fine for now.
Parallelization opportunity: Skills are reviewed sequentially. For faster execution with more skills, Promise.all could parallelize the reviews. Not necessary at current scale.

The CI checks (lint, test, format, skill-review) all pass. Good work on improving skill quality assurance!

github-actions

Adversarial Review

Critical / High

None.

Medium

scripts/review_skills.ts:74-85 - JSON extraction algorithm doesn't handle braces inside string literals

What's wrong: The brace-matching algorithm naively counts { and } characters to find JSON boundaries, but doesn't account for braces inside JSON string values.

Breaking example: If tessl outputs JSON like:
```
{"feedback": "Consider using {substitution} syntax", "normalizedScore": 0.9}
```
The algorithm traces:
- Position 0: { → depth=1
- Position 23: { (inside string) → depth=2
- Position 36: } (inside string) → depth=1
- Position 46: } (end of object) → depth=0, extract here
But with balanced braces inside the string, it exits at the wrong position. For example, {"key": "has { and } in it"} would extract only {"key": "has { and } which is malformed JSON.

Impact: JSON.parse throws, caught at line 192, skill gets 0 score, CI fails incorrectly. This is a false negative that could block valid PRs.

Likelihood: Depends on tessl's output format. If tessl includes any string fields with braces in feedback/error messages, this triggers. The fields we read (normalizedScore, overallPassed) are numbers/booleans, but if the JSON object contains other string fields with braces, the extraction breaks.

Suggested fix: Either:
- Trust that tessl's --json output is valid JSON and just find the last line that starts with { or use JSON.parse(stdout.trim()) and catch errors
- Implement proper JSON boundary detection that tracks quote state and escape sequences
- Use a regex that finds the outermost object: /^\{[\s\S]*\}$/m

Low

scripts/review_skills.ts:210 - GITHUB_STEP_SUMMARY write error not caught

If the GITHUB_STEP_SUMMARY file can't be written (permission denied, disk full), the script crashes with an unhandled promise rejection. The error message would come from Deno's filesystem layer which is reasonably clear, but wrapping in try/catch with a clearer message would improve debuggability.
.github/workflows/ci.yml:785 - Broader permissions than necessary

--allow-write grants unrestricted write access. Could be tightened to --allow-write=$GITHUB_STEP_SUMMARY for defense in depth. Not exploitable in practice since the env var is controlled by GitHub Actions, but follows least-privilege principle.

Verdict

PASS - The JSON extraction bug is a real issue but rated MEDIUM because:

Failure mode is explicit (parse error caught, skill marked as failed)
No silent data corruption or security impact
Depends on tessl output format which may not include problematic strings
The PR author notes they tested locally with all 8 skills passing at 100%

The skill refactoring (moving content to reference files) and CI integration are straightforward and correct. Tests cover the new reference file registration.

stack72 and others added 2 commits March 11, 2026 18:31

github-actions Bot approved these changes Mar 11, 2026

View reviewed changes

github-actions Bot requested changes Mar 11, 2026

View reviewed changes

github-actions Bot approved these changes Mar 11, 2026

View reviewed changes

github-actions Bot reviewed Mar 11, 2026

View reviewed changes

stack72 merged commit 980e703 into main Mar 11, 2026
7 checks passed

stack72 deleted the ci/skill-review branch March 11, 2026 19:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: add skill review CI step#688

ci: add skill review CI step#688
stack72 merged 3 commits intomainfrom
ci/skill-review

stack72 commented Mar 11, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

stack72 commented Mar 11, 2026

Summary

Test plan

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Code Review: ci: add skill review CI step

Blocking Issues

Review Summary

Suggestions (non-blocking)

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Adversarial Review

Critical / High

Medium

Low

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Code Review Summary

Review Checklist

Suggestions (non-blocking)

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Adversarial Review

Critical / High

Medium

Low

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant