Conversation
Add a new CI job that reviews all bundled skills using `tessl skill review` and fails the build if any skill's average score drops below 90%. Changes: - New `scripts/review_skills.ts` that reviews each skill, builds a markdown summary table, writes to GITHUB_STEP_SUMMARY, and exits 0/1 - Add `review-skills` task to deno.json - Add `skill-review` CI job with Deno + Node.js setup - Update `needs` arrays for claude-review, claude-adversarial-review, and auto-merge to include skill-review - Update skill descriptions and add reference docs for data-ownership and expressions to swamp-data and swamp-model skills - Update skill_assets.ts to register new reference files Co-Authored-By: Claude Opus 4.6 <[email protected]>
…atting
- Extract JSON from stdout by finding first '{' to skip npx download
progress that gets written to stdout on first run in CI
- Run deno fmt on swamp-issue/SKILL.md to fix table formatting
Co-Authored-By: Claude Opus 4.6 <[email protected]>
There was a problem hiding this comment.
Code Review: ci: add skill review CI step
Blocking Issues
None found.
Review Summary
This PR adds a CI step to review bundled skills using tessl skill review, improving quality assurance for skill documentation. The implementation is clean and follows project conventions.
TypeScript & Code Style ✓
- All new TypeScript files include the required AGPLv3 license header
- No
anytypes used; proper interfaces defined (ReviewResult,SkillScore) - Follows TypeScript strict mode requirements
Test Coverage ✓
- Tests added to
skill_assets_test.tsfor the new reference files (data-ownership.md,expressions.md) - The
review_skills.tsscript is inscripts/which is excluded from tests perdeno.json, consistent with other scripts likeaudit_deps.tsandcompile.ts
Domain-Driven Design ✓
- Changes are infrastructure/CI-focused rather than domain logic
- The
SkillAssetsclass properly abstracts file access as a Repository-like pattern - No domain model changes required review
Security ✓
- The script uses hardcoded skill paths from
BUNDLED_SKILLSpreventing path injection - No user input flows into shell commands
CI Integration ✓
skill-reviewjob correctly configured with Deno + Node.js setup- Proper permissions for
GITHUB_STEP_SUMMARY - Correctly gates
claude-review,claude-adversarial-review, andauto-mergeon the new job
Suggestions (non-blocking)
-
JSON parsing validation: The JSON parsing in
review_skills.ts(lines 67-73) uses a type assertion (as ReviewResult) without runtime validation. If the externaltessltool changes its output format, this could cause unclear runtime errors. Consider adding Zod validation for robustness. However, this is a nice-to-have since the script would fail early with clear context. -
Interface completeness: The
ReviewResultinterface assumes specific fields exist. Adding optional fields or using a more defensive access pattern could handle variations intessloutput versions.
Verdict
APPROVE - Clean implementation following project conventions with appropriate test coverage.
There was a problem hiding this comment.
Adversarial Review
Critical / High
-
scripts/review_skills.ts:118-131 — NaN scores silently pass threshold check
What's wrong: The code extracts
normalizedScorefrom the JSON without validating it's actually a number. Iftesslreturns malformed JSON wherenormalizedScoreis missing or non-numeric, the score becomesundefinedor coerces to NaN. The threshold checkaverageScore < THRESHOLDevaluates tofalsefor NaN (since all NaN comparisons return false), causing the skill to PASS when it should FAIL.Breaking example: If tessl returns:
{ "validation": {"overallPassed": true}, "descriptionJudge": {}, "contentJudge": {"normalizedScore": 1.0} }Then
descriptionScore = undefined,averageScore = NaN,NaN < 0.9 → false, and the skill passes despite having no description score. The output would show "NaN%" in the table but the CI job exits 0.Suggested fix: Validate scores are finite numbers before use:
const descriptionScore = result.descriptionJudge?.normalizedScore; const contentScore = result.contentJudge?.normalizedScore; if (typeof descriptionScore !== 'number' || !Number.isFinite(descriptionScore) || typeof contentScore !== 'number' || !Number.isFinite(contentScore)) { throw new Error(`Invalid scores from tessl for ${skillDir}: desc=${descriptionScore}, content=${contentScore}`); }
Medium
-
scripts/review_skills.ts:67-73 — JSON extraction fails with trailing stdout content
What's wrong: The code finds the first
{and parses from there to end of stdout. If tessl writes ANY non-whitespace characters after the JSON object,JSON.parsethrows "Unexpected token". The comment acknowledges npx may write LEADING content but doesn't handle TRAILING content.Breaking example: If stdout is:
{"validation":...} Review complete.The parse fails with "Unexpected token 'R'". This causes the skill to fail (not silently pass), so it's not a false-positive risk, but it could cause spurious CI failures.
Suggested fix: Extract just the JSON object by finding matching braces, or use a regex like
/{[\s\S]*}/to extract the JSON portion. -
scripts/review_skills.ts:73 — Type assertion provides no runtime validation
The
as ReviewResultassertion is compile-time only. If the JSON schema changes or tessl returns different fields, the code will fail with cryptic errors or (as noted above) produce NaN scores.Suggested fix: Use Zod or similar for runtime validation of the ReviewResult structure.
Low
-
scripts/review_skills.ts:139 — Error stringification may lose details
${error}on a non-Error object (e.g., if tessl throws a plain object) produces[object Object]. Considererror instanceof Error ? error.message : String(error)for better diagnostics. -
No timeout on subprocess execution
If tessl hangs, the CI job relies entirely on GitHub Actions' job timeout (6 hours by default). Consider adding a timeout to
Deno.Commandfor faster feedback.
Verdict
FAIL — The NaN score bypass (item #1 under Critical/High) allows skills to pass CI validation despite missing or invalid scores from the tessl tool. This defeats the purpose of the skill review gate.
Address adversarial review findings: - Validate normalizedScore fields are finite numbers, throwing on NaN/missing instead of silently passing the threshold check - Extract JSON by matching braces to handle trailing stdout content - Validate validation.overallPassed is a boolean - Use proper error stringification for non-Error thrown values Co-Authored-By: Claude Opus 4.6 <[email protected]>
There was a problem hiding this comment.
Code Review Summary
No blocking issues found. This PR is well-structured and ready to merge.
Review Checklist
Code Quality & TypeScript Strict Mode
scripts/review_skills.tsuses proper TypeScript interfaces (ReviewResult,SkillScore)- Uses
Record<string, unknown>for safe JSON parsing with explicit type guards - No
anytypes used
CLAUDE.md Compliance
- AGPLv3 copyright headers present in all
.tsfiles - Named exports used (
SkillAssets,SkillInfo) - Unit tests in
skill_assets_test.tscover the new reference files
Domain-Driven Design
SkillAssetsappropriately implements the Repository pattern for skill file accessSkillInfois a proper Value Object (identity by value, immutable structure)- The review script is infrastructure/automation code - DDD patterns don't apply directly
Test Coverage
- Tests added for new
swamp-datareference files (data-ownership.md, expressions.md) - Tests added for new
swamp-modelreference files (data-ownership.md, expressions.md) - CI
skill-reviewjob validates the script works correctly
Security
- Command execution uses
Deno.Commandwith array arguments, preventing command injection skillDiris derived from hardcodedBUNDLED_SKILLS, not user input- No path traversal vulnerabilities
Suggestions (non-blocking)
-
JSON extraction robustness: The brace-matching logic to extract JSON from npx stdout is pragmatic for handling noisy output. If npx behavior changes significantly, this could need adjustment, but it's fine for now.
-
Parallelization opportunity: Skills are reviewed sequentially. For faster execution with more skills,
Promise.allcould parallelize the reviews. Not necessary at current scale.
The CI checks (lint, test, format, skill-review) all pass. Good work on improving skill quality assurance!
There was a problem hiding this comment.
Adversarial Review
Critical / High
None.
Medium
-
scripts/review_skills.ts:74-85 - JSON extraction algorithm doesn't handle braces inside string literals
What's wrong: The brace-matching algorithm naively counts
{and}characters to find JSON boundaries, but doesn't account for braces inside JSON string values.Breaking example: If tessl outputs JSON like:
{"feedback": "Consider using {substitution} syntax", "normalizedScore": 0.9}The algorithm traces:
- Position 0:
{→ depth=1 - Position 23:
{(inside string) → depth=2 - Position 36:
}(inside string) → depth=1 - Position 46:
}(end of object) → depth=0, extract here
But with balanced braces inside the string, it exits at the wrong position. For example,
{"key": "has { and } in it"}would extract only{"key": "has { and }which is malformed JSON.Impact: JSON.parse throws, caught at line 192, skill gets 0 score, CI fails incorrectly. This is a false negative that could block valid PRs.
Likelihood: Depends on tessl's output format. If tessl includes any string fields with braces in feedback/error messages, this triggers. The fields we read (normalizedScore, overallPassed) are numbers/booleans, but if the JSON object contains other string fields with braces, the extraction breaks.
Suggested fix: Either:
- Trust that tessl's
--jsonoutput is valid JSON and just find the last line that starts with{or useJSON.parse(stdout.trim())and catch errors - Implement proper JSON boundary detection that tracks quote state and escape sequences
- Use a regex that finds the outermost object:
/^\{[\s\S]*\}$/m
- Position 0:
Low
-
scripts/review_skills.ts:210 - GITHUB_STEP_SUMMARY write error not caught
If the GITHUB_STEP_SUMMARY file can't be written (permission denied, disk full), the script crashes with an unhandled promise rejection. The error message would come from Deno's filesystem layer which is reasonably clear, but wrapping in try/catch with a clearer message would improve debuggability.
-
.github/workflows/ci.yml:785 - Broader permissions than necessary
--allow-writegrants unrestricted write access. Could be tightened to--allow-write=$GITHUB_STEP_SUMMARYfor defense in depth. Not exploitable in practice since the env var is controlled by GitHub Actions, but follows least-privilege principle.
Verdict
PASS - The JSON extraction bug is a real issue but rated MEDIUM because:
- Failure mode is explicit (parse error caught, skill marked as failed)
- No silent data corruption or security impact
- Depends on tessl output format which may not include problematic strings
- The PR author notes they tested locally with all 8 skills passing at 100%
The skill refactoring (moving content to reference files) and CI integration are straightforward and correct. Tests cover the new reference file registration.
Summary
scripts/review_skills.ts— reviews all bundled skills vianpx tessl skill review, builds a markdown summary table forGITHUB_STEP_SUMMARY, and exits 1 if any skill's average score drops below 90%review-skillstask todeno.jsonskill-reviewCI job (Deno + Node.js) to.github/workflows/ci.ymlneedsarrays forclaude-review,claude-adversarial-review, andauto-mergeto gate onskill-reviewdata-ownership.mdandexpressions.mdreference docs toswamp-dataandswamp-modelskillsskill_assets.tsand update testsTest plan
deno run review-skillslocally — all 8 skills pass at 100%skill-reviewjob passes on this PRtest,deps-audit, and review jobs still pass🤖 Generated with Claude Code