Skip to content

fix: guard against path traversal in extension archive extraction and add driver/datastore tests#746

Merged
stack72 merged 3 commits intomainfrom
fix/extension-pull-path-traversal-and-driver-datastore-tests
Mar 17, 2026
Merged

fix: guard against path traversal in extension archive extraction and add driver/datastore tests#746
stack72 merged 3 commits intomainfrom
fix/extension-pull-path-traversal-and-driver-datastore-tests

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented Mar 17, 2026

Summary

Addresses two issues raised in PR review:

  • Path traversal during archive extraction (src/cli/commands/extension_pull.ts): After tar extracts an extension archive to a temp directory, each extracted file path is now resolved and validated to confirm it starts within the temp directory. If any entry contains path traversal sequences (e.g. ../../.bashrc) that would escape the temp dir, a UserError is thrown before any further processing occurs.

  • Missing test coverage for driver/datastore content extraction (src/domain/extensions/extension_content_extractor_test.ts): Added 8 unit tests for extractDriverFromSource and extractDatastoreFromSource following the same patterns as the existing vault tests:

    • Extracts type, name, and description
    • Extracts configSchema fields (inline z.object)
    • Skips files without the relevant export (driver / datastore)
    • Skips exports that are missing the required type field

Test Plan

  • All 3183 existing tests continue to pass (deno run test)
  • 8 new unit tests added and passing for driver/datastore extraction
  • deno check passes (no type errors)
  • deno lint passes (no lint errors)
  • deno fmt --check passes (no formatting issues)

… add driver/datastore tests

- Validate all extracted file paths are within the temp directory after
  tar extraction, throwing a UserError if any entry escapes via path
  traversal sequences
- Add 8 unit tests for extractDriverFromSource and extractDatastoreFromSource
  covering type/name/description extraction, configSchema field parsing,
  files without the relevant export, and exports missing the type field

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

…traversal

Use `tar -tzf` to list archive entries and reject any containing ".." or
starting with "/" before extracting, eliminating the TOCTOU window where
malicious entries could escape tmpDir during extraction.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR properly addresses the two issues from the prior review. The changes are well-implemented and ready to merge.

No Blocking Issues

The code is clean, follows project conventions, and the security fix is correctly placed.

Analysis

Security Fix (extension_pull.ts:593-615)

The path traversal guard is well-implemented:

  • Checks happen BEFORE extraction (the correct time)
  • Rejects paths containing .. (path traversal sequences)
  • Rejects absolute paths starting with /
  • Clear error message identifies the unsafe path
  • Appropriate placement in the CLI layer (input validation at application boundary per DDD principles)

The refactoring to reuse archiveEntries for debug logging is a nice simplification that avoids redundant filesystem operations.

Test Coverage (extension_content_extractor_test.ts)

The 8 new tests for driver/datastore extraction are well-written:

  • Follow existing test patterns in the file
  • Proper cleanup with try/finally blocks
  • Cover positive cases (successful extraction) and negative cases (missing export, missing type)
  • Symmetric coverage for both drivers and datastores

Code Quality

  • TypeScript strict mode: No violations
  • Named exports: Properly used
  • License headers: Present
  • Test file naming: Correctly placed next to source

Suggestions (non-blocking)

  1. Future consideration: A unit test for the path traversal check itself would strengthen confidence in this security fix. This could be done by:

    • Creating a test archive with malicious paths, or
    • Extracting the validation logic into a separate testable function
  2. The entry.includes("..") check is conservative and would reject rare legitimate filenames containing ... This is the right tradeoff for security (false positives > false negatives).


LGTM - approve for merge.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical

  1. Symlink path traversal bypass (src/cli/commands/extension_pull.ts:609-614)

    What's wrong: The path traversal check uses tar -tzf which only outputs entry names, not symlink targets. A malicious archive can contain a symlink like escape_link -> ../../../etc/passwd where:

    • tar -tzf outputs: escape_link
    • The check entry.includes("..") tests "escape_link".includes("..")false
    • The check passes, but extraction creates a symlink pointing outside tmpDir

    Breaking example:

    # Create malicious archive
    ln -s ../../../etc/passwd escape_link
    tar -czf malicious.tar.gz escape_link
    
    # What PR's check sees:
    tar -tzf malicious.tar.gz
    # Output: "escape_link" (no ".." in this string!)
    
    # What verbose listing shows:
    tar -tzvf malicious.tar.gz
    # Output: "lrwxrwxrwx ... escape_link -> ../../../etc/passwd"

    The archive can also include files like escape_link/inject.txt to write through the symlink after it's extracted.

    Suggested fix: Either:

    • Use tar -tzvf and parse the -> target part to check symlink targets
    • Add --no-same-permissions and extract with Deno's built-in tar/untar that can validate symlink targets
    • Use tar's --exclude for symlinks if symlinks aren't needed: tar -xzf archive.tar.gz --exclude='*' --include='*.ts' ... (filter to expected types)
    • Post-extraction: scan extracted directory for symlinks and reject if any exist or point outside

Medium

  1. Debug logging semantic change (src/cli/commands/extension_pull.ts:634-636)

    What changed: The original code logged files AFTER extraction by scanning the disk:

    const allExtractedFiles = await listFiles(extractDir);
    for (const f of allExtractedFiles) {
      logger.debug`Archive contains: ${relative(extractDir, f)}`;
    }

    The new code logs archive entries BEFORE extraction:

    for (const entry of archiveEntries) {
      logger.debug`Archive contains: ${entry}`;
    }

    This changes behavior when:

    • Archive contains directories (now logged, previously skipped by listFiles which only returns files)
    • Archive contains symlinks (now logged as entry name, previously logged as resolved path or not at all)
    • Extraction partially fails (new code logs intended entries, old code logged actual results)

    Not a bug, but could affect debugging. Consider updating the log message to Archive entry: for clarity.

Verdict

FAIL - The symlink bypass is a critical security vulnerability that completely defeats the path traversal mitigation. The check must validate symlink targets, not just entry names.

Add post-extraction recursive scan that rejects any symlink whose resolved
target (computed via readLink + resolve) falls outside tmpDir, closing the
bypass where tar -tzf only shows entry names and not symlink targets.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

This PR properly addresses the security concerns from the previous review.

What was reviewed:

  • Path traversal protection: Well-implemented defense-in-depth with pre-extraction validation (tar -tzf listing) and post-extraction symlink scanning
  • Test coverage: 8 new unit tests for driver/datastore extraction following established patterns
  • TypeScript compliance: No new any types, proper error handling with UserError
  • DDD patterns: Validation logic appropriately placed in CLI layer, tests co-located

Suggestions (non-blocking):

  1. Duplicate JSDoc at lines 381-388 in extension_pull.ts — the comment block for validateNoSymlinkEscape appears twice:

    /**
     * Recursively validates that no symlink under `path` resolves...
     */
    /**
     * Recursively validates that no symlink under `path` resolves...
     */
  2. Minor: The check entry.includes("..") is conservative but could reject legitimate filenames like changelog..md. Consider checking for .. as a path segment instead (e.g., split by / and check for .. elements). The current approach is acceptable for security-sensitive code.

The security implementation is solid — the two-layer approach catches both direct path traversal in entry names and symlink-based escapes.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

None found. The security fix is architecturally sound.

Medium

  1. Overly broad path traversal check (src/cli/commands/extension_pull.ts:640)

    The check entry.includes("..") will cause false positives for legitimate filenames containing consecutive dots.

    Breaking example: An archive containing a valid file named config..json, v1..2.patch, or README..backup would be rejected with "Archive contains unsafe path" even though these are safe filenames.

    Suggested fix: Check path segments individually:

    const segments = entry.split("/");
    if (segments.some(seg => seg === "..") || entry.startsWith("/")) {
  2. Duplicate docstring (src/cli/commands/extension_pull.ts:381-388)

    The same docstring block appears twice in succession - appears to be a copy-paste error. Not a functional issue but should be cleaned up.

Low

  1. Symlink self-reference edge case (src/cli/commands/extension_pull.ts:399)

    The check !resolvedTarget.startsWith(resolvedTmpDir + "/") would reject a symlink that points to resolvedTmpDir itself (without trailing slash). This is a false positive scenario, not a security bypass. A symlink like foo -> . at the tmpDir root would be rejected even though it doesn't escape.

  2. Hardlinks not validated

    The post-extraction validation only checks symlinks (stat.isSymlink). Tar archives can contain hardlinks which would bypass this check. However, the impact is minimal because:

    • Hardlinks can't point outside the filesystem
    • Hardlinks can't link to directories
    • The pre-extraction `.." check would still catch path traversal in hardlink names

Verdict

PASS - The security fix provides effective defense-in-depth against path traversal attacks. The pre-extraction entry name check catches malicious paths before extraction, and the post-extraction symlink check provides an additional layer of protection. The copyDir function already skips symlinks during the copy phase, making this truly defense-in-depth.

The overly broad `.." check (Medium #1) could cause operational issues with unusual but valid filenames, but is not a security vulnerability. The duplicate docstring should be cleaned up.

The test additions are well-structured and follow existing patterns.

@stack72 stack72 merged commit 96bd357 into main Mar 17, 2026
6 checks passed
@stack72 stack72 deleted the fix/extension-pull-path-traversal-and-driver-datastore-tests branch March 17, 2026 22:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant