Fix compression detection to use magic bytes instead of file extensions #578

Merged
jklukas merged 1 commit into main from fix/compression-detection-magic-bytes
Nov 6, 2025

@jklukas (Member) commented Nov 6, 2025

When base image layers are cached locally or referenced during image composition, they may be stored with generic names that don't reflect their actual compression format (e.g., 'layer.tar' for a gzip file, or no extension at all).

The previous extension-based detection would incorrectly classify these as uncompressed layers, leading to manifest media type mismatches. When pushing with OCI format, this caused layers to be declared as application/vnd.oci.image.layer.v1.tar (uncompressed) when they were actually gzip-compressed, resulting in 'invalid tar header' errors during docker pull.
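To make the failure mode concrete, here is a minimal Python sketch of extension-only detection. It is not the project's code: the helper names (`detect_by_extension`, `OCI_LAYER_MEDIA_TYPES`, `demo_mismatch`), the compression labels, and the list of recognized extensions are assumptions made for illustration. The media type strings are the standard OCI layer types.

```python
import gzip
import os
import tempfile

# Standard OCI layer media types, keyed by a hypothetical compression label.
OCI_LAYER_MEDIA_TYPES = {
    "none": "application/vnd.oci.image.layer.v1.tar",
    "gzip": "application/vnd.oci.image.layer.v1.tar+gzip",
    "zstd": "application/vnd.oci.image.layer.v1.tar+zstd",
}


def detect_by_extension(path):
    """Hypothetical extension-only detection (the behavior being replaced)."""
    name = os.path.basename(path).lower()
    if name.endswith((".gz", ".tgz")):
        return "gzip"
    if name.endswith((".zst", ".zstd")):
        return "zstd"
    return "none"


def demo_mismatch():
    """Write gzip data to a file named 'layer.tar' and report what gets declared."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "layer.tar")
        # The content is gzip-compressed even though the name says plain tar.
        with gzip.open(path, "wb") as f:
            f.write(b"pretend this is a tar archive")
        with open(path, "rb") as f:
            first_two = f.read(2)
        declared = OCI_LAYER_MEDIA_TYPES[detect_by_extension(path)]
        return first_two, declared
```

Calling `demo_mismatch()` returns the gzip magic bytes `1f 8b` alongside the uncompressed tar media type, which is the mismatch described above: a consumer that trusts the manifest tries to read gzip data as a raw tar stream.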

This change modifies detect_compression_type() to:

  1. First check file magic bytes (1f 8b for gzip, 28 b5 2f fd for zstd)
  2. Fall back to extension-based detection if magic bytes don't match

This ensures correct media type detection regardless of filename.
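The two-step logic can be sketched in Python as follows. This is an illustration of the technique only, under stated assumptions: the actual `detect_compression_type()` in this repository may differ in language, signature, return type, and the set of extensions it recognizes. The string labels and the extension list below are hypothetical; the magic byte values are the ones listed above.

```python
import os

GZIP_MAGIC = bytes([0x1F, 0x8B])              # gzip member header
ZSTD_MAGIC = bytes([0x28, 0xB5, 0x2F, 0xFD])  # zstd frame magic number


def detect_compression_type(path):
    """Return 'gzip', 'zstd', or 'none' (labels are assumptions for this sketch)."""
    # Step 1: inspect the first bytes of the file, ignoring its name.
    with open(path, "rb") as f:
        header = f.read(4)
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header.startswith(ZSTD_MAGIC):
        return "zstd"

    # Step 2: no magic bytes matched, so fall back to the file extension.
    name = os.path.basename(path).lower()
    if name.endswith((".gz", ".tgz")):
        return "gzip"
    if name.endswith((".zst", ".zstd")):
        return "zstd"
    return "none"
```

Reading only four bytes keeps the check cheap even for large layers. A file shorter than the magic sequence, including an empty file, simply fails both prefix checks and goes through the extension fallback.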

Added comprehensive unit tests covering:

  • Correct detection via magic bytes
  • Wrong extension scenarios (the bug case)
  • No extension scenarios
  • Fallback to extension-based detection still works

Fixes issue where layer blobs were gzip-compressed but declared as uncompressed in OCI manifests, causing docker pull failures.

See: https://github.com/bazeltools/rules_minidock_tools/issues/XXX
@jklukas force-pushed the fix/compression-detection-magic-bytes branch from 33fd649 to c584325 on November 6, 2025 20:59
@jklukas requested a review from eed3si9n on November 6, 2025 21:00
@jklukas merged commit 66b01aa into main on Nov 6, 2025
3 checks passed
@jklukas deleted the fix/compression-detection-magic-bytes branch on November 6, 2025 21:08