Fix compression detection to use magic bytes instead of file extensions #578
Merged
When base image layers are cached locally or referenced during image composition, they may be stored with generic names that don't reflect their actual compression format (e.g., 'layer.tar' for a gzip file, or no extension at all).

The previous extension-based detection would incorrectly classify these as uncompressed layers, leading to manifest media type mismatches. When pushing with OCI format, this caused layers to be declared as application/vnd.oci.image.layer.v1.tar (uncompressed) when they were actually gzip-compressed, resulting in 'invalid tar header' errors during docker pull.

This change modifies detect_compression_type() to:
1. First check file magic bytes (1f 8b for gzip, 28 b5 2f fd for zstd)
2. Fall back to extension-based detection if magic bytes don't match

This ensures correct media type detection regardless of filename.

Added comprehensive unit tests covering:
- Correct detection via magic bytes
- Wrong extension scenarios (the bug case)
- No extension scenarios
- Fallback to extension-based detection still works

Fixes issue where layer blobs were gzip-compressed but declared as uncompressed in OCI manifests, causing docker pull failures.

See: https://github.com/bazeltools/rules_minidock_tools/issues/XXX
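The two-step detection described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the names `Compression`, `from_magic`, `from_extension`, `detect_compression_sketch`, and `media_type` are hypothetical, the set of recognized extensions is an assumption, and propagating I/O errors instead of falling back to the extension is a design choice made only for this illustration.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical compression kinds for a layer blob (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    None,
    Gzip,
    Zstd,
}

// Magic bytes as listed in the PR description.
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xb5, 0x2f, 0xfd];

/// Step 1: classify by the leading bytes of the file, if they match a known magic.
pub fn from_magic(header: &[u8]) -> Option<Compression> {
    if header.starts_with(&ZSTD_MAGIC) {
        Some(Compression::Zstd)
    } else if header.starts_with(&GZIP_MAGIC) {
        Some(Compression::Gzip)
    } else {
        None
    }
}

/// Step 2 (fallback): classify by file extension.
/// The extension list here is an assumption, not taken from the PR.
pub fn from_extension(path: &Path) -> Compression {
    match path.extension().and_then(|e| e.to_str()) {
        Some("gz") | Some("tgz") => Compression::Gzip,
        Some("zst") | Some("zstd") => Compression::Zstd,
        _ => Compression::None,
    }
}

/// Read up to four bytes, try magic bytes first, then fall back to the extension.
pub fn detect_compression_sketch(path: &Path) -> io::Result<Compression> {
    let mut file = File::open(path)?;
    let mut header = [0u8; 4];
    let mut filled = 0;
    // `read` may return fewer bytes than requested, so loop until full or EOF.
    while filled < header.len() {
        let n = file.read(&mut header[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    Ok(from_magic(&header[..filled]).unwrap_or_else(|| from_extension(path)))
}

/// Map the detected compression to the OCI layer media type declared in the manifest.
pub fn media_type(c: Compression) -> &'static str {
    match c {
        Compression::None => "application/vnd.oci.image.layer.v1.tar",
        Compression::Gzip => "application/vnd.oci.image.layer.v1.tar+gzip",
        Compression::Zstd => "application/vnd.oci.image.layer.v1.tar+zstd",
    }
}
```

With this sketch, a gzip blob saved as 'layer.tar' is classified as `Compression::Gzip` because its first two bytes are 1f 8b, so the manifest would declare the `+gzip` media type. A file whose leading bytes match neither magic is still classified by its extension.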
33fd649 to c584325
eed3si9n approved these changes on Nov 6, 2025