22 changes: 14 additions & 8 deletions README.md
@@ -1,6 +1,6 @@
# pymini

`pymini` minifies Python source code by simplifying syntax, shortening identifiers, and stripping unnecessary whitespace. It supports single-file input and small groups of related modules.
`pymini` minifies Python source code by simplifying syntax, shortening identifiers, and stripping unnecessary whitespace. Its primary multi-file workflow preserves package structure; one-file bundling is available as an explicit opt-in.

## Status

@@ -14,25 +14,31 @@ python3 -m pip install pymini

## CLI

Minify a single file, a directory, or a glob:
Package mode is the default and preserves the package tree:

```bash
pymini "src/**/*.py" -o out
pymini package src -o out
```

If you need module names and top-level public symbols to remain stable, keep them explicitly:
Legacy invocation without an explicit mode still defaults to `package`:

```bash
pymini src --keep-module-names --keep-global-variables -o out
pymini src -o out
```

Create a single bundled output file:
By default, `pymini` preserves module paths and public globals. When possible, it keeps the public surface stable by emitting aliases while still shortening internal names. To trade API stability for more aggressive compression:

```bash
pymini src --single-file -o out/bundle.py
pymini package src --rename-global-variables -o out
```

Without `--keep-module-names`, output filenames may also be shortened as part of the minification pass.
Bundle mode emits a single file and is better suited to app-style graphs than libraries:

```bash
pymini bundle src -o out/bundle.py
```

The legacy `--single-file` flag is still accepted as a compatibility alias for bundle mode.

## Python API

64 changes: 54 additions & 10 deletions pymini/cli.py
@@ -1,23 +1,64 @@
import glob
from argparse import ArgumentParser
import sys
from argparse import ArgumentParser, SUPPRESS
from pathlib import Path
from typing import Iterable, Optional, Sequence

from pymini import __version__
from pymini.pymini import minify


PACKAGE_MODE = "package"
BUNDLE_MODE = "bundle"
MODES = {PACKAGE_MODE, BUNDLE_MODE}


def build_parser() -> ArgumentParser:
parser = ArgumentParser(prog="pymini")
parser.add_argument(
"mode",
choices=sorted(MODES),
help="Output mode: preserve a package tree or bundle everything into one file.",
)
parser.add_argument('path', help='Path to the file or directory to minify')
parser.add_argument('--keep-module-names', action='store_true', help='Keep module names as they are. Useful for compressing libraries')
parser.add_argument('--keep-global-variables', action='store_true', help='Keep global variables as they are. Useful for compressing libraries')
parser.add_argument('--single-file', action='store_true', help='Concatenate all outputs into a single file')
parser.add_argument(
'--rename-modules',
action='store_true',
help='Allow module names to be shortened when the selected mode supports it.',
)
parser.add_argument(
'--rename-global-variables',
action='store_true',
help='Rename top-level globals instead of preserving them through public aliases.',
)
parser.add_argument('--single-file', action='store_true', help=SUPPRESS)
parser.add_argument('-o', '--output', help='Path to the output directory', default='./')
parser.add_argument('--version', action='version', version=f'%(prog)s {__version__}')
return parser


def normalize_argv(argv: Optional[Sequence[str]]) -> list[str]:
args = list(sys.argv[1:] if argv is None else argv)
if not args:
return args
if args[0] in MODES:
return args
Comment on lines +44 to +45

P2: Disambiguate legacy paths named `package` or `bundle`

The legacy-compatibility path injection now treats a first positional token equal to `package` or `bundle` as an explicit mode. As a result, `pymini package -o out` (where `package` is the source directory) fails with a missing `path` argument instead of processing that directory. This is a behavioral regression from the previous CLI for users with those common directory names, unless they rewrite their invocations (e.g., `./package`, or an explicit mode plus path).
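
One possible way to disambiguate, offered as a sketch rather than a drop-in patch: treat the first token as a mode only when at least one more positional follows it. The helper `count_positionals` and the `VALUE_OPTIONS` set are hypothetical names introduced here; the set assumes `-o`/`--output` is the only option that consumes a separate value, which matches the parser in this diff but would need updating if more such options are added.

```python
from typing import Sequence

MODES = {"package", "bundle"}
DEFAULT_MODE = "package"
# Assumption: the only options that consume the following token as their value.
VALUE_OPTIONS = {"-o", "--output"}


def count_positionals(tokens: Sequence[str]) -> int:
    """Count tokens that are neither options nor values of VALUE_OPTIONS."""
    count = 0
    skip_next = False
    for token in tokens:
        if skip_next:
            skip_next = False  # this token is the value of the previous option
        elif token in VALUE_OPTIONS:
            skip_next = True
        elif not token.startswith("-"):
            count += 1
    return count


def normalize_argv(argv: Sequence[str]) -> list[str]:
    """Prepend the default mode unless argv starts with a mode followed by a path."""
    args = list(argv)
    if not args:
        return args
    if args[0] in MODES and count_positionals(args[1:]) >= 1:
        return args  # explicit mode, and a path positional follows it
    # Legacy form: the first token is a path (possibly a directory named like a mode).
    return [DEFAULT_MODE, *args]
```

With this heuristic, `pymini package -o out` is read as package mode on a directory named `package`, while `pymini package src -o out` keeps its explicit-mode meaning. It does not handle argparse prefix abbreviations such as `--out out`, so those would still need separate treatment.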

if args[0].startswith("-"):
return [PACKAGE_MODE, *args]
return [PACKAGE_MODE, *args]


def effective_mode(args) -> str:
return BUNDLE_MODE if args.single_file else args.mode


def resolve_options(args) -> tuple[str, bool, bool, bool]:
mode = effective_mode(args)
keep_module_names = not args.rename_modules
keep_global_variables = not args.rename_global_variables
return mode, keep_module_names, keep_global_variables, mode == BUNDLE_MODE


def resolve_python_files(path: str) -> tuple[list[Path], Optional[Path]]:
candidate = Path(path)
if candidate.is_file():
@@ -101,7 +142,8 @@ def write_outputs(

def main(argv: Optional[Sequence[str]] = None) -> int:
parser = build_parser()
args = parser.parse_args(argv)
args = parser.parse_args(normalize_argv(argv))
mode, keep_module_names, keep_global_variables, output_single_file = resolve_options(args)
paths, module_root = resolve_python_files(args.path)
if not paths:
parser.error(f"no Python files matched {args.path!r}")
@@ -112,17 +154,19 @@ def main(argv: Optional[Sequence[str]] = None) -> int:
except ValueError as exc:
parser.error(str(exc))
cleaned, modules = minify(
sources, modules, keep_module_names=args.keep_module_names,
keep_global_variables=args.keep_global_variables,
output_single_file=args.single_file
sources,
modules,
keep_module_names=keep_module_names,
keep_global_variables=keep_global_variables,
output_single_file=output_single_file,
)
try:
write_outputs(
cleaned,
modules,
Path(args.output),
single_file=args.single_file,
keep_module_names=args.keep_module_names,
single_file=output_single_file,
keep_module_names=keep_module_names,
module_to_output_path=module_to_output_path,
)
except ValueError as exc:
93 changes: 65 additions & 28 deletions pymini/pymini.py
@@ -155,6 +155,7 @@ def __init__(self, generator, mapping=None, modules=(), keep_global_variables=Fa
self.generator = generator
self.name_to_node = {}
self.nodes_to_insert = []
self.nodes_to_append = []
# TODO: cleanup
self.str_name_to_node = {}
self.str_mapping = {}
@@ -167,6 +168,15 @@ def _is_node_global(self, node):
not hasattr(node, 'parent') or isinstance(node.parent, ast.Module)
)

def _rename_identifier(self, old_name):
if old_name not in self.mapping.values():
self.mapping[old_name] = next(self.generator)
return self.mapping[old_name]

def _append_public_alias(self, old_name, new_name):
if old_name != new_name:
self.nodes_to_append.append(ast.parse(f"{old_name} = {new_name}").body[0])

def _visit_ImportOrImportFrom(self, node):
"""Shorten imported library names.

@@ -189,6 +199,8 @@ def _visit_ImportOrImportFrom(self, node):
import donotaliasme
from donotaliasme import dolor
"""
if self.keep_global_variables and self._is_node_global(node):
return self.generic_visit(node)
if isinstance(node, ast.Import) or node.module not in self.modules:
for alias in node.names:
if isinstance(node, ast.ImportFrom) or alias.name not in self.modules:
@@ -208,12 +220,22 @@ def visit_ClassDef(self, node):
>>> apply('class Demiurgic: pass\\nholy = Demiurgic()')
'class a:\\n pass\\nb = a()'
>>> shortener = VariableShortener(variable_name_generator(), keep_global_variables=True)
>>> def apply(src):
... tree = ast.parse(src)
... shortener.visit(tree)
... append_public_aliases(tree, shortener.nodes_to_append)
... return ast.unparse(tree)
...
>>> apply('class Demiurgic: pass\\nholy = Demiurgic()')
'class Demiurgic:\\n pass\\nholy = Demiurgic()'
'class a:\\n pass\\nholy = a()\\nDemiurgic = a'
"""
if node.name not in self.mapping.values() and not ( # TODO: make .values() more efficient
self.keep_global_variables and self._is_node_global(node)
): # TODO: rename but insert var def if worth it
if self.keep_global_variables and self._is_node_global(node):
if len(node.name) > 1 and node.name not in self.mapping.values():
old_name = node.name
node.name = self._rename_identifier(old_name)
self._append_public_alias(old_name, node.name)
return self.generic_visit(node)
if node.name not in self.mapping.values(): # TODO: make .values() more efficient
self.mapping[node.name] = node.name = next(self.generator)
return self.generic_visit(node)

@@ -225,13 +247,23 @@ def visit_FunctionDef(self, node):
>>> apply('def demiurgic(palpitation): return palpitation\\nholy = demiurgic()')
'def b(a):\\n return a\\nc = b()'
>>> shortener = VariableShortener(variable_name_generator(), keep_global_variables=True)
>>> def apply(src):
... tree = ast.parse(src)
... shortener.visit(tree)
... append_public_aliases(tree, shortener.nodes_to_append)
... return ast.unparse(tree)
...
>>> apply('def demiurgic(palpitation): return palpitation\\nholy = demiurgic()')
'def demiurgic(a):\\n return a\\nholy = demiurgic()'
'def b(a):\\n return a\\nholy = b()\\ndemiurgic = b'
"""
for arg in node.args.args + [node.args.vararg, node.args.kwarg]:
if arg is not None and arg.arg not in self.mapping.values(): # TODO: make .values() more efficient
self.mapping[arg.arg] = arg.arg = next(self.generator)
if self.keep_global_variables and self._is_node_global(node): # TODO: rename but insert var def if worth it
if self.keep_global_variables and self._is_node_global(node):
if len(node.name) > 1 and node.name not in self.mapping.values():
old_name = node.name
node.name = self._rename_identifier(old_name)
self._append_public_alias(old_name, node.name)
return self.generic_visit(node)
if node.name not in self.mapping.values(): # TODO: need to dedup this logic
self.mapping[node.name] = node.name = next(self.generator)
@@ -289,10 +321,12 @@ def visit_Name(self, node):
"""
if node.id in self.mapping.values(): # TODO: make .values() more efficient
return node
if self.keep_global_variables and self._is_node_global(node):
if node.id in self.mapping:
node.id = self.mapping[node.id]
return self.generic_visit(node)
if node.id in self.mapping:
node.id = self.mapping[node.id]
elif self.keep_global_variables and self._is_node_global(node): # TODO: rename but insert var def if worth it # TODO: this optimization should only apply to var def
return self.generic_visit(node)
elif node.id in self.name_to_node:
self.mapping[node.id] = new_variable_name = next(self.generator)
self.nodes_to_insert.append(ast.parse(f'{new_variable_name} = {node.id}').body[0])
@@ -365,6 +399,7 @@ def transform(self, *trees):
for module, tree in zip(self.modules, trees):
self.module_to_shortener[module].transform(tree)
define_custom_variables(tree, self.module_to_shortener[module].nodes_to_insert)
append_public_aliases(tree, self.module_to_shortener[module].nodes_to_append)
return trees


@@ -386,33 +421,28 @@ def __init__(self, generator, modules, module_to_shortener, keep_module_names=Fa
self.keep_module_names = keep_module_names

def transform(self, *trees):
if self.keep_module_names:
return trees

# shorten module names
module_to_module = {module: next(self.generator) for module in self.modules}
original_modules = list(self.module_to_shortener)
module_to_module = {}
if not self.keep_module_names:
module_to_module = {module: next(self.generator) for module in original_modules}

# NOTE: Must modify in-place, as this list is passed to Fuser
for i, module in enumerate(self.modules):
self.modules[i] = module_to_module[module]
# NOTE: Must modify in-place, as this list is passed to Fuser
for i, module in enumerate(original_modules):
self.modules[i] = module_to_module[module]

new_trees = [] # TODO: cleanup
for tree, module in zip(trees, module_to_module):

# rerun shortening on ea file based on imports from *other files
fused_mapping = {}
for _module, shortener in self.module_to_shortener.items():
if _module != module:
fused_mapping.update(shortener.mapping)
else:
# HACK: identity needed, so that we don't rename variables
# *again. TODO: figure out why single-char variables are
# being renamed
fused_mapping.update({v: v for v in shortener.mapping.values()})
for tree, module in zip(trees, original_modules):
# Preserve names already shortened in this module, and only rewrite
# imported references using the exporter module's mapping.
fused_mapping = {
value: value
for value in self.module_to_shortener[module].mapping.values()
}

imported = ImportedVariableShortener(
self.generator,
mapping=fused_mapping,
keep_global_variables=True,
module_to_module={_module: value for _module, value in module_to_module.items() if module != _module},
module_to_shortener={_module: value for _module, value in self.module_to_shortener.items() if module != _module},
)
@@ -523,6 +553,13 @@ def define_custom_variables(tree, mapping):
ast.fix_missing_locations(tree)


def append_public_aliases(tree, aliases):
root = next(ast.walk(tree))
for node in aliases:
root.body.append(ast.copy_location(node, root))

P1: Insert public aliases before import-time API discovery

Appending aliases at the end of the module changes import-time semantics for modules that inspect `globals()` before EOF (for example, dynamic `__all__` construction). After top-level defs are renamed, `append_public_aliases` adds `old_name = new_name` only at the end, so code executed earlier sees only the minified name and can publish the wrong API surface (e.g., star exports missing the original public symbol). That violates the new "preserve public globals" default.
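
A rough sketch of one alternative, assuming aliases are only needed for top-level `def` and `class` statements (the two cases this diff handles): emit each alias immediately after the definition it covers instead of at the end of the module. The function `insert_aliases_after_definitions` and its `old_to_new` parameter are hypothetical and not part of `pymini`; the demo module below is also made up for illustration.

```python
import ast


def insert_aliases_after_definitions(tree: ast.Module, old_to_new: dict[str, str]) -> ast.Module:
    """Place `old = new` right after each renamed top-level def or class."""
    new_to_old = {new: old for old, new in old_to_new.items()}
    body = []
    for node in tree.body:
        body.append(node)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            old_name = new_to_old.get(node.name)
            if old_name is not None and old_name != node.name:
                # The parsed alias statement already carries valid source locations.
                body.append(ast.parse(f"{old_name} = {node.name}").body[0])
    tree.body = body
    ast.fix_missing_locations(tree)
    return tree


# Already-minified module: `demiurgic` was renamed to `a`, and `__all__`
# is built dynamically before the end of the file.
minified = (
    "def a():\n"
    "    return 1\n"
    "__all__ = [n for n in list(globals()) if not n.startswith('_')]\n"
)
tree = insert_aliases_after_definitions(ast.parse(minified), {"demiurgic": "a"})
namespace = {}
exec(compile(tree, "<minified>", "exec"), namespace)
```

Because the alias is bound before the `__all__` line runs, the original public name is visible to the `globals()` scan. This does not cover names that are read before their definition executes, or globals created by plain assignment, so it is a partial mitigation at best.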

ast.fix_missing_locations(tree)


class Unparser:

def transform(self, *trees):