A Python library for structured pruning, and Bias visualization, of large language models, with a focus on GLU architectures and fairness analysis.
OptiPFair enables efficient pruning of large language models while maintaining their performance. It implements various structured pruning methods, starting with MLP pruning for GLU architectures (as used in models like LLaMA, Mistral, etc.).
Key features:
- GLU architecture-aware pruning that preserves model structure
- Multiple neuron importance calculation methods
- Support for both pruning percentage and target expansion rate
- Simple Python API and CLI interface
- Progress tracking and detailed statistics
- NEW: Bias visualization tools to analyze and understand fairness issues
- NEW: LLM Reference manual for pair programming
The file optipfair_llm_reference_manual.md is included in this repository to serve as a comprehensive reference for OptiPFair when working with large language models (LLMs). You can add this file to your prompt or project to provide LLMs with detailed context about OptiPFair’s API, features, and usage patterns. This is especially useful for facilitating effective pair programming with LLMs and OptiPFair, enabling more accurate code generation, troubleshooting, and integration.
# From PyPI
pip install optipfair
# From source
git clone https://github.com/yourusername/optipfair.git
cd optipfair
pip install -e .from transformers import AutoModelForCausalLM
from optipfair import prune_model
# Load a pre-trained model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
# Prune 20% of neurons from MLP layers using MAW method
pruned_model, stats = prune_model(
model=model,
pruning_type="MLP_GLU",
neuron_selection_method="MAW",
pruning_percentage=20,
show_progress=True,
return_stats=True
)
# Print pruning statistics
print(f"Original parameters: {stats['original_parameters']:,}")
print(f"Pruned parameters: {stats['pruned_parameters']:,}")
print(f"Reduction: {stats['reduction']:,} parameters ({stats['percentage_reduction']:.2f}%)")
# Save pruned model
pruned_model.save_pretrained("./pruned-llama-model")from transformers import AutoModelForCausalLM, AutoTokenizer
from optipfair.bias import visualize_bias
# Load model and tokenizer
model_name = "meta-llama/Llama-3.2-1B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Define prompt pairs that differ in demographic attributes
prompt_pairs = [
("The white man walked through the neighborhood. The police officer thought he",
"The Black man walked through the neighborhood. The police officer thought he")
]
# Generate visualizations and get bias metrics
_, metrics = visualize_bias(
model,
tokenizer,
prompt_pairs=prompt_pairs,
visualization_types=["mean_diff", "pca"],
layers="first_middle_last",
output_dir="./bias_analysis"
)
# Print overall bias metrics
overall = metrics["pair_1"]["metrics"]["overall_metrics"]
print(f"Mean activation difference: {overall['mean_difference']:.6f}")Visualizations obtained with optiPfair analyzing the LLaMA 2.3-1B model for race bias.
# Prune a model with default settings (10% pruning, MAW method)
optipfair prune --model-path meta-llama/Llama-3.2-1B --output-path ./pruned-model
# Prune with custom settings
optipfair prune \
--model-path meta-llama/Llama-3.2-1B \
--pruning-type MLP_GLU \
--method MAW \
--pruning-percentage 20 \
--output-path ./pruned-model
# Use expansion rate instead of pruning percentage
optipfair prune \
--model-path meta-llama/Llama-3.2-1B \
--expansion-rate 140 \
--output-path ./pruned-model
# Analyze a model's architecture
optipfair analyze --model-path meta-llama/Llama-3.2-1BOptiPFair supports three methods for calculating neuron importance:
-
MAW (Maximum Absolute Weight) - Default method that identifies influential neurons based on the magnitude of their connections. Typically provides the best pruning results.
-
VOW (Variance of Weights) - Identifies neurons based on the variance of their weights. May be useful for specific architectures.
-
PON (Product of Norms) - Uses the product of L1 norms to identify important neurons. This method may be applicable in certain contexts.
Complete documentation is available at https://peremartra.github.io/optipfair/.
At his moment, OptiPFair is designed to work with transformer-based language models that use GLU architecture in their MLP layers, including:
- LLaMA family (LLaMA, LLaMA-2, LLaMA-3, )
- Mistral models, QWeN, Gemma...
- And other models with similar GLU architecture
OptiPFair supports two ways to specify the pruning target:
-
Pruning Percentage - Directly specify what percentage of neurons to remove (e.g., 20%)
-
Expansion Rate - Specify the target expansion rate (ratio of intermediate size to hidden size) as a percentage (e.g., 140% instead of the default 400%)
The expansion rate approach is often more intuitive when comparing across different model scales.
- Support for attention layer pruning
- Whole block pruning
- Integrated evaluation benchmarks
- Bias visualizations.
If you use OptiPFair in your research, please cite:
@software{optipfair2025,
author = {Pere Martra},
title = {OptiPFair: A Library for Structured Pruning of Large Language Models},
year = {2025},
url = {https://github.com/peremartra/optipfair}
}
Apache 2.0
Contributions are welcome! Please check out our contributing guidelines for details.