ARES documentation
Welcome to ARES – AI Robustness Evaluation System. ARES is a framework developed by IBM Research to support automated red-teaming of AI systems. It helps researchers and developers evaluate robustness of AI applications through modular, extensible components.
Red-team AI systems with ease using 4 key components: target, goals, strategy, and evaluation. Test your AI builds with targeted metrics for security and reliability.
🚀 What Can ARES Do?
ARES enables:
🧠 Automated Red-Teaming: Simulate adversarial scenarios to test AI models.
🛠️ Intent Definition: Create structured red-teaming exercises using YAML configuration.
🔌 Plugin Integration: Extend functionality with custom plugins for attacks, goals (datasets), models, and metrics.
🧬 Custom Pipelines: Chain together plugins to build complex evaluation workflows.
🔗 Plugin Ecosystem
ARES includes a rich set of plugins:
🎯 Attack Plugins: Prompt injection, encoding, gradient-based, and multi-turn attacks.
🔗 Connector Plugins: Interface with models like HuggingFace transformers, OpenAI APIs, and local deployments.
🎓 Goals Plugins: Load and preprocess datasets from HuggingFace, local files, or custom sources.
📊 Evaluation Plugins: Assess model responses using keyform or model-as-a-judge approaches, measuring accuracy, toxicity, privacy leakage, and more.
📦 Each plugin is self-contained. You need to install plugins individually before using them in ARES.
🧭 What’s Next
ARES is evolving to support:
🛡️ OWASP Mapping Intents: Align red-teaming scenarios with OWASP AI Security guidelines.
🧪 Expanded Plugin Library: More attack types, model integrations, and evaluation metrics.
🤝 Community Contributions: Support for external plugin repositories and shared scenarios.
📈 Visualization Tools: Dashboards and reports for scenario outcomes.
📑 Advanced Reporting: Human-readable summaries and machine-readable outputs (JSON, CSV) for downstream analysis.
💙 IBM ❤️ Open Source AI
ARES has been brought to you by IBM.
Contents:
- Getting Started with ARES
- ARES Usage
- CLI Reference
- ARES Configuration
- ARES Plugins
- ARES Strategies
- Evaluators Reference
- Overview
- Keyword Evaluator
- LLM-Based Evaluators
- Garak Detectors
- Crescendo Evaluator
- OWASP-Specific Evaluators
- LLM01 Evaluator (Prompt Injection)
- LLM02 Evaluator (Sensitive Information Disclosure)
- LLM04 Evaluator (Data and Model Poisoning)
- LLM05 Evaluator (Improper Output Handling)
- LLM06 Evaluator (Excessive Agency)
- LLM07 Evaluator (System Prompt Leakage)
- LLM09 Evaluator (Misinformation)
- LLM10 Evaluator (Unbounded Consumption)
- Multiple Evaluators
- Custom Evaluators
- Viewing Available Evaluators
- API Reference
- OWASP Mapping: ARES Intents
- FAQ