A universal sandbox for evaluating agents, collecting trajectories, and training with reinforcement learning across OS, Android, Minecraft, embodied, QA, data-processing, scientific-discovery, and multimodal environments.
Quick Start | Demo | Environments | RL Training | Custom Environments | Configuration | Data | Report
Safactory is an agent sandbox for teams that need one pipeline for evaluation, data generation, and RL training. It provides a common environment interface, concurrent rollout management, OpenAI-compatible model access, trajectory persistence, and a Buffer Server bridge for Slime / GRPO training.
| Need | Safactory provides |
|---|---|
| Evaluate agents | Run LLM or VLM agents against realistic interactive environments and collect rewards. |
| Build trajectory data | Persist messages, actions, observations, rewards, and environment state to SQLite. |
| Train with RL | Stream rollout trajectories into Slime through the built-in Buffer Server. |
| Add new environments | Plug new environments into the same pipeline through a standard interface. |
Core features:
- Multi-domain environments: OS, Android, Minecraft, RoboTrustBench, Embodied ALFRED, QA, DABStep, DiscoveryWorld, DeepEyes, Geo3K-VL, and Math500.
- High-concurrency rollouts through environment pools and async workers.
- OpenAI-compatible model integration for vLLM, SGLang, hosted APIs, and local proxies.
- Local single-machine mode and remote RayJob-backed cluster mode.
- Optional experience extraction and prompt-time experience injection.
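Because the model integration is OpenAI-compatible, swapping between vLLM, SGLang, a hosted API, or a local proxy only changes the base URL and API key; the request shape stays the same. The sketch below builds such a request with the standard library (the host, key, and model placeholders mirror the Quick Start flags and are illustrative, not part of Safactory's API):

```python
# Hedged sketch: any OpenAI-compatible backend accepts the same
# /v1/chat/completions payload, so only base_url and api_key change
# when you switch model servers. All values below are placeholders.
import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages):
    """Build an OpenAI-style chat-completions HTTP request."""
    payload = {"model": model, "messages": messages, "temperature": 0.0}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "http://YOUR_LLM_HOST/v1", "YOUR_API_KEY", "YOUR_MODEL",
    [{"role": "user", "content": "List the files in the home directory."}],
)
print(req.full_url)  # http://YOUR_LLM_HOST/v1/chat/completions
```

Sending the request (e.g. with `urllib.request.urlopen`) is left out, since the endpoint here is a placeholder.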
demo.1.mp4
Click play to watch the full demo.
```shell
git clone https://github.com/AI45Lab/Safactory.git
cd Safactory
pip install -r requirements.txt
```

Some environments have extra runtime dependencies. See Supported Environments before running Docker-, emulator-, VM-, or simulator-backed tasks.
```shell
# --env-config selects the evaluation environment (OS / Android / Minecraft, etc.)
# --pool-size sets the number of concurrent agent instances
python launcher.py \
  --env-config env/osgym/os_config.yaml \
  --llm-base-url http://YOUR_LLM_HOST/v1 \
  --llm-api-key YOUR_API_KEY \
  --llm-model YOUR_MODEL \
  --pool-size 500
```

This starts the runner, loads the selected environment configuration, schedules tasks, calls the model endpoint, and writes step-level records to SQLite.
Every rollout is recorded automatically. The default CLI database path is `sqlite://env_trajs.db`; override it with `--db-path`:
```shell
python launcher.py \
  --env-config env/osgym/os_config.yaml \
  --db-path sqlite://runs/os_eval.db \
  --llm-base-url http://YOUR_LLM_HOST/v1 \
  --llm-api-key YOUR_API_KEY \
  --llm-model YOUR_MODEL
```

See Data Manager for schema details and query examples.
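Since trajectories land in ordinary SQLite files, standard `sqlite3` queries work for ad-hoc analysis. The table and column names below are illustrative stand-ins, not Safactory's actual schema (see Data Manager for that):

```python
# Hedged sketch: aggregate per-trajectory reward from step-level records.
# The "steps" table and its columns are invented for illustration; the
# real schema is documented in Data Manager.
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for e.g. runs/os_eval.db
conn.execute(
    "CREATE TABLE steps (traj_id TEXT, step INTEGER, action TEXT, reward REAL)"
)
conn.executemany(
    "INSERT INTO steps VALUES (?, ?, ?, ?)",
    [("t1", 0, "click", 0.0), ("t1", 1, "type", 1.0), ("t2", 0, "scroll", 0.0)],
)

# Total reward per trajectory
totals = dict(
    conn.execute("SELECT traj_id, SUM(reward) FROM steps GROUP BY traj_id")
)
print(totals)  # {'t1': 1.0, 't2': 0.0}
```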
Safactory integrates with Slime through a Buffer Server:
```shell
# Terminal 1: Slime training process
cd rl
./run_slime_generator_vl.sh

# Terminal 2: Safactory Buffer Server and rollout runner
cd rl
./run_buffer_server.sh
```

Full instructions are in RL Training.
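Conceptually, the Buffer Server sits between the rollout runner (which pushes completed trajectories) and the trainer (which pulls batches). The toy in-process queue below mimics that push/pop handoff; it is not the actual Buffer Server protocol, API, or data format:

```python
# Hedged sketch: a toy stand-in for the rollout -> buffer -> trainer
# handoff. Class and method names are invented for illustration only.
from collections import deque

class TrajectoryBuffer:
    def __init__(self):
        self._queue = deque()

    def push(self, trajectory):
        """Called by the rollout runner when an episode finishes."""
        self._queue.append(trajectory)

    def pop_batch(self, n):
        """Called by the training side to drain up to n trajectories."""
        batch = []
        while self._queue and len(batch) < n:
            batch.append(self._queue.popleft())
        return batch

buf = TrajectoryBuffer()
buf.push({"traj_id": "t1", "reward": 1.0})
buf.push({"traj_id": "t2", "reward": 0.0})
print([t["traj_id"] for t in buf.pop_batch(8)])  # ['t1', 't2']
```

In the real setup the two sides run in separate processes and communicate over HTTP, which is why the Quick Start uses two terminals.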
Safactory can generate reusable trajectory datasets. The public OS trajectory release is available on Hugging Face:
- AI45Research/SATraj-OS, a Safactory-generated OS trajectory dataset for agent training and analysis.
| Guide | What it covers |
|---|---|
| Configuration | CLI flags, manager YAML, and environment YAML format. |
| Supported Environments | Environment registry names, prerequisites, and setup links. |
| Data Manager | SQLite schema, storage behavior, and query examples. |
| RL Training | Slime integration, Buffer Server setup, and RL variables. |
| Custom Environment | Minimal BaseEnv implementation and registration flow. |
| Experience Extraction and Injection | Reusing historical trajectories as prompt-time experience. |
At a high level, `launcher.py` loads environment YAML files, starts or connects to environment services, sends observations to an OpenAI-compatible model endpoint, records every interaction through the data manager, and optionally forwards completed rollouts to RL training.
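The common environment interface behind this flow boils down to a reset/step loop. The toy class below sketches that contract; the method names and return shapes are illustrative assumptions, and the real contract is documented in Custom Environment:

```python
# Hedged sketch of a minimal environment implementing the reset/step
# loop the runner drives. This is a toy, not Safactory's BaseEnv:
# actual method signatures and return types may differ.
class CounterEnv:
    """Toy environment: the agent must count up to a target."""

    def __init__(self, target: int = 3):
        self.target = target
        self.count = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.count = 0
        return {"observation": f"count is {self.count}"}

    def step(self, action: str):
        """Apply one action; return (observation, reward, done)."""
        if action == "increment":
            self.count += 1
        done = self.count >= self.target
        reward = 1.0 if done else 0.0
        return {"observation": f"count is {self.count}"}, reward, done

env = CounterEnv()
obs = env.reset()
for _ in range(3):
    obs, reward, done = env.step("increment")
print(done, reward)  # True 1.0
```

Registering such a class under `env/<name>/` with a YAML config is what makes it reachable from `--env-config`.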
Contributions are welcome for new environments, bug fixes, documentation improvements, and reproducible examples.
- Fork the repository.
- Add or update an environment under `env/<name>/`.
- Include a YAML config and a short README for environment-specific dependencies.
- Run a local smoke test with `launcher.py`.
- Open a pull request with setup notes and expected behavior.
If Safactory or Safactory-generated datasets are useful in your work, cite the repository and the specific dataset or report you used.
```bibtex
@misc{safactory,
  title        = {Safactory: A Universal AI Agent Sandbox for Evaluation, Data Construction, and RL Training},
  howpublished = {\url{https://github.com/AI45Lab/Safactory}},
  year         = {2026}
}
```