A universal sandbox for evaluating agents, collecting trajectories, and training with reinforcement learning across OS, Android, Minecraft, embodied, QA, data-processing, scientific-discovery, and multimodal environments.
Quick Start | Demo | Environments | RL Training | Custom Environments | Configuration | Data | Report
Safactory is an agent sandbox for teams that need one pipeline for evaluation, data generation, and RL training. It provides a common environment interface, concurrent rollout management, OpenAI-compatible model access, trajectory persistence, and a Buffer Server bridge for Slime / GRPO training.
| Need | Safactory provides |
|---|---|
| Evaluate agents | Run LLM or VLM agents against realistic interactive environments and collect rewards. |
| Build trajectory data | Persist messages, actions, observations, rewards, and environment state to SQLite. |
| Train with RL | Stream rollout trajectories into Slime through the built-in Buffer Server. |
| Add new environments | Plug new environments into the same pipeline through a standard interface. |
Core features:
- Multi-domain environments: OS, Android, Minecraft, RoboTrustBench, Embodied ALFRED, QA, DABStep, DiscoveryWorld, DeepEyes, Geo3K-VL, and Math500.
- High-concurrency rollouts through environment pools and async workers.
- OpenAI-compatible model integration for vLLM, SGLang, hosted APIs, and local proxies.
- Local single-machine mode and remote RayJob-backed cluster mode.
- Optional experience extraction and prompt-time experience injection.
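Because the model integration is OpenAI-compatible, swapping between vLLM, SGLang, a hosted API, or a local proxy only changes the base URL and API key; the request shape stays the same. The sketch below builds such a request with the standard library (the host, key, and model placeholders mirror the Quick Start flags and are illustrative, not part of Safactory's API):

```python
# Hedged sketch: any OpenAI-compatible backend accepts the same
# /v1/chat/completions payload, so only base_url and api_key change
# when you switch model servers. All values below are placeholders.
import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages):
    """Build an OpenAI-style chat-completions HTTP request."""
    payload = {"model": model, "messages": messages, "temperature": 0.0}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "http://YOUR_LLM_HOST/v1", "YOUR_API_KEY", "YOUR_MODEL",
    [{"role": "user", "content": "List the files in the home directory."}],
)
print(req.full_url)  # http://YOUR_LLM_HOST/v1/chat/completions
```

Sending the request (e.g. with `urllib.request.urlopen`) is left out, since the endpoint here is a placeholder.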
demo.1.mp4
Click play to watch the full demo.
```shell
git clone https://github.com/AI45Lab/Safactory.git
cd Safactory
pip install -r requirements.txt
```

Some environments have extra runtime dependencies. See Supported Environments before running Docker-, emulator-, VM-, or simulator-backed tasks.
```shell
# --env-config selects the evaluation environment (OS / Android / Minecraft, etc.)
# --pool-size sets the number of concurrent agent instances
python launcher.py \
  --env-config env/osgym/os_config.yaml \
  --llm-base-url http://YOUR_LLM_HOST/v1 \
  --llm-api-key YOUR_API_KEY \
  --llm-model YOUR_MODEL \
  --pool-size 500
```

This starts the runner, loads the selected environment configuration, schedules tasks, calls the model endpoint, and writes step-level records to SQLite.
Every rollout is recorded automatically. The default CLI database path is `sqlite://env_trajs.db`; override it with `--db-path`:
```shell
python launcher.py \
  --env-config env/osgym/os_config.yaml \
  --db-path sqlite://runs/os_eval.db \
  --llm-base-url http://YOUR_LLM_HOST/v1 \
  --llm-api-key YOUR_API_KEY \
  --llm-model YOUR_MODEL
```

See Data Manager for schema details and query examples.
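Since trajectories land in ordinary SQLite files, standard `sqlite3` queries work for ad-hoc analysis. The table and column names below are illustrative stand-ins, not Safactory's actual schema (see Data Manager for that):

```python
# Hedged sketch: aggregate per-trajectory reward from step-level records.
# The "steps" table and its columns are invented for illustration; the
# real schema is documented in Data Manager.
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for e.g. runs/os_eval.db
conn.execute(
    "CREATE TABLE steps (traj_id TEXT, step INTEGER, action TEXT, reward REAL)"
)
conn.executemany(
    "INSERT INTO steps VALUES (?, ?, ?, ?)",
    [("t1", 0, "click", 0.0), ("t1", 1, "type", 1.0), ("t2", 0, "scroll", 0.0)],
)

# Total reward per trajectory
totals = dict(
    conn.execute("SELECT traj_id, SUM(reward) FROM steps GROUP BY traj_id")
)
print(totals)  # {'t1': 1.0, 't2': 0.0}
```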
Safactory integrates with Slime through a Buffer Server:
```shell
# Terminal 1: Slime training process
cd rl
./run_slime_generator_vl.sh

# Terminal 2: Safactory Buffer Server and rollout runner
cd rl
./run_buffer_server.sh
```

Full instructions are in RL Training.
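Conceptually, the Buffer Server sits between the rollout runner (which pushes completed trajectories) and the trainer (which pulls batches). The toy in-process queue below mimics that push/pop handoff; it is not the actual Buffer Server protocol, API, or data format:

```python
# Hedged sketch: a toy stand-in for the rollout -> buffer -> trainer
# handoff. Class and method names are invented for illustration only.
from collections import deque

class TrajectoryBuffer:
    def __init__(self):
        self._queue = deque()

    def push(self, trajectory):
        """Called by the rollout runner when an episode finishes."""
        self._queue.append(trajectory)

    def pop_batch(self, n):
        """Called by the training side to drain up to n trajectories."""
        batch = []
        while self._queue and len(batch) < n:
            batch.append(self._queue.popleft())
        return batch

buf = TrajectoryBuffer()
buf.push({"traj_id": "t1", "reward": 1.0})
buf.push({"traj_id": "t2", "reward": 0.0})
print([t["traj_id"] for t in buf.pop_batch(8)])  # ['t1', 't2']
```

In the real setup the two sides run in separate processes and communicate over HTTP, which is why the Quick Start uses two terminals.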
Safactory can generate reusable trajectory datasets. The public OS trajectory release is available on Hugging Face:
- AI45Research/SATraj-OS, a Safactory-generated OS trajectory dataset for agent training and analysis.
| Guide | What it covers |
|---|---|
| Configuration | CLI flags, manager YAML, and environment YAML format. |
| Supported Environments | Environment registry names, prerequisites, and setup links. |
| Data Manager | SQLite schema, storage behavior, and query examples. |
| RL Training | Slime integration, Buffer Server setup, and RL variables. |
| Custom Environment | Minimal BaseEnv implementation and registration flow. |
| Experience Extraction and Injection | Reusing historical trajectories as prompt-time experience. |
At a high level, `launcher.py` loads environment YAML files, starts or connects to environment services, sends observations to an OpenAI-compatible model endpoint, records every interaction through the data manager, and optionally forwards completed rollouts to RL training.
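The common environment interface behind this flow boils down to a reset/step loop. The toy class below sketches that contract; the method names and return shapes are illustrative assumptions, and the real contract is documented in Custom Environment:

```python
# Hedged sketch of a minimal environment implementing the reset/step
# loop the runner drives. This is a toy, not Safactory's BaseEnv:
# actual method signatures and return types may differ.
class CounterEnv:
    """Toy environment: the agent must count up to a target."""

    def __init__(self, target: int = 3):
        self.target = target
        self.count = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.count = 0
        return {"observation": f"count is {self.count}"}

    def step(self, action: str):
        """Apply one action; return (observation, reward, done)."""
        if action == "increment":
            self.count += 1
        done = self.count >= self.target
        reward = 1.0 if done else 0.0
        return {"observation": f"count is {self.count}"}, reward, done

env = CounterEnv()
obs = env.reset()
for _ in range(3):
    obs, reward, done = env.step("increment")
print(done, reward)  # True 1.0
```

Registering such a class under `env/<name>/` with a YAML config is what makes it reachable from `--env-config`.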
Contributions are welcome for new environments, bug fixes, documentation improvements, and reproducible examples.
- Fork the repository.
- Add or update an environment under `env/<name>/`.
- Include a YAML config and a short README for environment-specific dependencies.
- Run a local smoke test with `launcher.py`.
- Open a pull request with setup notes and expected behavior.
If Safactory or Safactory-generated datasets are useful in your work, cite the repository and the specific dataset or report you used.
```bibtex
@misc{safactory,
  title        = {Safactory: A Universal AI Agent Sandbox for Evaluation, Data Construction, and RL Training},
  howpublished = {\url{https://github.com/AI45Lab/Safactory}},
  year         = {2026}
}
```