Run AI coding agents with full autonomy on macOS without giving them your real account.
User isolation + kernel sandbox + firewall + rollback
I built Hazmat because manual approval mode was the worst of both worlds. It broke agent flow, and it still was not a real security boundary.
If an agent gets prompt-injected, runs a poisoned dependency, or follows malicious repo instructions, the important question is not whether it asked politely. The important question is what it can reach.
Hazmat changes that blast radius. The agent runs as a dedicated macOS user inside OS-level containment, not as you.
If you want the shortest path to a first contained session:
brew install dredozubov/tap/hazmat
hazmat init --bootstrap-agent claude
cd your-project
hazmat claudeThat gets you:
- a dedicated
agentmacOS user - a per-session seatbelt policy
- firewall and DNS hardening
- automatic pre-session snapshot and restore
- a session contract that tells you exactly what the agent can touch
If you want to preview before changing anything, run hazmat init --dry-run or hazmat explain.
If you use Codex, OpenCode, or Gemini instead of Claude, start with docs/harnesses.md.
Every session starts with a contract. No hidden widening, no vague "secure mode" label.
hazmat: session
Mode: Native containment
Why this mode: using native containment by default (Docker routing: none)
Project (read-write): /Users/dr/workspace/my-app
Integrations: go
Auto read-only: /Users/dr/go/pkg/mod
Read-only extensions: none
Read-write extensions: none
Service access: none
Pre-session snapshot: on
Snapshot excludes: vendor/
That contract is the product. You can inspect it with hazmat explain before launch, and you can tell at a glance what changed for the session.
--dangerously-skip-permissions is where the real productivity is. It is also where the real blast radius is.
The category keeps proving the point:
- Agents actively reason about escaping. Ona showed Claude Code bypassing its own denylist via
/proc/self/root, then trying to disable bubblewrap when that path was closed. - The CVEs are not hypothetical. Hazmat tracks 16 Claude Code CVEs, including CVE-2025-59536 and CVE-2026-21852.
- Supply chain attacks are fast enough to beat human supervision. The 2026 axios compromise delivered a RAT through a
postinstallhook in about two seconds.
So the design goal is not "make the agent behave." The design goal is "make autonomous failure less catastrophic."
hazmat claude
hazmat codex
hazmat opencode
hazmat exec ./my-agent-loop.sh
hazmat shell| Layer | What it does |
|---|---|
| User isolation | Runs the agent as a dedicated agent macOS user, so your real home directory is structurally out of reach |
| Kernel sandbox | Generates a per-session seatbelt policy with explicit read-write and read-only scope |
| Credential deny | Blocks common secret paths at the kernel level, including credential paths inside agent home |
| Network firewall | Uses pf to block common exfiltration and tunneling protocols |
| DNS blocklist | Redirects known tunnel, paste, and capture domains to localhost |
| Supply chain hardening | Applies conservative defaults such as npm ignore-scripts=true |
| Snapshots and restore | Takes a pre-session Kopia snapshot so you can diff or roll back |
Current state, not aspirational state:
- macOS native containment is the default path. Hazmat ships release artifacts for
darwin/arm64anddarwin/amd64. - Four harnesses are supported in containment. Claude Code, Codex, OpenCode, and Gemini. Details, tested versions, and auth flows live in docs/harnesses.md.
- Docker support is real, but selective. Private-daemon Docker workflows can use Docker Sandbox mode. Shared host-daemon workflows stay code-only by default. See docs/tier3-docker-sandboxes.md and docs/shared-daemon-projects.md.
- Integrations exist for common stacks. Hazmat currently ships built-in integrations for Go, Node, Rust, Python, Java, TLA+, Terraform/OpenTofu, Beads, and more. See docs/integrations.md.
- Repo-local Git hooks have a Hazmat-managed approval path. Repos can declare
pre-commit,commit-msg, andpre-pushin.hazmat/hooks/hooks.yaml; approval, install, drift review, and uninstall flow throughhazmat hooks .... - Core behavior is tested and partially formally verified. The exact proof boundary is explicit in tla/VERIFIED.md. If something is not listed there, do not assume a proof exists.
Hazmat is useful because the boundaries are concrete. That also means the limitations should be concrete.
- macOS only today. Linux is intentionally compile-only until setup and rollback resources are modeled and implemented. See docs/testing.md.
- This is not a total network allowlist. HTTPS exfiltration to a brand-new domain is still not fully solved by Tier 2. See docs/threat-matrix.md.
- The DNS blocklist is exact-domain, not wildcard. It is based on
/etc/hosts, not a full DNS filtering stack. See docs/design-assumptions.md. - Shared
/tmpstays shared. Hazmat does not pretend macOS temp space suddenly became private. - MCP env inheritance and
SSH_AUTH_SOCKabuse are still category-wide problems. Some of the hardest issues here are operational, not just architectural. They are called out directly in docs/threat-matrix.md. - Docker Sandbox mode now covers every harness entrypoint.
hazmat claude,hazmat codex,hazmat opencode,hazmat gemini,hazmat shell, andhazmat execcan all route into private-daemon Docker Sandbox sessions when the repo needs it.
If you are dealing with hostile repos, long unattended runs, or shared-daemon Docker workflows, the honest answer may be Tier 4, not stretching Tier 2 past what it does well. Start with docs/overview.md.
I want community help here, but I do not want to pretend every part of Hazmat is equally easy or equally safe to crowdsource.
- Integrations and stack coverage - new manifests, detection fixes, better snapshot excludes, compatibility reports
- Harness usability - bootstrap friction, auth/import bugs, first-run UX, docs for real setups
- Docs and onboarding - quickstart clarity, explain-mode examples, screenshots, diagrams, troubleshooting
- Research and evidence - CVE tracking, incident writeups, comparative analysis, drift checks
- Test matrix expansion - real repo validation, macOS version coverage, harness regression repros
- Seatbelt policy changes
pffirewall behavior- setup / rollback ordering
- credential delivery and capability brokering
- anything covered by the TLA+ governance rules
If you want to contribute, CONTRIBUTING.md is the starting point. If you want to understand which claims are modeled versus just tested, read tla/VERIFIED.md and docs/design-assumptions.md first.
# Claude Code
hazmat claude
hazmat claude -p "refactor the auth module"
# Other supported harnesses
hazmat codex
hazmat opencode
hazmat gemini
# Any command in containment
hazmat exec -- make test
hazmat exec -- /bin/zsh -lc 'uv run pytest -q'
# Interactive shell
hazmat shell
# Review and recovery
hazmat diff
hazmat snapshots
hazmat restoreYou can expose more paths explicitly when you need them:
hazmat claude -R ~/reference-docs
hazmat claude -W ~/.venvs/my-app
hazmat config access add -C ~/workspace/my-app --read ~/reference-docs --write ~/.venvs/my-appAnd if the repo needs integration hints:
hazmat integration list
hazmat integration show node
hazmat claude --integration node
hazmat config set integrations.pin "~/workspace/my-app:node,go"
# Repo-local Git hooks
hazmat hooks status
hazmat hooks install
hazmat hooks review
hazmat hooks uninstall You (dr) Agent (agent)
-------- -------------
~/ /Users/agent/
~/.ssh, ~/.aws <- denied -> ~/.claude/
~/workspace/ <- shared -> ~/workspace/ (symlink)
hazmat claude
|
|- snapshot project (Kopia)
|- generate per-session seatbelt policy
|- sudo -u agent hazmat-launch <policy>
| |- apply sandbox-exec
| `- exec harness
|
`- pf firewall already active
The important property is structural separation. The agent is not "forbidden from reading your SSH key while still running as you." It runs as a different user entirely.
| Doc | Why you would read it |
|---|---|
| docs/usage.md | Full user guide once you are past the first session |
| docs/overview.md | Which tier to use, and when |
| docs/threat-matrix.md | Risk-by-risk coverage and documented caveats |
| docs/harnesses.md | Harness setup matrix for Claude, Codex, OpenCode, Gemini |
| docs/integrations.md | How integrations work, and what they are not allowed to do |
| docs/integration-contributor-flow.md | How users discover integrations and turn missing stack support into PR-shaped work |
| docs/integration-author-kit.md | How to propose integrations without turning them into policy escapes |
| docs/community.md | Support tiers, ownership model, sponsor lanes, and contribution surfaces |
| docs/public-roadmap.md | Curated public roadmap exported from beads issues |
| docs/compatibility.md | Compatibility status meanings, matrix shape, and reporting flow |
| docs/recipes/README.md | Community-expandable recipes for common harness + stack workflows |
| docs/testing.md | What is tested locally, in CI, and in destructive VM-backed flows |
| docs/git-hooks.md | Why Hazmat's repo-local hook flow is stricter than plain Git hooks |
| docs/manual-testing.md | Human-driven verification checklist (run before releases / after harness or seatbelt changes) |
| docs/design-assumptions.md | Non-obvious design decisions and known tradeoffs |
| docs/cve-audit.md | How Hazmat maps against known Claude Code CVEs |
| tla/VERIFIED.md | Exact formal verification scope and governance rules |
How I Made --dangerously-skip-permissions Safe in Claude Code
If you find a containment bypass, credential leak, sandbox escape, or other security issue, please use the private reporting path in SECURITY.md.
MIT
The Simpsons and all related characters are property of 20th Television and The Walt Disney Company. The Claude logo is property of Anthropic. We do not claim any rights to these properties. Their use here is purely for entertainment purposes.