Computer-use agent inference harness. Screenshot → Qwen3-VL → parsed tool call → dispatched as OS input. macOS only (for now).
The artifact is an .app bundle (work.app) that wraps a single CLI
binary. Modules inside (model, input, screen, qwen3vl, log, error)
are internal — they back main but aren't an external library surface.
./scripts/bundle.sh
open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"
# drag target/bundle/work.app into the list, toggle ONCFBundleIdentifier = org.pdoom.work is stable across rebuilds, so the
Accessibility grant persists.
MODEL_ENDPOINT=http://localhost:8000 \
MODEL_NAME="Qwen/Qwen3-VL-30B-A3B-Instruct" \
./target/bundle/work.app/Contents/MacOS/work "Click the bluetooth icon."Env:
MODEL_ENDPOINTrequired; no trailing slashMODEL_NAMErequiredMODEL_API_KEYoptional; sent as bearer if setWORK_SESSIONSoptional; where to write session dirs (default:./sessions)
Session events.jsonl + frames/ land under WORK_SESSIONS/<timestamp>/.
cargo test # offline unit tests
cargo test -- --include-ignored # adds live-endpoint tests (MODEL_ENDPOINT + MODEL_NAME required)Edit src/main.rs (or any module), rerun ./scripts/bundle.sh, rerun the
binary. No re-granting.