Forked from crowd-cast. Extends the capture agent with real-time inference — predicting what the user will do next from screen frames.
crowd-cowork runs side-by-side with crowd-cast. All data directories, config, logs, cache, and tray identity are fully separated. The tray icon is purple-tinted to distinguish it visually.
crowd-cast must be installed first. crowd-cowork reuses the OBS runtime that crowd-cast installs — it does not ship its own copy.
Symlink the OBS runtime from crowd-cast:
mkdir -p ~/Library/Application\ Support/dev.crowd-cowork.agent/obs
ln -s ~/Library/Application\ Support/dev.crowd-cast.agent/obs/current \
~/Library/Application\ Support/dev.crowd-cowork.agent/obs/currentbrew install simde # Required for ARM buildsCROWD_CAST_SKIP_OBS_INSTALL=1 \
CROWD_CAST_API_GATEWAY_URL='https://your-api-gateway.execute-api.region.amazonaws.com/prod/presign' \
CROWD_CAST_SKIP_SPARKLE=1 \
scripts/bundle-macos.sh --debug --skip-signThis produces target/debug/CrowdCowork.app.
open target/debug/CrowdCowork.appOr for a release build:
CROWD_CAST_SKIP_OBS_INSTALL=1 \
CROWD_CAST_API_GATEWAY_URL='https://your-endpoint' \
CROWD_CAST_SKIP_SPARKLE=1 \
scripts/bundle-macos.sh --skip-sign
open target/release/CrowdCowork.appConfig lives at ~/Library/Application Support/dev.crowd-cowork.agent/config.toml (separate from crowd-cast).
[capture]
target_apps = ["org.mozilla.firefox", "com.apple.Terminal"]
capture_all = false
idle_timeout_secs = 120
single_active_app_capture = true
[recording]
autostart_on_launch = true
segment_duration_secs = 300
[upload]
delete_after_upload = trueUpload endpoint is baked in at build time via CROWD_CAST_API_GATEWAY_URL.
┌─────────────────────────────────────────────────────────────────┐
│ crowd-cowork Agent (Rust) │
│ ┌──────────┐ ┌────────────────┐ ┌─────────────────────────┐ │
│ │ Tray UI │ │ Embedded libobs│ │ Sync Engine │ │
│ │ (purple) │ │ (libobs-rs) │ │ - Frontmost app detect │ │
│ └──────────┘ │ │ │ - Input filtering │ │
│ │ ┌───────────┐ │ │ - Event buffering │ │
│ │ │mac-capture│ │ │ - Inference timer │ │
│ │ │obs-x264 │ │ └───────────┬─────────────┘ │
│ │ │obs-ffmpeg │ │ │ │
│ │ └───────────┘ │ ┌─────┴─────┐ │
│ └────────────────┘ │ rdev/evdev│ │
│ │ └───────────┘ │
│ Video Output │
│ │ │
│ ┌─────────┴──────────┐ │
│ │ Frame Tap + Infer │ │
│ └─────────┬──────────┘ │
│ │ │
│ ┌─────┴─────┐ │
│ │ Uploader │ │
│ └─────┬─────┘ │
└────────────────────────┼────────────────────────────────────────┘
│
▼
┌─────────────┐
│ Lambda + S3 │
└─────────────┘
Input logs are stored in MessagePack format. Each file contains an array of [timestamp_us, [event_type, event_data]] tuples:
[0, ["ContextChanged", ["com.apple.Terminal"]]]
[1234000, ["KeyPress", [0, "KeyA"]]]
[1334000, ["KeyRelease", [0, "KeyA"]]]
[1500000, ["MouseMove", [12.5, -3.2]]]
[2000000, ["MousePress", ["Left", 540.0, 320.0]]]
[2100000, ["MouseRelease", ["Left", 540.0, 320.0]]]
[2500000, ["MouseScroll", [0.0, -3.0, 540.0, 320.0]]]
[3999000, ["ContextChanged", ["UNCAPTURED"]]]
Event types:
ContextChanged: app switch (bundle ID orUNCAPTUREDfor untracked apps)KeyPress/KeyRelease:[key_code, key_name]MouseMove:[delta_x, delta_y]MousePress/MouseRelease:[button, x, y]MouseScroll:[delta_x, delta_y, x, y]
Timestamps are microseconds relative to the segment start. Video and input files share the same session/segment IDs for alignment.
Overlay keylogs on top of a screen capture:
python scripts/overlay_keylogs.py --video capture.mp4 --input input.msgpack --output capture_with_keys.mp4To just generate subtitles (ASS):
python scripts/overlay_keylogs.py --input input.msgpack --ass-out keylogs.ass| crowd-cast | crowd-cowork | |
|---|---|---|
| Bundle ID | dev.crowd-cast.agent |
dev.crowd-cowork.agent |
| Tray icon | Original | Purple-tinted |
| Config dir | ~/Library/Application Support/dev.crowd-cast.agent/ |
~/Library/Application Support/dev.crowd-cowork.agent/ |
| Logs | ~/Library/Logs/crowd-cast/ |
~/Library/Logs/crowd-cowork/ |
| OBS runtime | Bundled / auto-installed | Symlinked from crowd-cast |
| Auto-update | Sparkle | Disabled (dev builds) |
| Inference | No | Frame tap + VLM inference overlay |
MIT License, see LICENSE.md
Contributions welcome! Please open an issue first to discuss proposed changes.