Skip to content

p-doom/tab

 
 

Repository files navigation

p(doom)

crowd-cowork: Inference Overlay for Human Computer Work Trajectories

Forked from crowd-cast. Extends the capture agent with real-time inference — predicting what the user will do next from screen frames.

crowd-cowork runs side-by-side with crowd-cast. All data directories, config, logs, cache, and tray identity are fully separated. The tray icon is purple-tinted to distinguish it visually.

Prerequisites

crowd-cast must be installed first. crowd-cowork reuses the OBS runtime that crowd-cast installs — it does not ship its own copy.

Building

One-time setup

Symlink the OBS runtime from crowd-cast:

mkdir -p ~/Library/Application\ Support/dev.crowd-cowork.agent/obs
ln -s ~/Library/Application\ Support/dev.crowd-cast.agent/obs/current \
      ~/Library/Application\ Support/dev.crowd-cowork.agent/obs/current

macOS (Apple Silicon)

brew install simde  # Required for ARM builds

Build and bundle the app

CROWD_CAST_SKIP_OBS_INSTALL=1 \
CROWD_CAST_API_GATEWAY_URL='https://your-api-gateway.execute-api.region.amazonaws.com/prod/presign' \
CROWD_CAST_SKIP_SPARKLE=1 \
scripts/bundle-macos.sh --debug --skip-sign

This produces target/debug/CrowdCowork.app.

Run

open target/debug/CrowdCowork.app

Or for a release build:

CROWD_CAST_SKIP_OBS_INSTALL=1 \
CROWD_CAST_API_GATEWAY_URL='https://your-endpoint' \
CROWD_CAST_SKIP_SPARKLE=1 \
scripts/bundle-macos.sh --skip-sign

open target/release/CrowdCowork.app

Configuration

Config lives at ~/Library/Application Support/dev.crowd-cowork.agent/config.toml (separate from crowd-cast).

[capture]
target_apps = ["org.mozilla.firefox", "com.apple.Terminal"]
capture_all = false
idle_timeout_secs = 120
single_active_app_capture = true

[recording]
autostart_on_launch = true
segment_duration_secs = 300

[upload]
delete_after_upload = true

Upload endpoint is baked in at build time via CROWD_CAST_API_GATEWAY_URL.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                   crowd-cowork Agent (Rust)                     │
│  ┌──────────┐  ┌────────────────┐  ┌─────────────────────────┐  │
│  │ Tray UI  │  │ Embedded libobs│  │      Sync Engine        │  │
│  │ (purple) │  │ (libobs-rs)    │  │  - Frontmost app detect │  │
│  └──────────┘  │                │  │  - Input filtering      │  │
│                │  ┌───────────┐ │  │  - Event buffering      │  │
│                │  │mac-capture│ │  │  - Inference timer      │  │
│                │  │obs-x264   │ │  └───────────┬─────────────┘  │
│                │  │obs-ffmpeg │ │              │                │
│                │  └───────────┘ │        ┌─────┴─────┐          │
│                └────────────────┘        │ rdev/evdev│          │
│                        │                 └───────────┘          │
│                   Video Output                                  │
│                        │                                        │
│              ┌─────────┴──────────┐                              │
│              │ Frame Tap + Infer  │                              │
│              └─────────┬──────────┘                              │
│                        │                                        │
│                  ┌─────┴─────┐                                   │
│                  │ Uploader  │                                   │
│                  └─────┬─────┘                                   │
└────────────────────────┼────────────────────────────────────────┘
                         │
                         ▼
                  ┌─────────────┐
                  │ Lambda + S3 │
                  └─────────────┘

Data Format

Input logs are stored in MessagePack format. Each file contains an array of [timestamp_us, [event_type, event_data]] tuples:

[0,         ["ContextChanged", ["com.apple.Terminal"]]]
[1234000,   ["KeyPress",       [0, "KeyA"]]]
[1334000,   ["KeyRelease",     [0, "KeyA"]]]
[1500000,   ["MouseMove",      [12.5, -3.2]]]
[2000000,   ["MousePress",     ["Left", 540.0, 320.0]]]
[2100000,   ["MouseRelease",   ["Left", 540.0, 320.0]]]
[2500000,   ["MouseScroll",    [0.0, -3.0, 540.0, 320.0]]]
[3999000,   ["ContextChanged", ["UNCAPTURED"]]]

Event types:

  • ContextChanged: app switch (bundle ID or UNCAPTURED for untracked apps)
  • KeyPress / KeyRelease: [key_code, key_name]
  • MouseMove: [delta_x, delta_y]
  • MousePress / MouseRelease: [button, x, y]
  • MouseScroll: [delta_x, delta_y, x, y]

Timestamps are microseconds relative to the segment start. Video and input files share the same session/segment IDs for alignment.

Utilities

Overlay keylogs on top of a screen capture:

python scripts/overlay_keylogs.py --video capture.mp4 --input input.msgpack --output capture_with_keys.mp4

To just generate subtitles (ASS):

python scripts/overlay_keylogs.py --input input.msgpack --ass-out keylogs.ass

Differences from crowd-cast

crowd-cast crowd-cowork
Bundle ID dev.crowd-cast.agent dev.crowd-cowork.agent
Tray icon Original Purple-tinted
Config dir ~/Library/Application Support/dev.crowd-cast.agent/ ~/Library/Application Support/dev.crowd-cowork.agent/
Logs ~/Library/Logs/crowd-cast/ ~/Library/Logs/crowd-cowork/
OBS runtime Bundled / auto-installed Symlinked from crowd-cast
Auto-update Sparkle Disabled (dev builds)
Inference No Frame tap + VLM inference overlay

License

MIT License, see LICENSE.md

Contributing

Contributions welcome! Please open an issue first to discuss proposed changes.

About

Augmenting long-horizon human computer work.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Rust 69.0%
  • Objective-C 20.0%
  • Shell 6.0%
  • Python 4.3%
  • C 0.7%