Dagu

The command-native workflow engine AI agents need.

Turn scripts, containers, and agent CLIs into scheduled, observable, retryable workflows. One self-hosted binary. No database, no broker, no SDK rewrite.

Try It Live

Explore without installing: Live Demo

Credentials: demouser / demouser

What Dagu Does

Dagu is a command-native workflow engine that runs as a single binary with no external databases or message brokers. It turns scripts, commands, containers, server tasks, and agent CLIs into DAGs (Directed Acyclic Graphs) defined in YAML. It supports local execution, cron scheduling, queue-based concurrency control, and distributed coordinator/worker execution across multiple machines over gRPC.

All state is stored in local files by default. There is nothing to install besides the binary itself.

Real-World Use Cases

Dagu is useful when scripts, containers, server jobs, data tasks, or agent-driven work need visible dependencies, schedules, logs, retries, and a simple way to operate them.

Cron and Legacy Script Management

Run: existing shell scripts, Python scripts, HTTP calls, and scheduled jobs without rewriting them.

Why Dagu fits: dependencies, run status, logs, retries, and history become visible in the Web UI instead of being hidden across crontabs and server log files.
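As a sketch, a nightly crontab entry can be wrapped without changing the scripts themselves (the `schedule` and `steps` fields follow the examples later on this page; the script paths are placeholders):

```yaml
schedule:
  - "0 2 * * *"   # same cron expression as the old crontab entry
steps:
  - command: /opt/scripts/backup.sh
  - command: /opt/scripts/report.py
```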

ETL and Data Operations

Run: PostgreSQL or SQLite queries, S3 transfers, jq transforms, validation steps, and reusable sub-workflows.

Why Dagu fits: daily data workflows stay declarative, observable, and easy to retry when one step fails.

Media Conversion

Run: ffmpeg, thumbnail extraction, audio normalization, image processing, and other compute-heavy jobs.

Why Dagu fits: conversion work can run across distributed workers while status, history, logs, and artifacts stay in one persistence layer for monitoring, debugging, and retries.
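A minimal sketch of such a pipeline, using the `id`/`command`/`depends` fields from the workflow examples below (the ffmpeg invocations are illustrative):

```yaml
steps:
  - id: convert
    command: ffmpeg -i input.mov -c:v libx264 output.mp4

  - id: thumbnail
    command: ffmpeg -i output.mp4 -vf "thumbnail" -frames:v 1 thumb.png
    depends: [convert]
```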

Infrastructure and Server Automation

Run: SSH backups, cleanup jobs, deploy scripts, patch windows, precondition checks, and lifecycle hooks.

Why Dagu fits: remote operations get schedules, retries, notifications, and per-step logs without requiring operators to SSH into servers for every recovery.

Container and Kubernetes Workflows

Run: Docker images, Kubernetes Jobs, shell glue, and follow-up validation steps.

Why Dagu fits: teams can compose image-based tasks and route them to the right workers without building a custom control plane.

Customer Support Automation

Run: diagnostics, account repair jobs, data checks, and approval-gated support actions.

Why Dagu fits: non-engineers can run reviewed workflows from the Web UI while engineers keep commands, logs, and results traceable.

IoT and Edge Workflows

Run: sensor polling, local cleanup, offline sync, health checks, and device maintenance jobs.

Why Dagu fits: the single binary and file-backed state work well on small devices while still providing visibility through the Web UI.

AI Agent Workflows

Run: AI coding agents, agent CLIs, agent-authored YAML workflows, log analysis, repair steps, and human-reviewed automation.

Why Dagu fits: workflows are commands plus plain YAML, so agents can create and debug them while humans keep dependencies, logs, approvals, and run history in one place.

Architecture

Dagu runs in three configurations:

Standalone. A single dagu start-all process runs the HTTP server, scheduler, and executor. Suitable for single-machine deployments.

Coordinator/Worker. The scheduler enqueues jobs to a file-based queue, then dispatches them to a coordinator over gRPC. Workers long-poll the coordinator for tasks, execute DAGs locally, and report status back. Workers can run on separate machines, and tasks are routed to them based on their labels. Mutual TLS secures gRPC communication between the coordinator and workers.

Headless. Run without the web UI (DAGU_HEADLESS=true). Useful for CI/CD environments or when Dagu is managed through the CLI or API only.

Standalone:

  ┌─────────────────────────────────────────┐
  │  dagu start-all                         │
  │  ┌───────────┐ ┌───────────┐ ┌────────┐│
  │  │ HTTP / UI │ │ Scheduler │ │Executor││
  │  └───────────┘ └───────────┘ └────────┘│
  │  File-based storage (logs, state, queue)│
  └─────────────────────────────────────────┘

Distributed:

  ┌────────────┐                   ┌────────────┐
  │ Scheduler  │                   │ HTTP / UI  │
  │            │                   │            │
  │ ┌────────┐ │                   └─────┬──────┘
  │ │ Queue  │ │  Dispatch (gRPC)        │
  │ │(file)  │ │─────────┐               │
  │ └────────┘ │         │               │
  └────────────┘         ▼               ▼
                    ┌─────────────────────────┐
                    │      Coordinator        │
                    │  (gRPC task dispatch,   │
                    │   worker registry,      │
                    │   health monitoring)    │
                    └────────▲────────────────┘

                   Worker poll / task response
                   Heartbeat / ReportStatus /
                   StreamLogs (gRPC)

               ┌─────────────┴─────────────┐
               │             │             │
          ┌────┴───┐    ┌────┴───┐    ┌────┴───┐
          │Worker 1│    │Worker 2│    │Worker N│
          └────────┘    └────────┘    └────────┘

Quick Start

Install

macOS/Linux:

```bash
curl -fsSL https://raw.githubusercontent.com/dagucloud/dagu/main/scripts/installer.sh | bash
```

Windows (PowerShell):

```powershell
irm https://raw.githubusercontent.com/dagucloud/dagu/main/scripts/installer.ps1 | iex
```

Docker:

```bash
docker run --rm -v ~/.dagu:/var/lib/dagu -p 8080:8080 ghcr.io/dagucloud/dagu:latest dagu start-all
```

Homebrew:

```bash
brew install dagu
```

Kubernetes (Helm):

```bash
helm repo add dagu https://dagucloud.github.io/dagu
helm repo update
helm install dagu dagu/dagu --set persistence.storageClass=<your-rwx-storage-class>
```

The script installers run a guided wizard that installs Dagu, adds it to your PATH, sets up a background service, and creates the initial admin account. Homebrew, Docker, and Helm install without the wizard. See the Installation Guide for all options.

Create and Run a Workflow

```bash
cat > hello.yaml << 'EOF'
steps:
  - command: echo "Hello from Dagu!"
  - command: echo "Step 2"
EOF

dagu start hello.yaml
```

Start the Server

```bash
dagu start-all
```

Visit http://localhost:8080

Built-in Step Types

Common built-in step types include:

| Step type | Purpose |
|---|---|
| `command`, `shell` | Local shell commands and scripts |
| `docker`, `container` | Run in a Docker container or exec into an existing container |
| `kubernetes`, `k8s` | Run a step as a Kubernetes workload |
| `harness` | Run CLI-based coding agents and custom harness adapters |
| `ssh` | Remote command execution |
| `sftp` | Remote file transfer |
| `http` | HTTP requests |
| `postgres`, `sqlite` | SQL queries |
| `redis` | Redis commands and scripts |
| `s3` | S3 object operations |
| `jq` | JSON transformation |
| `mail` | Email delivery |
| `archive` | Archive create/extract |
| `dag` | Sub-DAG execution |
| `router` | Route execution to downstream steps by value |
| `template` | Template rendering |
| `chat` | LLM chat completion |
| `agent` | Tool-using agent step |

DAGs can also declare reusable step_types that expand to built-in step types at load time. See Custom Step Types and Step Types for the exact configuration surface.

Scheduling and Reliability

| Feature | Details |
|---|---|
| Cron scheduling | Timezone support, multiple schedule entries per DAG |
| Overlap policies | `skip` (default), `all` (queue all), `latest` (keep only the most recent) |
| Catch-up scheduling | Automatically runs missed intervals when the scheduler was down |
| Zombie detection | Identifies and handles stalled DAG runs (configurable interval, default 45s) |
| Retry policies | Per-step retry with configurable limits, intervals, exit code filtering, exponential/linear/constant backoff |
| Lifecycle hooks | `onInit`, `onSuccess`, `onFailure`, `onAbort`, `onExit`, `onWait` |
| Preconditions | Gate DAG or step execution on shell command results |
| Queue system | File-based persistent queue with configurable concurrency limits per queue |
| Scheduler HA | Lock with stale detection for failover across multiple scheduler instances |
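A sketch combining a few of these features in one DAG (the `preconditions` field name and its plain-command form are assumptions based on the feature description; `schedule` and `handler_on` follow the scheduling example later on this page):

```yaml
schedule:
  - "30 1 * * *"
preconditions:
  - test -f /data/input.csv   # skip the run unless the input file exists
steps:
  - command: ./process.sh
handler_on:
  failure:
    command: notify-team.sh
```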

Security and Access Control

Authentication

Four authentication modes, configured via DAGU_AUTH_MODE:

| Mode | Description |
|---|---|
| `none` | No authentication |
| `basic` | HTTP Basic authentication |
| `builtin` | JWT-based authentication with user management, API keys, and per-DAG webhook tokens |
| `oidc` | OpenID Connect integration with any compliant identity provider |

Role-Based Access Control

When using builtin auth, five roles control access:

| Role | Capabilities |
|---|---|
| `admin` | Full access including user management |
| `manager` | Create, edit, delete, run, stop DAGs; view audit logs |
| `developer` | Create, edit, delete, run, stop DAGs |
| `operator` | Run and stop DAGs only (no editing) |
| `viewer` | Read-only access |

API keys can be created with independent role assignments. Audit logging tracks all actions.

TLS and Secrets

  • TLS for the HTTP server (DAGU_CERT_FILE, DAGU_KEY_FILE)
  • Mutual TLS for gRPC coordinator/worker communication (DAGU_PEER_CERT_FILE, DAGU_PEER_KEY_FILE, DAGU_PEER_CLIENT_CA_FILE)
  • Secret management with three providers: environment variables, files, and HashiCorp Vault

Observability

Prometheus Metrics

Dagu exposes Prometheus-compatible metrics at the /metrics endpoint:

| Metric | Description |
|---|---|
| `dagu_dag_runs_total` | Total DAG runs by status |
| `dagu_dag_runs_total_by_dag` | Per-DAG run counts |
| `dagu_dag_run_duration_seconds` | Histogram of run durations |
| `dagu_dag_runs_currently_running` | Active DAG runs |
| `dagu_dag_runs_queued_total` | Queued runs |
| `dagu_queue_wait_time` | Queue wait time histogram |
| `dagu_uptime_seconds` | Server uptime |
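A minimal Prometheus scrape job for these metrics, assuming the default server address from the quick start:

```yaml
scrape_configs:
  - job_name: dagu
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]
```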

OpenTelemetry

Per-DAG OpenTelemetry tracing configuration with OTLP endpoint, custom headers, resource attributes, and TLS options.

Structured Logging and Notifications

  • JSON or text format logging (DAGU_LOG_FORMAT), per-run log files with separate stdout/stderr capture per step
  • Slack and Telegram bot integration for run status events (succeeded, failed, aborted, waiting, rejected)
  • Email notifications on DAG success, failure, or wait status via SMTP
  • Per-DAG webhook endpoints with token authentication

Distributed Execution

The coordinator/worker architecture distributes DAG execution across multiple machines:

  • Coordinator: gRPC server managing task distribution, worker registry, and health monitoring
  • Workers: Connect to the coordinator, pull tasks via long-polling, execute DAGs locally, stream logs back
  • Worker labels: Route DAGs to specific workers based on labels (e.g., gpu=true, region=us-east-1)
  • Health checks: HTTP health endpoints on coordinator and workers for load balancer integration
  • Queue system: File-based persistent queue with configurable concurrency limits
```bash
# Start coordinator
dagu coord

# Start workers (on separate machines)
DAGU_WORKER_LABELS=gpu=true,memory=64G dagu worker
```
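Routing a DAG to workers that match those labels might look like the sketch below; the `worker_selector` key is an assumption based on the label-routing feature described above, so check the Distributed Execution docs for the exact field name:

```yaml
worker_selector:
  gpu: "true"
steps:
  - command: ./train.sh
```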

See the Distributed Execution documentation for setup details.

Workflow Examples

Parallel Execution with Dependencies

```yaml
type: graph
steps:
  - id: extract
    command: ./extract.sh

  - id: transform_a
    command: ./transform_a.sh
    depends: [extract]

  - id: transform_b
    command: ./transform_b.sh
    depends: [extract]

  - id: load
    command: ./load.sh
    depends: [transform_a, transform_b]
```

Docker Step

```yaml
steps:
  - name: build
    container:
      image: node:20-alpine
    command: npm run build
```

Retry with Exponential Backoff

```yaml
steps:
  - name: flaky-api-call
    command: curl -f https://api.example.com/data
    retry_policy:
      limit: 3
      interval_sec: 10
      backoff: 2
      max_interval_sec: 120
    continue_on:
      failure: true
```

With `limit: 3`, `interval_sec: 10`, and `backoff: 2`, the waits between attempts grow 10s → 20s → 40s, and `max_interval_sec: 120` caps any single wait at two minutes.
Scheduling with Overlap Control

```yaml
schedule:
  - "0 */6 * * *"
overlap_policy: skip
timeout_sec: 3600
handler_on:
  failure:
    command: notify-team.sh
  exit:
    command: cleanup.sh
```

Sub-DAG Composition

```yaml
steps:
  - name: extract
    call: etl/extract
    params: "SOURCE=s3://bucket/data.csv"

  - name: transform
    call: etl/transform
    params: "INPUT=${extract.outputs.result}"
    depends: [extract]

  - name: load
    call: etl/load
    params: "DATA=${transform.outputs.result}"
    depends: [transform]
```

SSH Remote Execution

```yaml
steps:
  - name: deploy
    type: ssh
    with:
      host: prod-server.example.com
      user: deploy
      key: ~/.ssh/id_rsa
    command: cd /var/www && git pull && systemctl restart app
```

See Examples for more patterns.

Version-Controlled Workflows

Dagu supports Git sync to keep DAG definitions, agent markdown files, and managed documents version-controlled. Set DAGU_GITSYNC_ENABLED=true with a repository URL, and Dagu pulls tracked files from a Git branch. Optional auto-sync polls the repository at a configurable interval (default 300s). Both token and SSH authentication are supported.

See Git Sync for configuration.

CLI Reference

| Command | Description |
|---|---|
| `dagu start <dag>` | Execute a DAG |
| `dagu start-all` | Start HTTP server + scheduler |
| `dagu server` | Start HTTP server only |
| `dagu scheduler` | Start scheduler only |
| `dagu coord` | Start coordinator (distributed mode) |
| `dagu worker` | Start worker (distributed mode) |
| `dagu stop <dag>` | Stop a running DAG |
| `dagu restart <dag>` | Restart a DAG |
| `dagu retry <dag> <run-id>` | Retry a failed run |
| `dagu dry <dag>` | Dry run (show what would execute) |
| `dagu status <dag>` | Show DAG run status |
| `dagu history <dag>` | Show execution history |
| `dagu validate <dag>` | Validate DAG YAML |
| `dagu enqueue <dag>` | Add DAG to the execution queue |
| `dagu dequeue <dag>` | Remove DAG from the queue |
| `dagu cleanup` | Clean up old run data |
| `dagu migrate` | Run database migrations |

Full CLI and environment variable reference: CLI | Configuration Reference

Learn More

  • Overview: Architecture and core concepts
  • Getting Started: Installation and first workflow
  • Writing Workflows: YAML syntax, scheduling, execution control
  • YAML Reference: All configuration options
  • Step Types: All 18 executor types
  • Distributed Execution: Coordinator/worker setup
  • Authentication: RBAC, OIDC, API keys, audit logging
  • Server Administration: Deployment, configuration, operations

Community

Released under the MIT License.