fix workflow system so scheduled triggers run on latest code #2706
shagun-singh-inkeep wants to merge 23 commits into main
Conversation
🦋 Changeset detected — Latest commit: 35bb793. The changes in this PR will be included in the next version bump. This PR includes changesets to release 10 packages.
TL;DR — Replaces the per-trigger daisy-chaining workflow architecture with a single centralized scheduler workflow that polls every 60 seconds, dispatches due triggers as independent one-shot workflows, and stores scheduler state in a singleton runtime-DB table.

Key changes
Summary: 34 files | 9 commits | base: main

Centralized scheduler workflow and trigger dispatcher
The dispatcher atomically advances next_run_at for each due trigger before dispatching its one-shot runner workflow.
| Migration | DB | Change |
|---|---|---|
| 0013_lumpy_apocalypse.sql | manage (Doltgres) | ALTER TABLE scheduled_triggers ADD COLUMN next_run_at timestamptz |
| 0023_broad_sharon_ventura.sql | runtime (Postgres) | CREATE TABLE scheduler_state (singleton) |
manage-schema.ts · computeNextRunAt.ts · scheduledTriggers.ts (DAL)
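The poll-and-dispatch loop described above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual code; `findDue` and `dispatch` stand in for the real DAL and workflow-engine calls.

```typescript
// Minimal sketch of one 60-second scheduler tick, assuming injected helpers.
async function schedulerTick(
  now: Date,
  findDue: (now: Date) => Promise<string[]>,
  dispatch: (triggerId: string) => Promise<void>
): Promise<number> {
  const due = await findDue(now);
  // Dispatch each due trigger independently; one failure must not block the rest.
  await Promise.allSettled(due.map((id) => dispatch(id)));
  return due.length;
}
```

The key property is that each due trigger becomes its own independent workflow start, so a single failing trigger cannot stall the scheduler loop.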
One-shot scheduledTriggerRunnerWorkflow
Before: The runner was a long-lived daisy-chaining workflow: check trigger → compute next time → create invocation → sleep → post-sleep re-check → execute → chain to next iteration. It handled adoption, supersession, and cancellation mid-sleep.
After: The runner is a stateless one-shot: check trigger enabled → create idempotent invocation → retry loop with cancellation checks → mark completed or failed → exit.
All scheduling concerns (when to run, what's due) are now the dispatcher's responsibility. The runner only needs a TriggerPayload with scheduledFor and focuses purely on execution.
scheduledTriggerRunner.ts · scheduledTriggerSteps.ts
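The one-shot control flow can be sketched like this. Step names are illustrative stand-ins for the PR's workflow steps; only the idempotency-key format (`sched_{triggerId}_{scheduledFor}`, mentioned later in the review) is taken from the PR.

```typescript
// Hypothetical sketch of the stateless one-shot runner.
type RunnerSteps = {
  isTriggerEnabled: () => Promise<boolean>;
  createInvocationIdempotent: (key: string) => Promise<'created' | 'duplicate'>;
  execute: () => Promise<void>;
};

async function runOnce(
  triggerId: string,
  scheduledFor: string,
  steps: RunnerSteps
): Promise<'skipped' | 'duplicate' | 'completed'> {
  // Trigger may have been disabled between dispatch and execution.
  if (!(await steps.isTriggerEnabled())) return 'skipped';
  // Invocation creation is the idempotency barrier against double-dispatch.
  const key = `sched_${triggerId}_${scheduledFor}`;
  if ((await steps.createInvocationIdempotent(key)) === 'duplicate') return 'duplicate';
  await steps.execute();
  return 'completed';
}
```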
Deploy restart endpoint and CI integration
Before: No mechanism to move scheduled trigger workflows to a new deployment; they continued running on the old instance until they naturally chained.
After: POST /api/deploy/restart-scheduler starts a fresh scheduler workflow, superseding the old one. The Vercel production workflow calls this after promote + deploy.
The endpoint uses constant-time comparison of INKEEP_AGENTS_RUN_API_BYPASS_SECRET for auth and is registered with noAuth() / security: [] in the OpenAPI spec.
restartScheduler.ts · vercel-production.yml
Significant architectural improvement — moving from per-trigger daisy-chaining workflows to a centralized scheduler with next_run_at on the manage table is a cleaner model. The supersession mechanism and one-shot runner design are well thought out.
There are a few issues to address before merging, roughly in priority order:
- Reconciliation check is broken — listEnabledScheduledTriggers only selects { id, name }, so (t as any).nextRunAt is always undefined and every enabled trigger will be flagged as missing.
- agentFull and projectFull create paths don't set nextRunAt — triggers created through these bulk routes will sit dormant until reconciliation detects them.
- Crash between advance-and-dispatch loses one-time triggers — if the process dies after advanceScheduledTriggerNextRunAt commits but before start(workflow) executes, one-time triggers are permanently disabled with no execution.
- as any casts — nextRunAt is omitted from ScheduledTriggerInsertSchema but passed at the call site, causing multiple as any casts. Clean fix: make nextRunAt an accepted (optional) field in the insert schema.
- Security nits on the deploy endpoint — timing-safe comparison leaks secret length; error responses expose err.message.
```ts
const missingWorkflows = enabledTriggers
  .filter((t) => !workflowsByTriggerId.has(t.id))
  .filter((t) => !(t as any).nextRunAt)
```
Bug: listEnabledScheduledTriggers (in audit-queries.ts) only selects { id, name } — nextRunAt is never present on the returned objects. This means !(t as any).nextRunAt is always true, and every enabled trigger will be reported as missing.
Fix: add nextRunAt: scheduledTriggers.nextRunAt to the select() in listEnabledScheduledTriggers, then remove this as any cast.
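A tiny self-contained illustration of why the check always fires (types and trigger names here are illustrative):

```typescript
// If the select projection omits nextRunAt, the property is undefined on every
// returned row, so the !nextRunAt filter keeps all enabled triggers.
type TriggerRow = { id: string; name: string; nextRunAt?: string | null };

const enabledTriggers: TriggerRow[] = [
  { id: 't1', name: 'daily-digest' },   // nextRunAt absent from the projection
  { id: 't2', name: 'weekly-report' },  // nextRunAt absent from the projection
];

const flaggedAsMissing = enabledTriggers.filter((t) => !t.nextRunAt);
// flaggedAsMissing contains both rows — every enabled trigger is "missing"
```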
```ts
orphanedWorkflows: [],
staleWorkflows: [],
deadWorkflows: [],
verificationFailures: [],
```
These four fields (orphanedWorkflows, staleWorkflows, deadWorkflows, verificationFailures) are now hardcoded empty arrays. Consider updating ScheduledTriggerAuditResult to remove or mark them optional — returning dead-letter fields that can never be populated adds noise.
```ts
  nextRunAt,
} as any);
```
The as any cast is needed because nextRunAt is omitted from ScheduledTriggerInsertSchemaBase. Since the create route now always computes and passes nextRunAt, the insert type should accept it.
Fix: remove nextRunAt from the .omit() in ScheduledTriggerInsertSchemaBase (or add it as .optional()) so the DAL function accepts it without a cast.
```ts
const mergedEnabled = body.enabled !== undefined ? body.enabled : existing.enabled;
const enabledChanged = body.enabled !== undefined && body.enabled !== existing.enabled;

let nextRunAt: string | null | undefined;
if (!mergedEnabled) {
  nextRunAt = null;
} else if (scheduleChanged || enabledChanged) {
  const mergedCron =
    body.cronExpression !== undefined ? body.cronExpression : existing.cronExpression;
  const mergedTimezone =
    body.cronTimezone !== undefined ? body.cronTimezone : existing.cronTimezone;
  const mergedRunAt = body.runAt !== undefined ? body.runAt : existing.runAt;
  nextRunAt = computeNextRunAt({
    cronExpression: mergedCron,
    cronTimezone: mergedTimezone,
    runAt: mergedRunAt,
  });
}
```
nextRunAt is only recomputed when scheduleChanged || enabledChanged. If only payload or messageTemplate changes, nextRunAt is left unchanged — that's correct.
However, the enabled → disabled transition sets nextRunAt = null here, but onTriggerUpdated no longer cancels pending invocations for that case (the old Case 2 was removed from ScheduledTriggerService.ts). Already-queued pending or running invocations will continue executing even though the user disabled the trigger. Consider adding cancelPendingInvocationsForTrigger to the enabled→disabled path, either here or in onTriggerUpdated.
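One way to close that gap, sketched with the canceller injected so the shape is testable (cancelPendingInvocationsForTrigger is the helper the reviewer names; its wiring here is assumed):

```typescript
// Sketch: cancel queued invocations only on the enabled -> disabled transition.
async function onEnabledFlagChanged(
  enabledChanged: boolean,
  mergedEnabled: boolean,
  cancelPendingInvocations: () => Promise<number>
): Promise<number> {
  if (enabledChanged && !mergedEnabled) {
    // Trigger was just disabled: stop anything already queued or running.
    return cancelPendingInvocations();
  }
  return 0; // no-op for enable transitions or unrelated field updates
}
```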
```ts
await withRef(
  manageDbPool,
  resolvedRef,
  (db) =>
    advanceScheduledTriggerNextRunAt(db)({
      scopes: { tenantId, projectId, agentId },
      scheduledTriggerId,
      nextRunAt,
      enabled: isOneTime ? false : undefined,
    }),
  { commit: true, commitMessage: `Advance next_run_at for trigger ${scheduledTriggerId}` }
);
```
Risk: crash-window between advance and dispatch. If the process dies after this advanceScheduledTriggerNextRunAt commit but before start(scheduledTriggerRunnerWorkflow) on line 100, nextRunAt is already advanced but no workflow was started. The rollback on line 102 only runs if start() throws, not on a process crash.
- Cron triggers: miss one execution (the next tick computes a new nextRunAt) — acceptable.
- One-time triggers: permanently disabled (enabled=false, nextRunAt=null) with no execution — data loss.
Consider reversing the order: start the workflow first (idempotent via createInvocationIdempotentStep), then advance nextRunAt. If advance fails, the trigger is picked up again next tick; the idempotency key prevents double-execution.
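The proposed ordering can be sketched like this (helper names are illustrative; the real code calls start(scheduledTriggerRunnerWorkflow) and advanceScheduledTriggerNextRunAt):

```typescript
// Sketch: dispatch first, then advance. A crash after startWorkflow but before
// advanceNextRunAt leaves next_run_at unchanged, so the next tick re-dispatches;
// the runner's invocation idempotency key prevents a double execution.
async function dispatchThenAdvance(
  startWorkflow: () => Promise<void>,
  advanceNextRunAt: () => Promise<void>
): Promise<'dispatched' | 'dispatched-advance-failed'> {
  await startWorkflow(); // safe to repeat: deduplicated by the invocation key
  try {
    await advanceNextRunAt();
    return 'dispatched';
  } catch {
    // Safe to continue: the trigger stays due and is retried next tick.
    return 'dispatched-advance-failed';
  }
}
```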
```ts
await upsertSchedulerState(runDbClient)({
  currentRunId: run.runId,
  deploymentId: getDeploymentId(),
});
```
Minor race: this upsertSchedulerState call is redundant with registerSchedulerStep inside the workflow. Both write currentRunId to the same singleton row. If the deploy endpoint is called twice in quick succession (e.g. CI retry), you can get:
1. Call 1: start() → run C
2. Call 2: start() → run D
3. Call 1: upsertSchedulerState(C)
4. Call 2: upsertSchedulerState(D)
5. Workflow C: registerSchedulerStep(C) — overwrites D
Now both C and D think they're current. Consider removing this outer upsertSchedulerState and relying solely on registerSchedulerStep inside the workflow.
```ts
function constantTimeEqual(a: string, b: string): boolean {
  const bufA = Buffer.from(a);
  const bufB = Buffer.from(b);
  if (bufA.length !== bufB.length) return false;
  return timingSafeEqual(bufA, bufB);
}
```
The early return false on length mismatch exits faster than timingSafeEqual, letting an attacker binary-search the secret's byte length via response timing. Low-severity in practice (network jitter dominates), but cheap to fix:
```ts
import { createHash, timingSafeEqual } from 'node:crypto';

function constantTimeEqual(a: string, b: string): boolean {
  const hash = (s: string) => createHash('sha256').update(s).digest();
  return timingSafeEqual(hash(a), hash(b));
}
```

```ts
  { error: err instanceof Error ? err.message : String(err) },
  'Failed to restart scheduler workflow'
);
return c.json({ error: err instanceof Error ? err.message : 'Internal error' }, 500);
```
err.message is returned to the caller, potentially leaking internal details (DB connection strings, file paths). Since the caller is CI, return a generic message and rely on the server-side log (line 53) for debugging:
```ts
return c.json({ error: 'Failed to restart scheduler workflow' }, 500);
```

```ts
const rows = await db.execute(
  sql`SELECT id, tenant_id, project_id, agent_id,
        cron_expression, cron_timezone, run_at,
        next_run_at, enabled
      FROM scheduled_triggers AS OF ${sql.raw(`'${branchName}'`)}
```
branchName is interpolated unescaped into the query via sql.raw. It's constructed from tenantId/projectId via getProjectScopedRef (simple concatenation, no sanitization). The values come from the runtime DB so they're trusted internal data — not a regression since this pattern exists elsewhere in the dolt module — but worth hardening. Consider a shared helper that validates branch names for AS OF clauses (e.g., reject values containing ').
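A guard along the lines the reviewer suggests might look like this (assumed helper, not in the PR; the allowed character set is a guess at typical Dolt branch names):

```typescript
// Hypothetical validator for refs interpolated into AS OF clauses.
// Permits common branch-name characters; rejects quotes and other metacharacters.
const SAFE_BRANCH_REF = /^[A-Za-z0-9._\/-]+$/;

function assertSafeBranchRef(ref: string): string {
  if (!SAFE_BRANCH_REF.test(ref)) {
    throw new Error(`Refusing to interpolate unsafe branch ref: ${JSON.stringify(ref)}`);
  }
  return ref;
}
```

Callers would wrap the interpolation site, e.g. sql.raw over assertSafeBranchRef(branchName), so an unexpected quote fails loudly instead of reaching the query.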
```ts
}): Promise<DueScheduledTrigger[]> => {
  const allDue: DueScheduledTrigger[] = [];

  for (const project of params.projects) {
```
This loops over every project one-at-a-time with a separate SQL query per project. Fine for small deployments, but could become a bottleneck at scale (N round-trips to Doltgres). Consider adding a log/metric for the iteration count so you can detect when this becomes slow.
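A lightweight version of that instrumentation might look like this (sketch; the logger shape and function boundaries are assumed):

```typescript
// Sketch: time the per-project fan-out and surface the iteration count, so an
// N-round-trip scan shows up in logs before it becomes a latency incident.
async function scanProjects<T>(
  projects: string[],
  queryProject: (projectId: string) => Promise<T[]>,
  log: (fields: Record<string, unknown>) => void
): Promise<T[]> {
  const started = Date.now();
  const allDue: T[] = [];
  for (const projectId of projects) {
    // One query per project, matching the loop flagged above.
    allDue.push(...(await queryProject(projectId)));
  }
  log({
    projectCount: projects.length,
    dueCount: allDue.length,
    durationMs: Date.now() - started,
  });
  return allDue;
}
```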
PR Review Summary
(4) Total Issues | Risk: High
This is a delta review covering 3 commits since the last automated review. The delta addresses several prior issues but leaves critical blocking items unresolved.
✅ Issues Fixed in Delta
| Prior Issue | Status | Evidence |
|---|---|---|
| Reconciliation check broken (nextRunAt missing in select) | ✅ Fixed | audit-queries.ts:18 now selects nextRunAt, as any cast removed |
| Crash between advance-and-dispatch loses one-time triggers | ✅ Fixed | triggerDispatcher.ts:84 now starts workflow before advancing |
| No test coverage for computeNextRunAt | ✅ Fixed | 136 lines of tests added |
| No test coverage for triggerDispatcher | ✅ Fixed | 240 lines of tests added |
| Timing-safe comparison leaks secret length | ✅ Fixed | restartScheduler.ts:11-14 now uses SHA256 hash comparison |
| Error response exposes err.message | ✅ Fixed | restartScheduler.ts:54 now returns generic error |
🔴❗ Critical (1) ❗🔴
🔴 1) 0013_lumpy_apocalypse.sql:1 Missing data migration for existing enabled triggers
Issue: The migration adds a nullable next_run_at column but does NOT backfill existing enabled triggers. All currently-enabled triggers will have next_run_at = NULL after migration.
Why: The scheduler workflow at findDueScheduledTriggersAcrossProjects only dispatches triggers where next_run_at IS NOT NULL AND next_run_at <= now(). Existing enabled triggers will silently stop running after deploy. This is a one-way door causing production outages for customers relying on scheduled triggers.
Fix: Add a data migration to backfill existing triggers:
```sql
-- After the ALTER TABLE, add:
UPDATE scheduled_triggers
SET next_run_at = NOW()
WHERE enabled = true AND next_run_at IS NULL;
```

Or implement startup reconciliation that calls computeNextRunAt for any enabled trigger with NULL next_run_at.
Refs:
- findDueScheduledTriggersAcrossProjects:233 — WHERE clause filters on next_run_at
🟠⚠️ Major (2) 🟠⚠️
🟠 1) agentFull.ts + projectFull.ts Bulk routes don't compute nextRunAt for new triggers
Issue: Triggers created via createFullAgentServerSide and createFullProjectServerSide (the PUT/PATCH bulk routes) call upsertScheduledTrigger without computing nextRunAt. These triggers will have nextRunAt = NULL and won't be dispatched.
Why: SDK push commands and bulk imports use these routes. Triggers will appear enabled in the UI but will never execute until manually updated via the individual trigger PATCH endpoint.
Fix: Compute nextRunAt before upserting in the DAL functions, following the pattern at scheduledTriggers.ts:374-380:
```ts
const nextRunAt = enabled
  ? computeNextRunAt({ cronExpression, cronTimezone, runAt })
  : null;
```

Refs:
- scheduledTriggers.ts:374-380 — correct pattern for individual create
🟠 2) triggerDispatcher.ts:44 Unbounded concurrent dispatches
Issue: All due triggers are dispatched in parallel via Promise.allSettled with no concurrency limit. If many triggers become due simultaneously (scheduler outage recovery, popular cron times), this could spawn hundreds of concurrent workflow starts.
Why: Risks connection pool exhaustion, workflow engine overload, and cascading failures during recovery scenarios.
Fix: Add concurrency limiting:
```ts
import pLimit from 'p-limit';

const limit = pLimit(10);
const results = await Promise.allSettled(
  dueTriggers.map((trigger) => limit(() => dispatchSingleTrigger(trigger)))
);
```
🟡 Minor (1) 🟡
Inline Comments:
- 🟡 Minor: triggerDispatcher.ts:104 — Error log missing correlation context
💭 Consider (2) 💭
Inline Comments:
- 💭 Consider: computeNextRunAt.test.ts:104-126 — DST tests use weak assertions
- 💭 Consider: triggerDispatcher.test.ts:188 — Missing test for advance-failure scenario
🕐 Pending Recommendations (5)
Prior issues from pullfrog and earlier claude review that remain unresolved:
- 🔴 Missing data migration — triggers with NULL next_run_at (pullfrog #1)
- 🟠 Bulk routes don't set nextRunAt — agentFull/projectFull (pullfrog #2)
- 🟠 Unbounded concurrent dispatches — resource exhaustion risk (claude)
- 🟡 Remaining as any cast — scheduledTriggers.ts update path (pullfrog #4)
- 🟡 Reconciliation types weakened — optional fields hide missing implementation
🚫 REQUEST CHANGES
Summary: Good progress on the delta — 6 of 9 prior issues have been addressed, including critical fixes to the reconciliation check, dispatch ordering, test coverage, and security. However, the most critical issue remains unresolved: the data migration that will cause all existing scheduled triggers to silently stop working after deploy. This must be addressed before merge.
Secondary priority: the bulk routes (agentFull/projectFull) still don't compute nextRunAt, which will cause SDK-created triggers to be dormant.
Discarded (3)
| Location | Issue | Reason Discarded |
|---|---|---|
| triggerDispatcher.ts:108 | Success log emitted even when advance fails | Acceptable — log clearly says "dispatched", which is accurate; advance failure is logged separately |
| computeNextRunAt.test.ts | No test for invalid cron expressions | Low priority — invalid crons should be caught at trigger creation validation |
| triggerDispatcher.test.ts | No test for scheduledFor fallback | Low priority — simple fallback unlikely to regress |
Reviewers (4)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| pr-review-standards | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| pr-review-tests | 5 | 0 | 1 | 0 | 1 | 0 | 2 |
| pr-review-sre | 4 | 1 | 0 | 0 | 1 | 1 | 1 |
| pr-review-breaking-changes | 3 | 2 | 0 | 0 | 0 | 2 | 0 |
| Total | 12 | 3 | 1 | 0 | 2 | 3 | 3 |
Note: Many findings were deduplicated with prior reviews (pullfrog, earlier claude). Delta-focused review intentionally narrow.
```ts
logger.error(
  { scheduledTriggerId, err },
  'Failed to advance next_run_at after workflow start; next tick will retry (idempotent)'
);
```
🟡 Minor: Misleading "idempotent" comment
Issue: The comment claims "next tick will retry (idempotent)" but the idempotency guarantee is at the workflow layer (via idempotencyKey = sched_{triggerId}_{scheduledFor} in scheduledTriggerRunner.ts), not the dispatcher layer. If advance fails, the next tick will start another workflow instance that must detect the duplicate via createInvocationIdempotentStep.
Why: During incident triage, this comment could mislead engineers into thinking the dispatcher itself prevents duplicates, when it actually relies on downstream workflow-level deduplication.
Fix: Clarify the comment:
```ts
logger.error(
  { scheduledTriggerId, err },
  'Failed to advance next_run_at after workflow start; next tick will re-dispatch (workflow has invocation-level idempotency)'
);
```
```ts
} catch (err) {
  logger.error(
    { scheduledTriggerId, err },
    'Failed to advance next_run_at after workflow start; next tick will retry (idempotent)'
```
🟡 Minor: Error log missing context for incident correlation
Issue: The error log includes scheduledTriggerId and err but lacks tenantId, projectId, agentId, and scheduledFor timestamp. During an incident with multiple failing triggers, correlating these logs to specific customers will be difficult.
Why: All these identifiers are already in scope — including them costs nothing and significantly improves debuggability.
Fix:

```ts
logger.error(
  { scheduledTriggerId, tenantId, projectId, agentId, scheduledFor: trigger.nextRunAt, err },
  'Failed to advance next_run_at after workflow start; next tick will retry (idempotent)'
);
```
```ts
} catch (err) {
  logger.error(
    { scheduledTriggerId, err },
    'Failed to advance next_run_at after workflow start; next tick will retry (idempotent)'
```
🟡 Minor: Error log missing correlation context for incident debugging
Issue: This log includes scheduledTriggerId and err but is missing tenantId, projectId, agentId, and scheduledFor — all of which are available in scope on line 64.
Why: During an incident with multiple failing triggers, operators would need to manually correlate trigger IDs back to tenant/project context. Including all identifiers makes debugging significantly easier and costs nothing.
Fix:

```ts
logger.error(
  { scheduledTriggerId, tenantId, projectId, agentId, scheduledFor: trigger.nextRunAt, err },
  'Failed to advance next_run_at after workflow start; next tick will retry (idempotent)'
);
```
```ts
  return Promise.resolve();
});
mockWithRef.mockImplementation(async (_pool, _ref, fn, _opts) => {
  return fn('mock-branch-db');
```
🟡 Minor: Missing test for advance-failure-after-workflow-start scenario
Issue: This test covers workflow start failure, but there's no test for when advanceScheduledTriggerNextRunAt fails after workflow start succeeds. In triggerDispatcher.ts:88-106, if withRef throws, the code logs an error but still returns 'dispatched'.
Why: Without this test, it's unclear whether the current behavior (count as dispatched, log error) is intentional or accidental. The next scheduler tick will dispatch the same trigger again, relying on workflow-level idempotency. This test would document the expected behavior and catch regressions.
Fix: Add a test case:
```ts
it('counts as dispatched even when advance fails (workflow already started)', async () => {
  mockStart.mockResolvedValue(undefined);
  mockWithRef.mockRejectedValue(new Error('db write failed'));

  const result = await dispatchDueTriggers();

  expect(result).toEqual({ dispatched: 1 });
  expect(mockStart).toHaveBeenCalledTimes(1);
});
```
```ts
it('handles DST spring-forward transition', () => {
  const result = computeNextRunAt({
    cronExpression: '30 2 * * *',
    cronTimezone: 'America/New_York',
    lastScheduledFor: '2026-03-07T07:30:00.000Z',
  });

  expect(result).toBeDefined();
  const nextDate = new Date(result!);
  expect(nextDate.getTime()).toBeGreaterThan(new Date('2026-03-07T07:30:00.000Z').getTime());
});

it('handles DST fall-back transition', () => {
  const result = computeNextRunAt({
    cronExpression: '30 1 * * *',
    cronTimezone: 'America/New_York',
    lastScheduledFor: '2026-10-31T05:30:00.000Z',
  });

  expect(result).toBeDefined();
  const nextDate = new Date(result!);
  expect(nextDate.getTime()).toBeGreaterThan(new Date('2026-10-31T05:30:00.000Z').getTime());
});
```
💭 Consider: DST tests use weak assertions
Issue: These DST tests only assert that the result is defined and later than the input. They don't verify the exact expected next run time, meaning bugs in DST handling (e.g., skipping to the wrong day, firing twice during fall-back) would pass.
Why: DST bugs in scheduler systems are notoriously hard to debug in production. Spring-forward and fall-back transitions can cause triggers to fire at unexpected times or not at all.
Fix: Strengthen assertions to verify exact expected times. For example, for the spring-forward test (March 8, 2026 in America/New_York):
```ts
// 2:30 AM doesn't exist during spring-forward, so next valid occurrence
// after 2026-03-07T07:30:00Z should be 2026-03-09T07:30:00Z
expect(result).toBe('2026-03-09T07:30:00.000Z');
```
Force-pushed feb29ce to d02e0fd
TL;DR — Replaces the per-trigger daisy-chaining workflow model with a single centralized scheduler workflow that polls the runtime DB every 60 seconds and dispatches one-shot workflows for due triggers. This moves scheduling state into the runtime DB.

Key changes

Summary: 60 files | 23 commits | base: main

Scheduler architecture overhaul
The scheduler registers itself in a new scheduler_state table.

Trigger dispatcher and one-shot runner
The dispatcher computes …

Schema migration: manage DB → runtime DB
The manage DB migration (…)

Branch-aware trigger execution
When a branch is deleted, …

Deploy restart endpoint and CI integration
The endpoint uses constant-time comparison for the bearer token and the …

CRUD routes and service simplification

Removed reconciliation and audit infrastructure

UI and cleanup updates
The scheduled triggers table component is rewritten from a …
This pull request has been automatically marked as stale because it has not had recent activity. Thank you for your contributions!

This pull request has been automatically closed due to inactivity. Thank you for your understanding!
No description provided.