
Update triggers to use daisy chaining #2612

Merged
robert-inkeep merged 4 commits into main from update-triggers
Mar 10, 2026

Conversation

@shagun-singh-inkeep
Collaborator

Scheduled triggers using cron expressions experience accumulated start delay that grows linearly over time. A trigger scheduled every 30 minutes shows ~11s delay initially, degrading to ~2+ minutes after ~17 hours of operation.

@changeset-bot

changeset-bot Bot commented Mar 10, 2026

🦋 Changeset detected

Latest commit: 7385402

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@inkeep/agents-api Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-cli Patch
@inkeep/agents-core Patch
@inkeep/agents-email Patch
@inkeep/agents-mcp Patch
@inkeep/agents-sdk Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel

vercel Bot commented Mar 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Mar 10, 2026 9:20pm
agents-docs Ready Ready Preview, Comment Mar 10, 2026 9:20pm
agents-manage-ui Ready Ready Preview, Comment Mar 10, 2026 9:20pm

Contributor

@pullfrog pullfrog Bot left a comment


The daisy-chaining approach is a sound fix for the timing drift problem — replacing a long-lived loop with per-iteration workflow runs eliminates accumulated scheduling error. The adoption logic for crash recovery between parent/child is well thought through. One bug to fix around cancelled-vs-failed status, and a minor robustness note.


Comment on lines +370 to +378
if (lastError) {
  await markFailedStep({
    tenantId,
    projectId,
    agentId,
    scheduledTriggerId,
    invocationId: invocation.id,
  });
}

This comment was marked as outdated.

Comment on lines +88 to +103
if (resolvedRef) {
  await withRef(manageDbPool, resolvedRef, async (db) => {
    const workflow = await getScheduledWorkflowByTriggerId(db)({
      scopes,
      scheduledTriggerId: params.scheduledTriggerId,
    });
    if (workflow) {
      await updateScheduledWorkflowRunId(db)({
        scopes,
        scheduledWorkflowId: workflow.id,
        workflowRunId: run.runId,
        status: 'running',
      });
    }
  });
}

This comment was marked as outdated.

"@inkeep/agents-api": patch
---

daisy chain trigger
Contributor


Per repo conventions, changeset messages should start with a verb in sentence case and describe the user-facing impact. Suggestion: "Fix scheduled trigger timing drift by replacing long-running loop with per-iteration workflow chaining"

Suggested change
daisy chain trigger
Fix scheduled trigger timing drift by replacing long-running loop with per-iteration workflow chaining

@pullfrog
Contributor

pullfrog Bot commented Mar 10, 2026

Replaces the long-lived while (true) loop in the scheduled trigger runner with a daisy-chaining model, where each cron iteration spawns a fresh workflow run. This fixes accumulated start delay that grows linearly over time for recurring triggers.

  • agents-api/src/domains/run/workflow/functions/scheduledTriggerRunner.ts — Flattens the main workflow from a loop into a single-iteration function. Adds startNextIterationStep which calls start() to launch the next workflow run, updates the scheduledWorkflow record with the new runId, and logs the parent→child chain. Payload now carries lastScheduledFor and parentRunId for cross-iteration state.
  • agents-api/src/domains/run/workflow/steps/scheduledTriggerSteps.ts — Extends checkTriggerEnabledStep with an adoption mechanism: if the DB still holds the parent's workflowRunId (parent crashed after start() but before updating the DB), the child adopts itself via updateScheduledWorkflowRunId instead of stopping as "superseded."
  • .changeset/young-pans-see.md — Patch changeset for @inkeep/agents-api.
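The handoff the summary describes can be sketched as follows. This is a simplified illustration, not the runner's actual code: the payload field names are taken from the summary above, while `buildNextPayload` is a hypothetical helper.

```typescript
// Simplified sketch of the daisy-chain handoff described above.
// Field names follow the PR summary; buildNextPayload is illustrative.
interface ScheduledTriggerRunnerPayload {
  scheduledTriggerId: string;
  lastScheduledFor: string | null; // cross-iteration state travels in the payload
  parentRunId: string | null;      // lets a child adopt if the parent crashed
}

// Instead of looping, each run computes the payload for the next run and
// spawns a fresh workflow with it; per-run event logs stay bounded, so
// replay cost no longer grows with the trigger's lifetime.
function buildNextPayload(
  current: ScheduledTriggerRunnerPayload,
  currentRunId: string,
  scheduledFor: string
): ScheduledTriggerRunnerPayload {
  return {
    scheduledTriggerId: current.scheduledTriggerId,
    lastScheduledFor: scheduledFor,
    parentRunId: currentRunId,
  };
}
```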


Contributor

@claude claude Bot left a comment


PR Review Summary

(7) Total Issues | Risk: Medium

🟠⚠️ Major (4) 🟠⚠️

🟠 1) scheduledTriggerRunner.ts:78-103 TOCTOU race between start() and DB update

Issue: The startNextIterationStep function performs a non-atomic start-then-update pattern: it calls start() to spawn the child workflow, then separately updates the DB with the child's runId. There is a window where the parent can crash after spawning the child but before persisting the child's ID.

Why: If the parent crashes in this window, the DB retains the parent's runId. The adoption logic handles this specific case, but the overall pattern introduces subtle crash-recovery semantics that depend on the workflow system's delivery guarantees. Additionally, if start() itself partially fails (workflow created but run.runId not returned), a ghost workflow could run without being tracked.

Fix: The pattern is acceptable given the adoption mechanism, but consider:

  1. Adding a compare-and-swap pattern in updateScheduledWorkflowRunId (only update if current workflowRunId equals parentRunId) for stricter guarantees
  2. Documenting the crash-recovery invariants and failure windows in a code comment
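A minimal in-memory sketch of the compare-and-swap idea in point 1. The real change would be a conditional SQL UPDATE inside `updateScheduledWorkflowRunId`; the row shape and function name here are hypothetical.

```typescript
// Compare-and-swap sketch: only claim the workflow record if it still
// points at the expected parent run. In SQL this would be roughly
//   UPDATE scheduled_workflow SET workflow_run_id = $new, status = 'running'
//   WHERE id = $id AND workflow_run_id = $expectedParent
// with the affected-row count telling you whether the swap won.
interface ScheduledWorkflowRow {
  id: string;
  workflowRunId: string;
  status: string;
}

function casUpdateRunId(
  row: ScheduledWorkflowRow,
  expectedParentRunId: string,
  newRunId: string
): boolean {
  if (row.workflowRunId !== expectedParentRunId) return false; // lost the race
  row.workflowRunId = newRunId;
  row.status = 'running';
  return true;
}
```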


🟠 2) scheduledTriggerRunner.ts:400-407 & 258-266 Missing error handling for startNextIterationStep

Issue: At both the end-of-workflow chaining path (line 400) and the cancelled-invocation chaining path (line 258), startNextIterationStep() is called without error handling. If this call fails (workflow service unavailable, DB error), the workflow terminates or returns misleading status.

Why:

  • At line 400: If chaining fails, the function never reaches the return { status: 'chained' }. The trigger chain silently breaks with no indication in logs.
  • At line 258: Returns { status: 'cancelled' } even if chaining failed, masking the chain break.

Fix: Wrap both calls in try-catch:

try {
  await startNextIterationStep({...});
} catch (err) {
  await logStep('Failed to chain to next iteration', {
    scheduledTriggerId,
    error: err instanceof Error ? err.message : String(err),
  });
  throw err; // Let workflow framework handle retry
}


🟠 3) scope: system Missing trace context propagation across daisy-chained runs

Issue: When start(scheduledTriggerRunnerWorkflow, [newPayload]) spawns a new workflow run, there is no explicit trace context propagation. Each chained run will start a new trace, fragmenting the observability for a trigger's execution history.

Why: For a cron trigger running every 30 minutes, after a day you'd have ~48 disconnected traces. Debugging issues across long-running triggers becomes difficult since traces are per-iteration rather than connected. The current logging includes parentRunId and childRunId but these are not added to trace context or span attributes.

Fix: Propagate a root trace ID or correlation ID through the payload and include it as a span attribute:

// In ScheduledTriggerRunnerPayload:
rootTraceId?: string | null;

// In startNextIterationStep:
const newPayload: ScheduledTriggerRunnerPayload = {
  // ...existing fields
  rootTraceId: params.rootTraceId ?? metadata.traceId, // Carry forward or use current
};

Consider adding span links to connect parent→child traces for distributed tracing visualization.


🟠 4) scope: system Critical test coverage gaps for new functionality

Issue: The PR introduces significant new behavior with zero test coverage:

  • startNextIterationStep function (lines 59-115) — the core chaining mechanism
  • Adoption logic in checkTriggerEnabledStep (lines 204-232) — crash recovery
  • Cancelled invocation chaining (lines 257-266)
  • Pre-chain enabled check (lines 384-398)

Why: These code paths handle critical reliability scenarios (crash recovery, chain continuation). Bugs here cause scheduled triggers to silently stop working with no error visibility. The adoption logic is especially subtle — incorrect behavior would cause child workflows spawned by crashed parents to be marked "superseded" and stop immediately.

Fix: Add tests for:

  1. startNextIterationStep — verify payload propagation, DB update, behavior when resolvedRef is null
  2. Adoption path — test when workflow.workflowRunId === parentRunId, verify DB update occurs
  3. Superseded detection — test when workflowRunId differs and no parentRunId matches
  4. Cancelled cron invocation chains to next iteration; one-time does not
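The adoption/superseded branches those tests would exercise reduce to a small decision function. This extraction is hypothetical (the real step does the comparison against the DB record inline), but it captures the three cases listed above:

```typescript
// Pure decision logic for the chained-run ownership check described above.
// dbRunId:     workflowRunId currently stored on the scheduled_workflow row
// myRunId:     this run's own runnerId
// parentRunId: runId of the run that spawned us, if any
type Decision = 'continue' | 'adopt' | 'superseded';

function decide(dbRunId: string, myRunId: string, parentRunId: string | null): Decision {
  if (dbRunId === myRunId) return 'continue';                          // DB already points at us
  if (parentRunId !== null && dbRunId === parentRunId) return 'adopt'; // parent crashed before updating DB
  return 'superseded';                                                 // another runner owns this trigger
}
```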

Inline Comments:

  • 🟠 Major: .changeset/young-pans-see.md:5 Changeset message doesn't follow AGENTS.md style
  • 🟠 Major: scheduledTriggerRunner.ts:88 Silent failure when resolvedRef is null
  • 🟠 Major: scheduledTriggerRunner.ts:94 Silent failure when workflow record not found
  • 🟠 Major: scheduledTriggerSteps.ts:210-217 Adoption DB update lacks error handling

💭 Consider (3) 💭

💭 1) scheduledTriggerRunner.ts:384-398 Pre-chain enabled check may be redundant
Issue: The preChainCheck verifies the trigger is still enabled before chaining, but the next iteration immediately calls checkTriggerEnabledStep as its first action. Two sequential enabled checks occur.
Why: The extra DB query adds latency to every successful cron iteration. If the cost of starting a workflow that immediately stops is negligible, the pre-chain check could be removed.
Fix: Evaluate whether the workflow start cost justifies the extra check. If not, remove the pre-chain check.

💭 2) scheduledTriggerSteps.ts:209-225 Adoption logic lacks observability metrics
Issue: The adoption pattern logs when it occurs but doesn't emit a counter metric.
Why: Frequent adoptions could indicate workflow framework instability. Without metrics, operators cannot monitor the health of the daisy-chaining mechanism or alert on elevated adoption rates.
Fix: Emit a counter metric (e.g., scheduled_trigger.workflow_adoption_count) when adoption occurs.

Inline Comments:

  • 💭 Consider: scheduledTriggerRunner.ts:1-11 Restore explanatory comment about why daisy-chaining is used

💡 APPROVE WITH SUGGESTIONS

Summary: The architectural approach (daisy-chaining to fix accumulated replay delay) is sound and well-reasoned. However, several error handling gaps could cause the trigger chain to silently break in production failure scenarios. The adoption logic for crash recovery is thoughtful but needs error handling and test coverage. Consider addressing the silent failure paths in startNextIterationStep and adding tests for the new crash recovery code paths.

Discarded (5)
Location Issue Reason Discarded
scheduledTriggerRunner.ts:363-364 Retry jitter uses fixed 30% range without exponential backoff Pre-existing pattern, not introduced by this PR
system Daisy-chain pattern distinct from existing workflows Informational only, appropriate divergence for use case
scheduledTriggerSteps.ts:206 Orphan recovery interaction edge case Too speculative, low confidence without production data
scheduledTriggerRunner.ts:41 chainLogger missing scheduledTriggerId context Very minor, current per-call logging is adequate
scheduledTriggerRunner.ts:43-50 Payload field validation edge cases Low severity, existing cron-parser handles invalid timestamps
Reviewers (6)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-tests 6 1 0 0 0 0 1
pr-review-sre 7 2 1 0 0 0 2
pr-review-errors 5 1 0 0 3 0 0
pr-review-architecture 3 1 0 0 0 0 2
pr-review-precision 3 0 1 0 1 0 0
pr-review-standards 1 0 0 0 1 0 0
Total 25 5 2 0 5 0 5

Note: Several findings were merged across reviewers (e.g., changeset quality flagged by both standards and precision reviewers, race condition flagged by both architecture and SRE reviewers).

"@inkeep/agents-api": patch
---

daisy chain trigger
Contributor


🟠 MAJOR: Changeset message doesn't follow AGENTS.md style requirements

Issue: The message "daisy chain trigger" lacks sentence case, doesn't start with an action verb, and doesn't explain the user impact. Per AGENTS.md, changelog messages should be specific about what changed and why it matters to consumers.

Why: Users reading the changelog won't understand this fixes a production issue where cron triggers experienced growing execution delays (~11s initially → 2+ minutes after ~17 hours).

Fix:

Suggested change
daisy chain trigger
---
"@inkeep/agents-api": patch
---
Fix accumulated latency in long-running scheduled triggers by using daisy-chain workflow pattern

const ref = getProjectScopedRef(params.tenantId, params.projectId, 'main');
const resolvedRef = await resolveRef(manageDbClient)(ref);

if (resolvedRef) {

This comment was marked as outdated.

scopes,
scheduledTriggerId: params.scheduledTriggerId,
});
if (workflow) {

This comment was marked as outdated.

Comment on lines +210 to +217
await withRef(manageDbPool, resolvedRef, async (db) => {
  await updateScheduledWorkflowRunId(db)({
    scopes,
    scheduledWorkflowId: workflow.id,
    workflowRunId: params.runnerId,
    status: 'running',
  });
});
Contributor


🟠 MAJOR: Adoption DB update lacks error handling

Issue: The updateScheduledWorkflowRunId call can fail (network error, DB constraint violation, connection pool exhaustion), but exceptions propagate unhandled.

Why: If adoption fails, the child workflow stops with 'superseded' on retry because the DB still shows the parent's runId. The cron trigger stops executing entirely until manually restarted. This defeats the purpose of the crash recovery logic.

Fix: Wrap in try-catch with appropriate handling:

if (params.parentRunId && workflow.workflowRunId === params.parentRunId) {
  try {
    await withRef(manageDbPool, resolvedRef, async (db) => {
      await updateScheduledWorkflowRunId(db)({
        scopes,
        scheduledWorkflowId: workflow.id,
        workflowRunId: params.runnerId,
        status: 'running',
      });
    });
    logger.info(
      { scheduledTriggerId: params.scheduledTriggerId, parentRunId: params.parentRunId, newRunnerId: params.runnerId },
      'Child workflow adopted workflowRunId from parent'
    );
  } catch (err) {
    logger.error(
      { scheduledTriggerId: params.scheduledTriggerId, parentRunId: params.parentRunId, error: err },
      'Failed to adopt workflowRunId - workflow may be superseded incorrectly'
    );
    throw err; // Let workflow framework retry
  }
}

Comment on lines 1 to 11
/**
* Workflow for running scheduled triggers.
* Workflow for running scheduled triggers using daisy-chaining.
*
* This workflow:
* 1. Gets or creates the next pending invocation
* 2. Sleeps until its scheduled time
* 3. Checks if the trigger is still enabled
* Each run executes a single iteration:
* 1. Checks if the trigger is still enabled (with adoption for chained runs)
* 2. Gets or creates the next pending invocation
* 3. Sleeps until its scheduled time
* 4. Executes the agent with retries
* 5. For cron triggers, loops back to step 1
* 5. For cron triggers, starts a fresh workflow for the next iteration (daisy-chain)
*
*/
Contributor


💭 Consider: Restore explanatory comment about why daisy-chaining is used

Issue: The original commit included a comment explaining why daisy-chaining solves the accumulated delay problem (bounded event log, avoiding replay delay). This was removed, leaving only the "what" without the "why".

Why: Future maintainers may not understand why daisy-chaining was chosen over the simpler loop approach. The deleted comment provided critical context about the root cause and solution mechanism.

Fix: Add explanation to the header comment:

Suggested change
/**
* Workflow for running scheduled triggers.
* Workflow for running scheduled triggers using daisy-chaining.
*
* This workflow:
* 1. Gets or creates the next pending invocation
* 2. Sleeps until its scheduled time
* 3. Checks if the trigger is still enabled
* Each run executes a single iteration:
* 1. Checks if the trigger is still enabled (with adoption for chained runs)
* 2. Gets or creates the next pending invocation
* 3. Sleeps until its scheduled time
* 4. Executes the agent with retries
* 5. For cron triggers, loops back to step 1
* 5. For cron triggers, starts a fresh workflow for the next iteration (daisy-chain)
*
*/
/**
* Workflow for running scheduled triggers using daisy-chaining.
*
* Why daisy-chaining instead of a while loop?
* - Durable workflows maintain an event log for replay/recovery
* - With a loop, each iteration adds events; replay cost grows linearly
* - After ~17 hours of 30-min cron, delay grew from ~11s to ~2+ minutes
* - Daisy-chaining keeps each run's event log bounded (~15 events)
*
* Each run executes a single iteration:
* 1. Checks if the trigger is still enabled (with adoption for chained runs)
* 2. Gets or creates the next pending invocation
* 3. Sleeps until its scheduled time
* 4. Executes the agent with retries
* 5. For cron triggers, starts a fresh workflow for the next iteration (daisy-chain)
*
*/

@github-actions github-actions Bot deleted a comment from claude Bot Mar 10, 2026
Contributor

@claude claude Bot left a comment


PR Review Summary

(2) Total Issues | Risk: Low

Delta Review Context

This is a delta review scoped to 1 commit (d604a5f4d) since the last automated review. The delta consists of ~77 insertions addressing prior review feedback on error handling and logging.


✅ Prior Feedback Addressed

The following issues from the prior review have been correctly addressed in this delta:

Prior Issue Resolution
🟠 Major #2: Missing error handling for startNextIterationStep ✅ Added try-catch with logging at both chaining paths (lines 278-294 and 430-445)
🟠 Inline: Silent failure when resolvedRef is null ✅ Added warning log (lines 111-122)
🟠 Inline: Silent failure when workflow record not found ✅ Added warning log (lines 101-108)
🟠 Inline: Adoption DB update lacks error handling ✅ Added try-catch with error logging (scheduledTriggerSteps.ts lines 210-230)
🔴 pullfrog bug: lastError not cleared on cancellation ✅ Added lastError = null when cancellation detected (line 318)
🔧 Consistency: Logger name mismatch ✅ Renamed chainLogger → logger

All error handling implementations follow the correct pattern: try-catch → log with context → re-throw for workflow framework retry.


🕐 Pending Recommendations (2)


Prior Issues Outside Delta Scope

The following issues from the prior review remain unaddressed but were not expected in this delta (they require additional implementation beyond error handling fixes):

Prior Issue Status
🟠 Major #3: Missing trace context propagation across daisy-chained runs Open - Consider adding rootTraceId to payload
🟠 Major #4: Critical test coverage gaps for new functionality Open - No tests added for new crash recovery logic
🟠 Major #1: TOCTOU race documentation Open - The race is acceptable given adoption; documenting invariants is optional
💭 Consider #1: Pre-chain enabled check redundancy Open - Optional performance consideration
💭 Consider #2: Adoption observability metrics Open - Optional observability improvement

✅ APPROVE

Summary: The delta correctly addresses all the error handling and logging gaps identified in the prior review. The implementations are consistent and follow best practices (try-catch → log → re-throw). The pullfrog bug with lastError not being cleared on cancellation has been fixed.

The remaining pending items (changeset message style, explanatory comment) are minor documentation concerns that don't block approval. The larger architectural items (test coverage, trace propagation) were flagged in the original review and remain relevant for follow-up but are not blocking for this fix.

Reviewers (1)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
orchestrator (delta) 6 0 0 0 0 2 0
Total 6 0 0 0 0 2 0

Note: Delta review focused on verifying prior feedback was correctly addressed. All 6 prior issues examined were resolved correctly.

Contributor

@claude claude Bot left a comment


PR Review Summary

(0) Total Issues | Risk: Low

Delta Review Context

This is a delta review scoped to changes since the last automated review (commit d604a5f4d). The delta consists solely of:

73854023f Merge branch 'main' into update-triggers

The PR's actual changes are unchanged — the three files modified by this PR (scheduledTriggerRunner.ts, scheduledTriggerSteps.ts, and the changeset) have no diffs since the prior review. The merge from main brings in unrelated changes (context compression improvements, Nango updates, GenericKeyValueInput fix) that do not affect the scheduled trigger workflow.


🕐 Pending Recommendations (2)

These items were raised in prior reviews and remain open — they are minor documentation concerns that don't block approval:


✅ APPROVE

Summary: The delta since the prior APPROVED review consists only of a merge from main with no impact on the PR's scope. The daisy-chaining implementation correctly addresses the accumulated latency bug, and all prior error handling feedback was addressed in earlier commits. The two pending documentation items (changeset message style, explanatory comment) are nice-to-haves but don't block merging this fix.

Reviewers (0)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
orchestrator (delta) 0 0 0 0 0 2 0
Total 0 0 0 0 0 2 0

Note: No sub-reviewers dispatched — delta consists only of a merge from main with no changes to the PR's actual scope.

@github-actions github-actions Bot deleted a comment from claude Bot Mar 10, 2026
@robert-inkeep robert-inkeep merged commit b1b440a into main Mar 10, 2026
11 checks passed
@robert-inkeep robert-inkeep deleted the update-triggers branch March 10, 2026 21:21
@inkeep
Contributor

inkeep Bot commented Mar 10, 2026

No documentation updates required. The daisy-chaining implementation is an internal infrastructure improvement that fixes timing drift in cron scheduled triggers. This change does not affect any public APIs, SDK methods, or user-facing behavior—triggers will simply run more accurately on schedule.

@itoqa

itoqa Bot commented Mar 10, 2026

Ito Test Report ❌

18 test cases ran. 12 passed, 6 failed.

✅ The core CRUD and authorization flows for scheduled triggers were stable, but verification found multiple product defects in scheduler execution, cancellation consistency, one-time date handling, and run-now burst protection. 🔍 Code review across UI and API paths supports these as real application issues.

✅ Passed (12)
Test Case Summary Timestamp Screenshot
ROUTE-1 Created a new recurring scheduled trigger from the scheduled tab and verified valid navigation to invocation history and edit flow from row actions. 0:00 ROUTE-1_0-00.png
ROUTE-2 Updated the recurring trigger cadence to a different valid interval and confirmed the new schedule persisted after page reload. 0:00 ROUTE-2_0-00.png
ROUTE-3 Run Now succeeded from scheduled table, invocation history showed new recent invocation, and API run endpoint returned 200 with a new invocationId. 3:10 ROUTE-3_3-10.png
ROUTE-4 Cancel action succeeded from invocation history, cancelled status was visible and consistent across per-trigger and project-wide views, and cancel API returned 200 success. 3:10 ROUTE-4_3-10.png
ROUTE-5 Rerun from historical invocation succeeded, produced a new invocation entry while preserving the original invocation record, and rerun API returned 200 with distinct newInvocationId. 3:10 ROUTE-5_3-10.png
ROUTE-6 Deleted the scheduled trigger via row action and verified it no longer appeared after reload. 9:31 ROUTE-6_9-31.png
EDGE-1 Disabled the recurring trigger and verified disabled state persisted after reload without immediate execution evidence. 9:31 EDGE-1_9-31.png
EDGE-3 Executed run-now on the recurring trigger and deleted it immediately; the trigger row was removed from the scheduled table afterward. 9:31 EDGE-3_9-31.png
EDGE-5 In mobile viewport, core scheduled-trigger actions remained usable: run-now, toggle, and navigation to all invocations worked. 9:31 EDGE-5_9-31.png
ADV-2 After rapid toggle thrash and reload, persisted enabled state was deterministic and matched the final visible switch state. 28:01 ADV-2_28-01.png
ADV-3 Non-admin execution of run-now on a trigger configured with another user's runAsUserId was rejected with 403 Forbidden, and no unauthorized run was allowed. 36:23 ADV-3_36-23.png
ADV-4 Tampered tenant scope was denied consistently; API returned 403 and UI did not expose cross-scope trigger data. 29:43 ADV-4_29-43.png
❌ Failed (6)
Test Case Summary Timestamp Screenshot
LOGIC-1 Trigger cron expression is persisted as every minute, but repeated polling over multiple minute boundaries produced no cron-generated invocation sequence; only manual run entries were observed, so cadence stability criteria was not met. 25:20 LOGIC-1_25-20.png
LOGIC-2 Schedule update to every-minute persisted in UI, but follow-up polling showed no new scheduled invocations at all, preventing confirmation of supersession behavior and failing the expected post-update cadence execution check. 25:20 LOGIC-2_25-20.png
LOGIC-3 Cancel endpoint returned success for a running invocation, but the same invocation later resumed and failed, and no follow-on periodic invocation was created during the observation window; chain continuity behavior failed. 25:20 LOGIC-3_25-20.png
EDGE-2 One-time trigger creation accepted malformed scheduling state shown as Invalid Date, and duplicate navigation triggered an error boundary instead of safe duplication behavior. 9:31 EDGE-2_9-31.png
EDGE-4 Deep-link duplication path generated a malformed URL and triggered an application error boundary, indicating navigation robustness issues despite successful manual recovery to triggers. 9:31 EDGE-4_9-31.png
ADV-1 Burst run-now requests were accepted without deduplication; 10 rapid requests returned 200 and produced multiple concurrent manual-run invocations. 27:41 ADV-1_27-41.png
Cron cadence remains stable across chained iterations – Failed
  • Where: Scheduled trigger recurring execution path (workflow scheduling + UI default timezone input)

  • Steps to reproduce: Create a recurring trigger from UI defaults in an environment where browser timezone resolves to Etc/Unknown; wait across multiple cron boundaries.

  • What failed: No cron invocation is created despite * * * * * schedule; expected minute-by-minute invocations.

  • Code analysis: The UI stores browser timezone directly, and workflow cron parsing consumes that timezone without validation or fallback. With invalid timezone values, next execution calculation can throw and stop scheduling.

  • Relevant code:

    agents-manage-ui/src/components/scheduled-triggers/scheduled-trigger-form.tsx (lines 127-140)

    const getDefaultValues = (): ScheduledTriggerFormData => {
      // Get browser's timezone for new triggers
      const browserTimezone = Intl.DateTimeFormat().resolvedOptions().timeZone || 'UTC';
    
      if (!trigger) {
        const p = defaultsFromParams;
        return {
          scheduleType: (p?.scheduleType as 'cron' | 'one-time') || 'cron',
          cronExpression: p?.cronExpression || '',
          cronTimezone: p?.cronTimezone || browserTimezone,

    agents-api/src/domains/run/workflow/steps/scheduledTriggerSteps.ts (lines 65-76)

    if (cronExpression) {
      const baseDate = lastScheduledFor ? new Date(lastScheduledFor) : new Date();
      const interval = CronExpressionParser.parse(cronExpression, {
        currentDate: baseDate,
        tz: cronTimezone || 'UTC',
      });
      const nextDate = interval.next();
      const nextIso = nextDate.toISOString();
      return { nextExecutionTime: nextIso, isOneTime: false };
    }
  • Why this is likely a bug: Invalid timezone values are accepted from UI into production scheduling logic without guardrails, causing cron iteration generation to fail instead of falling back safely.

  • Introduced by this PR: No – pre-existing bug (code not changed in this PR)

  • Timestamp: 25:20
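A sketch of the safe fallback the analysis suggests, assuming validation via `Intl.DateTimeFormat` (the function name is hypothetical; the real fix would sit in front of the `CronExpressionParser.parse` call):

```typescript
// Validate a timezone name before it reaches the cron parser; fall back to
// UTC instead of letting an unknown zone break scheduling. Constructing an
// Intl.DateTimeFormat throws a RangeError for names not in the IANA
// database, which is exactly the failure mode described above.
function safeCronTimezone(tz: string | null | undefined): string {
  if (!tz) return 'UTC';
  try {
    new Intl.DateTimeFormat('en-US', { timeZone: tz });
    return tz;
  } catch {
    return 'UTC';
  }
}
```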

Superseded runner does not continue after schedule update – Failed
  • Where: Scheduled trigger update path for recurring cron workflows

  • Steps to reproduce: Update a recurring trigger to every-minute and observe invocations after the update.

  • What failed: No new scheduled invocations appear after update; expected continued minute cadence under the updated schedule.

  • Code analysis: Update flow reuses the same cron parsing path as normal scheduling. Because invalid cron timezone values are not normalized, post-update runner iterations can fail at next-time computation and never enqueue a new invocation.

  • Relevant code:

    agents-manage-ui/src/components/scheduled-triggers/scheduled-trigger-form.tsx (lines 137-140)

    return {
      scheduleType: (p?.scheduleType as 'cron' | 'one-time') || 'cron',
      cronExpression: p?.cronExpression || '',
      cronTimezone: p?.cronTimezone || browserTimezone,
      runAt: p?.runAt || '',
    };

    agents-api/src/domains/run/workflow/steps/scheduledTriggerSteps.ts (lines 66-71)

    const baseDate = lastScheduledFor ? new Date(lastScheduledFor) : new Date();
    const interval = CronExpressionParser.parse(cronExpression, {
      currentDate: baseDate,
      tz: cronTimezone || 'UTC',
    });
    const nextDate = interval.next();
  • Why this is likely a bug: The update path can persist timezone values that the scheduler cannot execute, so schedule updates silently break recurring execution.

  • Introduced by this PR: No – pre-existing bug (code not changed in this PR)

  • Timestamp: 25:20

Cancelled cron invocation still chains future iteration – Failed
  • Where: Manual run/cancel invocation lifecycle in scheduled trigger API

  • Steps to reproduce: Start a trigger run, immediately call cancel for that invocation, then poll invocation status.

  • What failed: Invocation transitions to cancelled and later returns to running/failed; expected cancellation to be terminal.

  • Code analysis: Cancel endpoint marks status cancelled, but background manual execution loop does not check cancellation state before continuing retries and status updates.

  • Relevant code:

    agents-api/src/domains/manage/routes/scheduledTriggers.ts (lines 923-928)

    // Mark as cancelled
    await markScheduledTriggerInvocationCancelled(runDbClient)({
      scopes: { tenantId, projectId, agentId },
      scheduledTriggerId,
      invocationId,
    });

    agents-api/src/domains/manage/routes/scheduledTriggers.ts (lines 1473-1485)

    while (attemptNumber <= maxAttempts) {
      const conversationId = generateId();
      let success = false;
    
      try {
        await executeWithTimeout(conversationId);
        success = true;
      } catch (execErr) {
        lastError = execErr instanceof Error ? execErr.message : String(execErr);
        logger.error({ invocationId, attemptNumber, error: lastError }, 'Manual run failed with error');
      }
  • Why this is likely a bug: A cancelled invocation can still execute and overwrite state, violating cancel semantics and producing inconsistent lifecycle data.

  • Introduced by this PR: No – pre-existing bug (code not changed in this PR)

  • Timestamp: 25:20
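
    The missing check can be sketched as follows. The status reader and executor below are stand-ins, not the project's actual helpers; the point is that the retry loop re-reads persisted status before every attempt, so a cancel that lands mid-run wins and no later attempt overwrites it:

    ```typescript
    // Sketch only: re-check cancellation before each retry attempt so a
    // cancelled invocation stops executing and stays cancelled.
    type InvocationStatus = 'pending' | 'running' | 'cancelled' | 'completed' | 'failed';

    async function runWithCancellationCheck(
      getStatus: () => Promise<InvocationStatus>, // reads persisted status
      execute: () => Promise<void>,               // one execution attempt
      maxAttempts: number,
    ): Promise<InvocationStatus> {
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        // Cancel is terminal: if it was set between attempts, bail out.
        if ((await getStatus()) === 'cancelled') return 'cancelled';
        try {
          await execute();
          return 'completed';
        } catch {
          // Swallow and retry; the next loop iteration re-checks status.
        }
      }
      return 'failed';
    }
    ```

    The same check would also need to guard the final status write, so a failed last attempt does not stamp over a cancellation recorded during it.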

One-time trigger idempotency under repeated submit – Failed
  • Where: One-time trigger create/duplicate UI flow in scheduled trigger form

  • Steps to reproduce: Open one-time trigger form with malformed or non-normalized runAt value via duplicated/deep-linked params.

  • What failed: Form crashes with RangeError: Invalid time value instead of validating or safely recovering.

  • Code analysis: Date-time picker converts arbitrary value with new Date(value) and immediately formats it without validating the Date object.

  • Relevant code:

    agents-manage-ui/src/components/scheduled-triggers/date-time-picker.tsx (lines 29-31)

    // Parse value into date and time parts
    const dateValue = value ? new Date(value) : undefined;
    const timeValue = value ? value.slice(11, 16) : '09:00';

    agents-manage-ui/src/components/scheduled-triggers/date-time-picker.tsx (lines 73-75)

    const formattedDisplay = dateValue
      ? `${format(dateValue, 'PPP')} at ${format(dateValue, 'h:mm a')}`
      : placeholder;
  • Why this is likely a bug: User-controllable date input can crash rendering due to missing invalid-date guards, which is a production UI stability defect.

  • Introduced by this PR: No – pre-existing bug (code not changed in this PR)

  • Timestamp: 9:31
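
    A small guard, sketched here as a hypothetical helper rather than the component's real code, would treat an unparsable runAt the same as an empty one, since an Invalid Date has a NaN timestamp:

    ```typescript
    // Hypothetical guard: new Date() never throws, but an unparsable input
    // yields an Invalid Date whose getTime() is NaN. Returning undefined for
    // that case lets the picker fall back to its empty/placeholder state
    // instead of crashing in format().
    function parseRunAt(value: string | undefined): Date | undefined {
      if (!value) return undefined;
      const parsed = new Date(value);
      return Number.isNaN(parsed.getTime()) ? undefined : parsed;
    }
    ```

    With this in place, both the formatted display and the calendar receive either a valid Date or undefined, and the existing `: placeholder` / `'Pick a date'` branches handle the rest. The same guard covers the deep-link failure reported below, which crashes through the identical code path.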

Deep-link and non-linear navigation robustness – Failed
  • Where: Deep-link/duplicate path into scheduled trigger one-time form

  • Steps to reproduce: Use duplicate/deep-link navigation that carries a malformed one-time runAt value into the form.

  • What failed: Application enters error boundary (RangeError: Invalid time value) instead of rendering a controlled validation state.

  • Code analysis: The same DateTimePicker path renders formatted text from unvalidated Date values; navigation payload shape directly affects component stability.

  • Relevant code:

    agents-manage-ui/src/components/scheduled-triggers/date-time-picker.tsx (lines 30-31)

    const dateValue = value ? new Date(value) : undefined;
    const timeValue = value ? value.slice(11, 16) : '09:00';

    agents-manage-ui/src/components/scheduled-triggers/date-time-picker.tsx (lines 93-94)

    <CalendarIcon className="mr-2 h-4 w-4" />
    {dateValue ? format(dateValue, 'PPP') : 'Pick a date'}
  • Why this is likely a bug: Deep-link input should not be able to crash route rendering; this is a deterministic client-side robustness failure.

  • Introduced by this PR: No – pre-existing bug (code not changed in this PR)

  • Timestamp: 9:31

Rapid Run now click-spam protection – Failed
  • Where: Scheduled trigger manual run API endpoint

  • Steps to reproduce: Submit multiple rapid run-now requests against the same trigger.

  • What failed: Each request creates a separate invocation and executes concurrently; expected dedupe/throttle protection for burst input.

  • Code analysis: Invocation idempotency key is hard-coded to include Date.now(), which guarantees uniqueness per request and prevents deduplication.

  • Relevant code:

    agents-api/src/domains/manage/routes/scheduledTriggers.ts (lines 1365-1374)

    await createScheduledTriggerInvocation(runDbClient)({
      id: invocationId,
      tenantId,
      projectId,
      agentId,
      scheduledTriggerId,
      status: 'pending',
      scheduledFor: new Date().toISOString(),
      idempotencyKey: `manual-run-${scheduledTriggerId}-${Date.now()}`,
      attemptNumber: 1,
    });
  • Why this is likely a bug: The endpoint structurally cannot suppress duplicate bursts, so rapid UI/API retries always fan out into parallel runs.

  • Introduced by this PR: No – pre-existing bug (code not changed in this PR)

  • Timestamp: 27:41
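
    One way to make the key actually deduplicate, sketched under assumptions (the window length and function name are illustrative, not from the repo), is to bucket the timestamp instead of embedding Date.now() directly, so a burst of run-now requests inside the window collapses onto one idempotency key:

    ```typescript
    // Sketch: replace the raw Date.now() in the key with a time bucket.
    // Requests within the same window produce the same key, so the existing
    // idempotency machinery can suppress the duplicates.
    const DEDUPE_WINDOW_MS = 10_000; // illustrative 10s window

    function manualRunIdempotencyKey(scheduledTriggerId: string, now = Date.now()): string {
      const bucket = Math.floor(now / DEDUPE_WINDOW_MS);
      return `manual-run-${scheduledTriggerId}-${bucket}`;
    }
    ```

    This only dedupes if invocation creation enforces uniqueness on idempotencyKey (or checks for an existing key before insert); the key change alone is necessary but not sufficient.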

📋 View Recording
