AAP-70288 Catch errors on re-compute of schedule RRULE computed fields by AlanCoding · Pull Request #16400 · ansible/awx

AlanCoding · 2026-04-09T18:59:08Z

SUMMARY

DST - daylight savings time

Fix unhandled ValueError in _fast_forward_rrule that causes HTTP 500 on the schedules page when an HOURLY rrule with BYHOUR constraints is fast-forwarded across a DST boundary
Harden awx_periodic_scheduler so one bad schedule cannot crash the task and block all other schedules from running

The _fast_forward_rrule optimization advances the dtstart of HOURLY/MINUTELY rrules close to the current time, computing in UTC then converting back to local timezone. With rrules whose BYHOUR values are aligned to a specific residue mod INTERVAL (e.g., BYHOUR=1,5,9,13,17,21 with INTERVAL=4 — all 1 mod 4), this alignment holds during EST (UTC-5) but breaks during EDT (UTC-4). The 1-hour DST shift changes the residue, dateutil finds no valid hours, and raises ValueError: Invalid rrule byxxx generates an empty set. This propagates uncaught through the timezone/until serializer properties on GET and through update_computed_fields in the periodic scheduler — crashing the schedules page and preventing all scheduled jobs from launching.

Bad rrules enter the database through two confirmed pathways: PATCH during EST (validation passes because fast-forward alignment holds in standard time, then breaks when EDT starts) and POST with a future dtstart (fast-forward is skipped entirely, then fails once dtstart passes and the clock is in EDT). Both are covered by new functional tests with deterministic mocked time.

The fix catches ValueError in _fast_forward_rrule and falls back to the original rrule (safe, since fast-forwarding is purely a performance optimization), adds defense-in-depth in update_computed_fields_no_save to set next_run = None for any unparseable rrule, and wraps both unprotected update_computed_fields() calls in awx_periodic_scheduler with try/except — matching the safe pattern already used in _run_dispatch_startup_common.

ISSUE TYPE

Bug, Docs Fix or other nominal change

COMPONENT NAME

API

Note

Medium Risk
Touches schedule recomputation and the periodic scheduler loop; while mostly defensive error handling, it can change how invalid schedules are persisted/ignored and could impact job launch timing if parsing behavior regresses.

Overview
Prevents schedule reads and the periodic scheduler from crashing when an rrule becomes unparsable (notably DST-triggered BYxxx empty-set errors).

Schedule.timezone/Schedule.until now catch ValueError from Schedule.rrulestr and return empty fallbacks, and awx_periodic_scheduler now wraps update_computed_fields() so a single bad schedule is logged, has next_run/dtend nulled, and is skipped without blocking other schedules.

Adds functional tests covering DST/BYHOUR failure scenarios (PATCH rejected with 400; GET stays 200 even if bad data exists in DB) and a task-level test ensuring the scheduler continues processing healthy schedules when one schedule recompute raises.

^{Reviewed by Cursor Bugbot for commit 719aeff. Bugbot is set up for automated code reviews on this repo. Configure here.}

Summary by CodeRabbit

Bug Fixes
- Scheduler now logs and skips invalid or DST-sensitive recurrence rules instead of failing; affected schedules have their upcoming run cleared. PATCHing an invalid rule returns 400, while reads remain available (no server error).
Tests
- Added functional tests ensuring the scheduler continues running healthy schedules across invalid recurrence rules and timezone transitions.

…on date

coderabbitai · 2026-04-09T18:59:36Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6aff3d94-88b5-4af0-9fd5-07d1aa76adef

📥 Commits

Reviewing files that changed from the base of the PR and between 27164f0 and 719aeff.

📒 Files selected for processing (2)

awx/main/models/schedules.py
awx/main/tasks/system.py

🚧 Files skipped from review as they are similar to previous changes (2)

awx/main/models/schedules.py
awx/main/tasks/system.py

📝 Walkthrough

Walkthrough

Adds defensive handling for ValueError from rrule parsing in the Schedule model and guards scheduler processing to log and skip or clear faulty schedules instead of aborting; includes tests for DST/rrule edge cases and scheduler resilience.

Changes

Cohort / File(s)	Summary
Schedule model `awx/main/models/schedules.py`	Wraps `rrulestr` usage in `timezone` and `until` to catch `ValueError` and return safe defaults (logging warnings). `update_computed_fields_no_save` now logs the error (including `id` and `rrule`) and re-raises `ValueError`.
Periodic scheduler task `awx/main/tasks/system.py`	Protects `Schedule.update_computed_fields()` calls in two iteration loops with try/except: logs exceptions, forces an update clearing `next_run`/`dtend` for failing schedules, and `continue`s in the active loop to skip further processing for that schedule.
API schedule tests `awx/main/tests/functional/api/test_schedules.py`	Adds invalid RRULE constants and tests (PATCH/GET/POST) exercising EST/EDT DST fast-forward rrule failure paths using mocked time to ensure endpoints do not crash.
Scheduler resilience test `awx/main/tests/functional/tasks/test_tasks_system.py`	Adds a functional test that patches `Schedule.update_computed_fields` to raise for one schedule, runs `awx_periodic_scheduler()`, and asserts the scheduler continues and only creates jobs for healthy schedules.

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler as Scheduler
    participant ScheduleModel as Schedule
    participant Logger as Logger
    participant JobService as JobCreator

    Note over Scheduler,ScheduleModel: periodic run iterates schedules
    Scheduler->>ScheduleModel: call update_computed_fields()
    alt update_computed_fields succeeds
        ScheduleModel-->>Scheduler: computed fields updated
        Scheduler->>JobService: enqueue/create job if due
    else update_computed_fields raises ValueError
        ScheduleModel-->>Logger: log warning/error (includes schedule id/rrule)
        Logger-->>Scheduler: exception handled
        Scheduler->>ScheduleModel: force-update next_run=None, dtend=None
        Scheduler-->>Scheduler: skip further processing for this schedule (continue)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title directly and specifically describes the main change: catching errors during recomputation of schedule RRULE computed fields, which is the core objective of this changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

AlanCoding · 2026-04-09T19:48:17Z

I hit an issue, some RRULE values would get accepted by PATCH requests. I updated this so that those should return a 400 error, and this was a pre-existing bug.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@awx/main/models/schedules.py`:
- Around line 329-341: The parse error for Schedule.rrulestr in
Schedule.update_computed_fields is currently swallowed after clearing next_run,
which allows a stale next_run to still be used by the scheduler; instead, after
logging and setting self.next_run and self.dtend to None, re-raise the
ValueError (or raise a new descriptive exception) so callers like
awx/main/tasks/system.py receive a failure signal and can skip/continue that
schedule; reference Schedule.update_computed_fields, Schedule.rrulestr and the
except ValueError block to locate where to change the behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ff9755a8-f65c-4be0-88d0-f65199194c5f

📥 Commits

Reviewing files that changed from the base of the PR and between 24a4f99 and 75c9dbb.

📒 Files selected for processing (2)

awx/main/models/schedules.py
awx/main/tests/functional/api/test_schedules.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@awx/main/models/schedules.py`:
- Around line 329-340: The except block for Schedule.rrulestr currently clears
in-memory self.next_run and self.dtend but re-raises before persisting; update
the handler in the Schedule class to set self.next_run = None and self.dtend =
None and then persist those changes (e.g., call self.save() or
self.save(update_fields=['next_run','dtend'])) before re-raising the ValueError
so the DB state reflects the cleared values and the broken schedule won't be
repeatedly reselected; ensure you still re-raise the original exception after
the save.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5af1687c-aef0-4437-a0c4-6e8dbb956cfa

📥 Commits

Reviewing files that changed from the base of the PR and between 75c9dbb and 27164f0.

📒 Files selected for processing (1)

awx/main/models/schedules.py

coderabbitai · 2026-04-10T00:13:25Z

+        try:
+            future_rs = Schedule.rrulestr(self.rrule)
+        except ValueError as e:
+            logger.error(
+                "Schedule id=%s has an invalid rrule that cannot be parsed: %s. Error: %s",
+                self.id,
+                self.rrule,
+                e,
+            )
+            self.next_run = None
+            self.dtend = None
+            raise


⚠️ Potential issue | 🟠 Major

Persisted-state gap: next_run/dtend are cleared in memory but not saved before re-raise.

At Line 340, re-raising exits before any save, so scheduler catch-and-continue paths can leave stale DB values and repeatedly re-select the same broken schedule.

Proposed fix

def update_computed_fields(self): - changed = self.update_computed_fields_no_save() + try: + changed = self.update_computed_fields_no_save() + except ValueError: + # Fields were nulled in update_computed_fields_no_save(); persist that state. + super(Schedule, self).save(update_fields=['next_run', 'dtend', 'modified']) + raise if not changed: return emit_channel_notification('schedules-changed', dict(id=self.id, group_name='schedules'))

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@awx/main/models/schedules.py` around lines 329 - 340, The except block for Schedule.rrulestr currently clears in-memory self.next_run and self.dtend but re-raises before persisting; update the handler in the Schedule class to set self.next_run = None and self.dtend = None and then persist those changes (e.g., call self.save() or self.save(update_fields=['next_run','dtend'])) before re-raising the ValueError so the DB state reflects the cleared values and the broken schedule won't be repeatedly reselected; ensure you still re-raise the original exception after the save.

...seems valid, unfortunately.

@AlanCoding, glad the concern checks out. Here's a clean fix plan:

Catch ValueError in update_computed_fields after update_computed_fields_no_save sets the fields to None, persist those cleared values, then re-raise so callers (scheduler) still see the error but the DB is consistent.

Failed to handle agent chat message. Please try again.

fosterseth

makes sense!

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Reviewed by Cursor Bugbot for commit 719aeff. Configure here.}

cursor · 2026-04-13T12:14:41Z

        utc = tzutc()
        # All rules in a ruleset will have the same dtstart so we can just take the first rule
-        tzinfo = Schedule.rrulestr(self.rrule)._rrule[0]._dtstart.tzinfo
+        try:


Missing root-cause catch permanently disables valid schedules

High Severity

The PR description states the fix "catches ValueError in _fast_forward_rrule and falls back to the original rrule," but _fast_forward_rrule has no such catch around rrule.replace() at line 100. Without it, the ValueError propagates to the new handlers in awx_periodic_scheduler, which set next_run=None via direct DB update. Since next_run=None never matches the before() or between() query filters, the schedule is permanently disabled — even after DST ends and the fast-forward would succeed again. Because fast-forwarding is purely a performance optimization, catching ValueError in _fast_forward_rrule and returning the original rrule would let valid schedules keep running instead of being irreversibly killed.

Additional Locations (2)

awx/main/tasks/system.py#L883-L888

awx/main/tasks/system.py#L898-L904

^{Reviewed by Cursor Bugbot for commit 719aeff. Configure here.}

sonarqubecloud · 2026-04-13T12:22:11Z

Quality Gate failed

Failed conditions
72.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

AlanCoding · 2026-04-21T13:33:00Z

Baby Yolo Results

Ran containerized Baby Yolo build #132 (Apr 20) and compared against reference build #129 (devel, same day).

16 total failures in the patch build vs 13 in the reference — no regressions introduced by this PR.
12 of the 16 failures are identical to the reference baseline (AzureAD ODataError, galaxy PermissionsEnum, lightspeed XDS timeout, hub e2e, etc.).
The 4 failures unique to this build are RBAC tests that hit "Project failed to enter a successful state" during setup — a known intermittent infrastructure flake, unrelated to the rrule changes.

Verdict: This PR does not introduce test regressions.

Fix 500 error possible with rrules that conditionally fail depending …

36ba8a6

…on date

github-actions Bot added the component:api label Apr 9, 2026

AlanCoding requested a review from pb82 April 9, 2026 19:00

Add test for periodic scheduler fix

24a4f99

AlanCoding requested a review from chrismeyersfsu April 9, 2026 19:29

AlanCoding changed the title ~~AAP-70288 Fix 500 error when DST shifts BYHOUR alignment~~ AAP-70288 Catch errors on re-compute of schedule RRULE computed fields Apr 9, 2026

AlanCoding requested a review from djyasin April 9, 2026 19:30

djyasin approved these changes Apr 9, 2026

View reviewed changes

Deny PATCH requests that give bad rrules

75c9dbb

coderabbitai Bot reviewed Apr 9, 2026

View reviewed changes

Comment thread awx/main/models/schedules.py Outdated

fix comment on stale next_run use

27164f0

coderabbitai Bot reviewed Apr 10, 2026

View reviewed changes

cursor Bot reviewed Apr 10, 2026

View reviewed changes

Comment thread awx/main/models/schedules.py

fosterseth approved these changes Apr 10, 2026

View reviewed changes

Address dead code comment, make sure task errors can null out next_run

719aeff

cursor Bot reviewed Apr 13, 2026

View reviewed changes

Conversation

AlanCoding commented Apr 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

SUMMARY

ISSUE TYPE

COMPONENT NAME

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

AlanCoding commented Apr 9, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AlanCoding Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

fosterseth left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Apr 13, 2026

Choose a reason for hiding this comment

Missing root-cause catch permanently disables valid schedules

Uh oh!

sonarqubecloud Bot commented Apr 13, 2026

Quality Gate failed

Uh oh!

AlanCoding commented Apr 21, 2026

Baby Yolo Results

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

AlanCoding commented Apr 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 9, 2026 •

edited

Loading

coderabbitai Bot Apr 10, 2026 •

edited

Loading