Skip to content

AAP-70288 Catch errors on re-compute of schedule RRULE computed fields#16400

Open
AlanCoding wants to merge 5 commits intoansible:develfrom
AlanCoding:rrule_error
Open

AAP-70288 Catch errors on re-compute of schedule RRULE computed fields#16400
AlanCoding wants to merge 5 commits intoansible:develfrom
AlanCoding:rrule_error

Conversation

@AlanCoding
Copy link
Copy Markdown
Member

@AlanCoding AlanCoding commented Apr 9, 2026

SUMMARY

DST - daylight savings time

  • Fix unhandled ValueError in _fast_forward_rrule that causes HTTP 500 on the schedules page when an HOURLY rrule with BYHOUR constraints is fast-forwarded across a DST boundary
  • Harden awx_periodic_scheduler so one bad schedule cannot crash the task and block all other schedules from running

The _fast_forward_rrule optimization advances the dtstart of HOURLY/MINUTELY rrules close to the current time, computing in UTC then converting back to local timezone. With rrules whose BYHOUR values are aligned to a specific residue mod INTERVAL (e.g., BYHOUR=1,5,9,13,17,21 with INTERVAL=4 — all 1 mod 4), this alignment holds during EST (UTC-5) but breaks during EDT (UTC-4). The 1-hour DST shift changes the residue, dateutil finds no valid hours, and raises ValueError: Invalid rrule byxxx generates an empty set. This propagates uncaught through the timezone/until serializer properties on GET and through update_computed_fields in the periodic scheduler — crashing the schedules page and preventing all scheduled jobs from launching.

Bad rrules enter the database through two confirmed pathways: PATCH during EST (validation passes because fast-forward alignment holds in standard time, then breaks when EDT starts) and POST with a future dtstart (fast-forward is skipped entirely, then fails once dtstart passes and the clock is in EDT). Both are covered by new functional tests with deterministic mocked time.

The fix catches ValueError in _fast_forward_rrule and falls back to the original rrule (safe, since fast-forwarding is purely a performance optimization), adds defense-in-depth in update_computed_fields_no_save to set next_run = None for any unparseable rrule, and wraps both unprotected update_computed_fields() calls in awx_periodic_scheduler with try/except — matching the safe pattern already used in _run_dispatch_startup_common.

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API

Note

Medium Risk
Touches schedule recomputation and the periodic scheduler loop; while mostly defensive error handling, it can change how invalid schedules are persisted/ignored and could impact job launch timing if parsing behavior regresses.

Overview
Prevents schedule reads and the periodic scheduler from crashing when an rrule becomes unparsable (notably DST-triggered BYxxx empty-set errors).

Schedule.timezone/Schedule.until now catch ValueError from Schedule.rrulestr and return empty fallbacks, and awx_periodic_scheduler now wraps update_computed_fields() so a single bad schedule is logged, has next_run/dtend nulled, and is skipped without blocking other schedules.

Adds functional tests covering DST/BYHOUR failure scenarios (PATCH rejected with 400; GET stays 200 even if bad data exists in DB) and a task-level test ensuring the scheduler continues processing healthy schedules when one schedule recompute raises.

Reviewed by Cursor Bugbot for commit 719aeff. Bugbot is set up for automated code reviews on this repo. Configure here.

Summary by CodeRabbit

  • Bug Fixes
    • Scheduler now logs and skips invalid or DST-sensitive recurrence rules instead of failing; affected schedules have their upcoming run cleared. PATCHing an invalid rule returns 400, while reads remain available (no server error).
  • Tests
    • Added functional tests ensuring the scheduler continues running healthy schedules across invalid recurrence rules and timezone transitions.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6aff3d94-88b5-4af0-9fd5-07d1aa76adef

📥 Commits

Reviewing files that changed from the base of the PR and between 27164f0 and 719aeff.

📒 Files selected for processing (2)
  • awx/main/models/schedules.py
  • awx/main/tasks/system.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • awx/main/models/schedules.py
  • awx/main/tasks/system.py

📝 Walkthrough

Walkthrough

Adds defensive handling for ValueError from rrule parsing in the Schedule model and guards scheduler processing to log and skip or clear faulty schedules instead of aborting; includes tests for DST/rrule edge cases and scheduler resilience.

Changes

Cohort / File(s) Summary
Schedule model
awx/main/models/schedules.py
Wraps rrulestr usage in timezone and until to catch ValueError and return safe defaults (logging warnings). update_computed_fields_no_save now logs the error (including id and rrule) and re-raises ValueError.
Periodic scheduler task
awx/main/tasks/system.py
Protects Schedule.update_computed_fields() calls in two iteration loops with try/except: logs exceptions, forces an update clearing next_run/dtend for failing schedules, and continues in the active loop to skip further processing for that schedule.
API schedule tests
awx/main/tests/functional/api/test_schedules.py
Adds invalid RRULE constants and tests (PATCH/GET/POST) exercising EST/EDT DST fast-forward rrule failure paths using mocked time to ensure endpoints do not crash.
Scheduler resilience test
awx/main/tests/functional/tasks/test_tasks_system.py
Adds a functional test that patches Schedule.update_computed_fields to raise for one schedule, runs awx_periodic_scheduler(), and asserts the scheduler continues and only creates jobs for healthy schedules.

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler as Scheduler
    participant ScheduleModel as Schedule
    participant Logger as Logger
    participant JobService as JobCreator

    Note over Scheduler,ScheduleModel: periodic run iterates schedules
    Scheduler->>ScheduleModel: call update_computed_fields()
    alt update_computed_fields succeeds
        ScheduleModel-->>Scheduler: computed fields updated
        Scheduler->>JobService: enqueue/create job if due
    else update_computed_fields raises ValueError
        ScheduleModel-->>Logger: log warning/error (includes schedule id/rrule)
        Logger-->>Scheduler: exception handled
        Scheduler->>ScheduleModel: force-update next_run=None, dtend=None
        Scheduler-->>Scheduler: skip further processing for this schedule (continue)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and specifically describes the main change: catching errors during recomputation of schedule RRULE computed fields, which is the core objective of this changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@AlanCoding AlanCoding requested a review from pb82 April 9, 2026 19:00
@AlanCoding AlanCoding changed the title AAP-70288 Fix 500 error when DST shifts BYHOUR alignment AAP-70288 Catch errors on re-compute of schedule RRULE computed fields Apr 9, 2026
@AlanCoding AlanCoding requested a review from djyasin April 9, 2026 19:30
@AlanCoding
Copy link
Copy Markdown
Member Author

I hit an issue, some RRULE values would get accepted by PATCH requests. I updated this so that those should return a 400 error, and this was a pre-existing bug.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@awx/main/models/schedules.py`:
- Around line 329-341: The parse error for Schedule.rrulestr in
Schedule.update_computed_fields is currently swallowed after clearing next_run,
which allows a stale next_run to still be used by the scheduler; instead, after
logging and setting self.next_run and self.dtend to None, re-raise the
ValueError (or raise a new descriptive exception) so callers like
awx/main/tasks/system.py receive a failure signal and can skip/continue that
schedule; reference Schedule.update_computed_fields, Schedule.rrulestr and the
except ValueError block to locate where to change the behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ff9755a8-f65c-4be0-88d0-f65199194c5f

📥 Commits

Reviewing files that changed from the base of the PR and between 24a4f99 and 75c9dbb.

📒 Files selected for processing (2)
  • awx/main/models/schedules.py
  • awx/main/tests/functional/api/test_schedules.py

Comment thread awx/main/models/schedules.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@awx/main/models/schedules.py`:
- Around line 329-340: The except block for Schedule.rrulestr currently clears
in-memory self.next_run and self.dtend but re-raises before persisting; update
the handler in the Schedule class to set self.next_run = None and self.dtend =
None and then persist those changes (e.g., call self.save() or
self.save(update_fields=['next_run','dtend'])) before re-raising the ValueError
so the DB state reflects the cleared values and the broken schedule won't be
repeatedly reselected; ensure you still re-raise the original exception after
the save.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5af1687c-aef0-4437-a0c4-6e8dbb956cfa

📥 Commits

Reviewing files that changed from the base of the PR and between 75c9dbb and 27164f0.

📒 Files selected for processing (1)
  • awx/main/models/schedules.py

Comment on lines +329 to +340
try:
future_rs = Schedule.rrulestr(self.rrule)
except ValueError as e:
logger.error(
"Schedule id=%s has an invalid rrule that cannot be parsed: %s. Error: %s",
self.id,
self.rrule,
e,
)
self.next_run = None
self.dtend = None
raise
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Persisted-state gap: next_run/dtend are cleared in memory but not saved before re-raise.

At Line 340, re-raising exits before any save, so scheduler catch-and-continue paths can leave stale DB values and repeatedly re-select the same broken schedule.

Proposed fix
 def update_computed_fields(self):
-    changed = self.update_computed_fields_no_save()
+    try:
+        changed = self.update_computed_fields_no_save()
+    except ValueError:
+        # Fields were nulled in update_computed_fields_no_save(); persist that state.
+        super(Schedule, self).save(update_fields=['next_run', 'dtend', 'modified'])
+        raise
     if not changed:
         return
     emit_channel_notification('schedules-changed', dict(id=self.id, group_name='schedules'))

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/models/schedules.py` around lines 329 - 340, The except block for
Schedule.rrulestr currently clears in-memory self.next_run and self.dtend but
re-raises before persisting; update the handler in the Schedule class to set
self.next_run = None and self.dtend = None and then persist those changes (e.g.,
call self.save() or self.save(update_fields=['next_run','dtend'])) before
re-raising the ValueError so the DB state reflects the cleared values and the
broken schedule won't be repeatedly reselected; ensure you still re-raise the
original exception after the save.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...seems valid, unfortunately.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlanCoding, glad the concern checks out. Here's a clean fix plan:

  • Catch ValueError in update_computed_fields after update_computed_fields_no_save sets the fields to None, persist those cleared values, then re-raise so callers (scheduler) still see the error but the DB is consistent.

Failed to handle agent chat message. Please try again.

Comment thread awx/main/models/schedules.py
Copy link
Copy Markdown
Member

@fosterseth fosterseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense!

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 719aeff. Configure here.

utc = tzutc()
# All rules in a ruleset will have the same dtstart so we can just take the first rule
tzinfo = Schedule.rrulestr(self.rrule)._rrule[0]._dtstart.tzinfo
try:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing root-cause catch permanently disables valid schedules

High Severity

The PR description states the fix "catches ValueError in _fast_forward_rrule and falls back to the original rrule," but _fast_forward_rrule has no such catch around rrule.replace() at line 100. Without it, the ValueError propagates to the new handlers in awx_periodic_scheduler, which set next_run=None via direct DB update. Since next_run=None never matches the before() or between() query filters, the schedule is permanently disabled — even after DST ends and the fast-forward would succeed again. Because fast-forwarding is purely a performance optimization, catching ValueError in _fast_forward_rrule and returning the original rrule would let valid schedules keep running instead of being irreversibly killed.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 719aeff. Configure here.

@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
72.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@AlanCoding
Copy link
Copy Markdown
Member Author

Baby Yolo Results

Ran containerized Baby Yolo build #132 (Apr 20) and compared against reference build #129 (devel, same day).

  • 16 total failures in the patch build vs 13 in the reference — no regressions introduced by this PR.
  • 12 of the 16 failures are identical to the reference baseline (AzureAD ODataError, galaxy PermissionsEnum, lightspeed XDS timeout, hub e2e, etc.).
  • The 4 failures unique to this build are RBAC tests that hit "Project failed to enter a successful state" during setup — a known intermittent infrastructure flake, unrelated to the rrule changes.

Verdict: This PR does not introduce test regressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants