✨ Refactor controller, handler errors, resync & remove telemetry #434

Merged
abahmed merged 4 commits into main from refactor/informer-handler-cleanup
Mar 28, 2026
abahmed commented Mar 28, 2026

🔄 Replace watcher package with SharedInformerFactory + TypedRateLimitingInterface controller

  • controller/controller.go: standard informer event handlers, workqueue workers, cache sync
  • Removed watcher/start.go, watcher/watcher.go, watcher/watcher_test.go
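
The shape of the new controller (event handlers enqueue keys, workers drain the queue) can be sketched without client-go. This is a stdlib-only stand-in, not the code in controller/controller.go: `keyQueue`, `drain`, and the sample keys are hypothetical names, and the real controller uses client-go's workqueue rather than this type.

```go
package main

import "fmt"

// keyQueue is a hypothetical, single-goroutine stand-in for a work queue.
// It stores object keys and collapses duplicates that are still pending.
type keyQueue struct {
	pending map[string]bool
	order   []string
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: make(map[string]bool)}
}

// Add enqueues key unless it is already waiting to be processed.
func (q *keyQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *keyQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// drain plays the worker role: it pops keys until the queue is empty,
// passes each one to process, and returns the keys in processing order.
func drain(q *keyQueue, process func(key string)) []string {
	var done []string
	for {
		key, ok := q.Get()
		if !ok {
			return done
		}
		process(key)
		done = append(done, key)
	}
}

func main() {
	q := newKeyQueue()
	// Event handlers only enqueue keys: three events for two objects.
	for _, key := range []string{"default/web", "default/db", "default/web"} {
		q.Add(key)
	}
	processed := drain(q, func(key string) { fmt.Println("processing", key) })
	fmt.Println(len(processed), "keys processed")
}
```

Passing keys instead of objects means a burst of events for the same pod collapses into a single unit of work, which is the main reason for putting a queue between the informer and the handler.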

⚠️ Handler methods now return error for transient failures

  • ProcessPod/ProcessNode/ProcessPodObject/ProcessNodeObject return error
  • Transient errors (cache misses, API failures) trigger requeue with backoff
  • Permanent outcomes (filtered, deleted, alert sent) return nil
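
A minimal sketch of the error contract described above, assuming a handler of the form `func(key string) error`. `errCacheMiss`, `backoff`, `processWithRetry`, and the delay values are hypothetical; the real controller delegates backoff to its rate-limiting queue. Delays are collected instead of slept so the sketch runs instantly.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errCacheMiss is a hypothetical transient error used only in this sketch.
var errCacheMiss = errors.New("object not in cache yet")

// backoff returns an exponential delay for the given retry count,
// doubling from base and capped at max.
func backoff(retries int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < retries && d < max; i++ {
		d *= 2
	}
	if d > max {
		return max
	}
	return d
}

// processWithRetry calls handle until it returns nil (permanent outcome)
// or until maxRetries requeues have been used. It returns the number of
// attempts, the backoff delay chosen before each requeue, and the last error.
func processWithRetry(key string, handle func(string) error, maxRetries int) (attempts int, delays []time.Duration, err error) {
	for {
		attempts++
		err = handle(key)
		if err == nil {
			return // nil means done: filtered, deleted, or alert sent
		}
		if attempts > maxRetries {
			return // give up and report the last transient error
		}
		delays = append(delays, backoff(attempts-1, 100*time.Millisecond, 5*time.Second))
	}
}

func main() {
	calls := 0
	handle := func(key string) error {
		calls++
		if calls <= 2 {
			return errCacheMiss // transient: ask for a requeue
		}
		return nil // permanent outcome: nothing left to do for this key
	}
	attempts, delays, err := processWithRetry("default/web", handle, 5)
	fmt.Println(attempts, delays, err)
}
```

Returning nil for filtered or deleted objects matters as much as returning errors for transient ones: a handler that returned an error for a deleted pod would be requeued until the retry limit for work that can never succeed.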

🔁 Add configurable resync period (config.resyncSeconds)

  • Default 0 = event-driven only, no periodic resync
  • A non-zero value enables periodic resync, so missed events are eventually re-processed
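
How the setting might map to a duration, as a sketch: the `Config` struct, its `ResyncSeconds` field, and `resyncPeriod` are assumed names, and treating negative values as 0 is an assumption of this sketch rather than documented behavior.

```go
package main

import (
	"fmt"
	"time"
)

// Config mirrors only the setting discussed here. The struct and field
// names are assumptions, not the project's actual config type.
type Config struct {
	ResyncSeconds int
}

// resyncPeriod converts the configured seconds to a time.Duration.
// Zero (the default) means event-driven only; this sketch also maps
// negative values to zero, which is an assumption.
func resyncPeriod(cfg Config) time.Duration {
	if cfg.ResyncSeconds <= 0 {
		return 0
	}
	return time.Duration(cfg.ResyncSeconds) * time.Second
}

func main() {
	fmt.Println("default:", resyncPeriod(Config{}))
	fmt.Println("resyncSeconds=300:", resyncPeriod(Config{ResyncSeconds: 300}))
}
```

A resync replays the objects already in the informer cache to the handlers, so handlers need to tolerate seeing the same object again.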

🧹 Remove telemetry subsystem entirely

  • Deleted telemetry/ package
  • Removed from config, state, startup, constant, deploy configs, docs
  • Removed IsTelemetrySent/MarkTelemetrySent from StateManager
  • Simplified StartupManager constructor (removed telemetryCfg param)

📝 Clean up unused Controller.client field, update deploy configs and READMEs

abahmed added 4 commits March 28, 2026 19:07
🧹 Probes removed from deploy/deploy.yaml and Helm chart deployment template
🗑️ Removed readinessProbe and livenessProbe config from values.yaml and chart README
✅ restartPolicy: Always kept (explicit for clarity)

Probes were pointless for kwatch:
- Readiness: no inbound traffic to route, no Service in deploy
- Liveness: health server goroutine runs independently of controller loop
  (stuck controller still gets 200 OK from /healthz)
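
The liveness argument can be illustrated with a handler that has no link to the controller loop. This is a sketch, not kwatch's health server: `healthz` and `probeWhileStuck` are hypothetical, and a blocked goroutine stands in for a stuck controller.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthz always reports OK; it has no knowledge of the controller loop.
func healthz(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// probeWhileStuck simulates a stuck controller with a goroutine blocked
// on a channel, then returns the status code a liveness probe would see.
func probeWhileStuck() int {
	release := make(chan struct{})
	defer close(release)     // unblock the goroutine when we return
	go func() { <-release }() // simulated controller loop that makes no progress

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
	healthz(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("probe status while controller is stuck:", probeWhileStuck())
}
```

A probe only detects a hung controller if the handler checks controller progress, for example a last-processed timestamp. Without that, removing the probe loses no signal.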

Kubernetes restarts the container automatically when the process exits.

The mock handler's podKeys, podDel, nodeKeys, and nodeDel slices were accessed
concurrently by the controller worker goroutine and the test goroutine
without synchronization, causing -race failures in CI.

Added sync.Mutex to mockHandler and thread-safe accessor methods
(podCount, nodeCount, podEntry, nodeEntry).
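
A sketch of the pattern, reduced to the pod side. The `mockHandler`, `podKeys`, `podCount`, and `podEntry` names come from the commit message; the method signatures, bodies, and the `recordConcurrently` helper are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// mockHandler records the keys it is asked to process. Every access to
// podKeys goes through mu, so worker and test goroutines cannot race.
type mockHandler struct {
	mu      sync.Mutex
	podKeys []string
}

// ProcessPod is what the controller worker goroutine would call.
func (m *mockHandler) ProcessPod(key string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.podKeys = append(m.podKeys, key)
	return nil
}

// podCount returns how many pod keys have been recorded so far.
func (m *mockHandler) podCount() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.podKeys)
}

// podEntry returns the i-th recorded key; ok is false if i is out of range.
func (m *mockHandler) podEntry(i int) (key string, ok bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if i < 0 || i >= len(m.podKeys) {
		return "", false
	}
	return m.podKeys[i], true
}

// recordConcurrently calls ProcessPod from n goroutines and waits for all
// of them, mimicking workers that run alongside the test goroutine.
func recordConcurrently(m *mockHandler, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = m.ProcessPod(fmt.Sprintf("default/pod-%d", i))
		}(i)
	}
	wg.Wait()
}

func main() {
	m := &mockHandler{}
	recordConcurrently(m, 50)
	fmt.Println("recorded pods:", m.podCount())
}
```

Tests then read state only through the accessors and never touch the slices directly, which lets `go test -race` pass while the worker is still running.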
abahmed merged commit 5014fcf into main on Mar 28, 2026
4 checks passed