feat: ShutdownTelemetry, debug endpoint, goroutine-leak tests, and deployment docs by zishanjawed · Pull Request #4038 · gofiber/fiber

zishanjawed · 2026-01-28T09:10:44Z

Summary

ShutdownTelemetry struct — captures per-phase wall-clock timing (PreHooksDuration, GracefulCloseDuration, DrainDuration, PostHooksDuration, TotalDuration) and connection-disposition counts (InitialConns, DrainedConns, ForcedConns, WebSocketsClosed, SSEsClosed, TimedOut).
app.LastShutdownTelemetry() — lock-free accessor via atomic.Pointer; returns nil before the first ShutdownWithConfig completes, or a fully-populated snapshot afterward.
app.ShutdownDebugHandler() — returns a Handler that responds with JSON containing status (running / shutting_down / shutdown), activeConnections, and the last telemetry snapshot with human-readable duration strings. Register at any path: app.Get("/debug/shutdown", app.ShutdownDebugHandler()).
GracefulCloseTyped — extended to accept *ShutdownTelemetry and increment WebSocketsClosed / SSEsClosed counters sequentially inside the sync.Map.Range (no additional synchronization needed).

Tests (9 new)

#	Test	Verifies
1	`Test_Telemetry_CleanDrain_PopulatesAllFields`	Clean shutdown populates all fields; `DrainedConns == InitialConns`; `ForcedConns == 0`; `TimedOut == false`
2	`Test_Telemetry_ForcedClose_CountsCorrectly`	Slow handler + short deadline; `ForcedConns >= 1`; `TimedOut == true`
3	`Test_Telemetry_HookDurations_Measured`	Pre-hook sleeps 50 ms; `PreHooksDuration >= 50 ms`
4	`Test_Telemetry_WebSocketSSE_Counters`	1 WS + 1 SSE marked; after shutdown `WebSocketsClosed == 1`, `SSEsClosed == 1`
5	`Test_DebugEndpoint_ReturnsRunningStatus`	Before shutdown: `status == "running"`, `lastShutdown` is `null`
6	`Test_DebugEndpoint_ReturnsTelemetryAfterShutdown`	After shutdown: `status == "shutdown"`, all duration strings parseable by `time.ParseDuration`
7	`Test_Telemetry_NoGoroutineLeak_CleanShutdown`	Clean shutdown does not leak goroutines
8	`Test_Telemetry_NoGoroutineLeak_ForcedClose`	Force-close shutdown does not leak goroutines
9	`Test_Telemetry_NoGoroutineLeak_DebugHandler`	10 repeated debug-handler calls do not leak goroutines

Documentation (`docs/guide/shutdown.md`)

Full reference covering:

Quick-start signal-handling example
ShutdownConfig field reference with defaults
ShutdownTelemetry struct and accessor
Debug-endpoint registration and JSON response shapes for all three states
Lifecycle phase order table (phases 1–9) mapped to telemetry fields
Hook usage: OnPreShutdown, OnPostShutdown, per-connection SetCleanupHook, multi-hook registration order
Kubernetes integration: annotated Deployment YAML (terminationGracePeriodSeconds, preStop sleep, liveness/readiness probes), full Go application wiring, alignment rules table
Load-balancer integration: AWS ALB/NLB deregistration delay, GCP connection draining, generic architecture diagram
Troubleshooting: connections-never-drain, hooks firing zero/twice, telemetry nil, WS/SSE counters stuck at zero, timeout mismatches, debug endpoint stuck at shutting_down
Concurrency-safety rationale table

Race-detector note

The race detector flags app.server.IdleTimeout = time.Nanosecond (phase 2 in both ShutdownWithContext and ShutdownWithConfig) racing with fasthttp's serveConn read of the same field. This is a pre-existing race in the upstream codebase — confirmed by running -race against Test_ShutdownWithConfig_OnShutdownStart (a test we did not modify). Our telemetry instrumentation introduces zero new data races.

Test plan

# Build
go build ./...

# Targeted feature + leak tests
go test -v -run "Test_Telemetry_|Test_DebugEndpoint_" -count=1 -timeout 60s .

# Full shutdown regression
go test -run "Test_.*[Ss]hutdown|Test_ActiveConnections|Test_IsShuttingDown|Test_ConnTracking|Test_PostShutdownHooks|Test_GracefulShutdown|Test_MultipleShutdown|Test_ShutdownWithC|Test_RequestContext_|Test_GracefulClose_|Test_CleanShutdown|Test_DrainTime|Test_ContextCancellation|Test_RepeatedShutdown|Test_HookOrder|Test_Integration_|Test_Telemetry_|Test_DebugEndpoint_" -count=1 -timeout 120s .

# Linting
go vet ./...
gofmt -l app.go listen.go shutdown_test.go

🤖 Generated with Claude Code

Adds per-phase timing and connection-disposition metrics collected during ShutdownWithConfig, exposed through both a programmatic accessor and a JSON debug handler. Includes comprehensive documentation covering configuration, Kubernetes deployment patterns, load-balancer integration, hook ordering, and a troubleshooting guide. Key additions: - ShutdownTelemetry struct capturing timing (drain, hooks, graceful-close) and counts (drained, forced, WS/SSE closed) - app.LastShutdownTelemetry() accessor (atomic read of last snapshot) - app.ShutdownDebugHandler() returning JSON with status + telemetry - GracefulCloseTyped now increments WS/SSE counters on the telemetry - 6 feature tests + 3 goroutine-leak tests - docs/guide/shutdown.md: full reference including K8s manifests, LB drain patterns, hook examples, and troubleshooting Co-Authored-By: Claude (penguin) <[email protected]>

coderabbitai · 2026-01-28T09:11:06Z

Walkthrough

Introduces a comprehensive graceful shutdown system with connection tracking, protocol-aware close logic (WebSocket/SSE), configurable lifecycle hooks, drain workflow, and telemetry collection. Adds new types for tracking connections, shutdown configuration, and shutdown metrics. Updates context methods to propagate shutdown state and adds three new error sentinels for shutdown and protocol scenarios.

Changes

Cohort / File(s)	Summary
Core Shutdown Lifecycle `src/app.go`	Adds atomic live-connection counter, dedicated shutdown context, ShutdownWithContext/ShutdownWithConfig methods, shutdown state tracking, and configurable hooks (OnShutdownStart, OnDrainProgress, OnForceClose, OnPreShutdown, OnPostShutdown). Introduces ShutdownDebug endpoint and telemetry accessors (IsShuttingDown, ActiveConnections, LastShutdownTelemetry).
Connection Tracking & Protocol Handling `src/listen.go`	Introduces ConnType enum, TrackedConn interface, and ShutdownConfig/ShutdownTelemetry types. Implements connection registry with per-connection cleanup hooks. Adds protocol-aware graceful close for WebSocket (close frame + ack timeout) and SSE (shutdown event + wait). Wraps listener with connTrackingListener for active connection management and force-close behavior.
Context Propagation `src/ctx.go`, `src/ctx_interface_gen.go`	Updates DefaultCtx methods (Deadline, Done, Err, Context) to reflect live application shutdown context instead of cached/no-op behavior. Adds TrackedConn() method to access connection-specific lifecycle controls. Makes Err() return ErrRequestShutdown on cancellation.
Error Definitions `src/error.go`	Adds three new sentinel errors: ErrRequestShutdown (shutdown signal), ErrWebSocketCloseTimeout (WS handshake timeout), ErrSSECloseWriteFailed (SSE write failure).
Documentation `docs/guide/shutdown.md`	New comprehensive guide covering shutdown lifecycle phases, three entry points, ShutdownConfig structure, telemetry model, JSON debug endpoint, Kubernetes/load-balancer integration, troubleshooting scenarios, and concurrency notes.

Sequence Diagram

sequenceDiagram
    autonumber
    participant App as App
    participant Listener as Listener
    participant Drain as Drain Monitor
    participant Conns as Active Connections
    participant WS as WebSocket
    participant SSE as SSE

    App->>App: ShutdownWithContext() called
    App->>App: Emit OnShutdownStart(activeConns)
    App->>Drain: Start drain phase with DrainInterval
    
    Drain->>Conns: Poll remaining connections
    Drain->>App: Emit OnDrainProgress(remaining, elapsed)
    Conns-->>Drain: Return count

    rect rgba(100, 200, 100, 0.5)
    Note over App,SSE: Graceful Close Phase
    App->>WS: Close WebSocket connections
    WS->>WS: Send close frame
    WS->>WS: Wait for client ack (timeout)
    WS-->>App: OnWebSocketClose callback
    
    App->>SSE: Close SSE connections
    SSE->>SSE: Write shutdown event
    SSE->>SSE: Wait for disconnect
    SSE-->>App: OnSSEClose callback
    end

    App->>App: OnForceClose if timeout exceeded
    App->>Conns: CloseAll remaining
    
    rect rgba(100, 100, 200, 0.5)
    Note over App: Post-Shutdown Hooks
    App->>App: Execute OnPostShutdown
    App->>App: Populate ShutdownTelemetry
    end

    App-->>Listener: Graceful shutdown complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The changes introduce intricate logic across multiple domains: atomic connection tracking with protocol-aware graceful close sequences, context propagation with lifecycle awareness, configurable multi-phase shutdown with hooks and telemetry collection, and handler-specific cleanup for WebSocket/SSE. The heterogeneous nature (spanning app lifecycle, connection management, context semantics, and error handling) and density of new types/methods (ShutdownConfig, ShutdownTelemetry, TrackedConn, protocol enums) demand careful verification of correctness, edge cases, and integration points.

Possibly related PRs

🔥 feat: Improve and Optimize ShutdownWithContext Func #3162 — Modifies shutdown flow and hook handling (OnShutdown splitting and listen/shutdown wiring alignment).
feat: fiber.Context implement context.Context #3382 — Modifies context-related API on DefaultCtx/Ctx (Deadline, Done, Err, Context behavior and interface changes).
🔥 feat: Add support for graceful shutdown timeout in ListenConfig #3220 — Modifies graceful-shutdown flow (Listen/ListenConfig and ShutdownWithContext/ShutdownWithTimeout integration).

Suggested reviewers

sixcolors
ReneWerner87

Poem

🐰 A hop, a drain, a graceful flow,
Connections close with gentle show,
WebSockets bid farewell with care,
Telemetry tracks each shutdown prayer,
Fiber shuts down, clean and fair! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main additions: ShutdownTelemetry, debug endpoint, goroutine-leak tests, and deployment docs.
Description check	✅ Passed	The PR description is comprehensive and well-structured, covering summary, tests, documentation, race-detector notes, and test plan. It follows the template structure with detailed information on all major changes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist · 2026-01-28T09:11:58Z

Summary of Changes

Hello @zishanjawed, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive graceful shutdown mechanism with detailed telemetry, a debug endpoint, and extensive documentation. It enhances the application's ability to shut down gracefully, providing insights into the process and ensuring no resource leaks. The changes include a new ShutdownTelemetry struct, updates to connection handling, and new tests to validate the shutdown process.

Highlights

Shutdown Telemetry: Introduces a ShutdownTelemetry struct to capture per-phase timing and connection-disposition counts during shutdown, providing insights into the shutdown process.
Debug Endpoint: Adds a /debug/shutdown endpoint that returns JSON containing the server's status, active connections, and the last shutdown telemetry snapshot, aiding in monitoring and debugging.
Graceful Close Extension: GracefulCloseTyped is extended to handle WebSocket and SSE connections, ensuring they are gracefully closed during shutdown, with counters for tracking.
Goroutine Leak Tests: Includes nine new tests to verify that clean and forced shutdowns, as well as repeated debug-handler calls, do not leak goroutines, ensuring stability.
Documentation: Comprehensive documentation (docs/guide/shutdown.md) covers various aspects of graceful shutdown, including configuration, telemetry, debug endpoint, lifecycle phases, hooks, Kubernetes integration, load balancer integration, and troubleshooting.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai

Actionable comments posted: 4

🤖 Fix all issues with AI agents

In `@app.go`:
- Around line 1200-1207: The write/read of app.shutdownCtx must be made atomic:
change the field type from context.Context to atomic.Pointer[context.Context]
(importing sync/atomic and unsafe), update the writer in the init/swap site
(where cfg.RequestContext is assigned under app.mutex.Lock()) to use
(*atomic.Pointer[context.Context]).Store((*context.Context)(unsafe.Pointer(&cfg.RequestContext)))
or equivalent atomic.StorePointer wrapping the cfg.RequestContext value, and
update reads (e.g., in ctx.go where code does "if parent := c.app.shutdownCtx;
parent != nil") to use atomic.LoadPointer and convert back to *context.Context
for nil-checking and use; ensure all places that previously accessed
app.shutdownCtx directly are replaced with atomic loads/stores so concurrent
access is race-free.

In `@ctx.go`:
- Around line 149-154: The read of c.app.shutdownCtx in ctx.go is a potential
data race against the write in ShutdownWithConfig (where RequestContext replaces
it); make shutdownCtx an atomic pointer (e.g., atomic.Pointer[context.Context]
or atomic.Value) on the App struct and update all accesses: replace direct reads
of c.app.shutdownCtx with an atomic load (e.g., shutdownCtx.Load()) and replace
the write in ShutdownWithConfig with an atomic store
(shutdownCtx.Store(newCtx)), keeping the same nil-check semantics so the
function returns the loaded context if non-nil or context.Background()
otherwise.
- Around line 177-185: DefaultCtx.Err currently replaces any context error with
ErrRequestShutdown, losing deadline/cancellation details; change Err to fetch
ctxErr := c.Context().Err() and return ctxErr unless the cancellation is
specifically due to server shutdown—i.e., detect shutdown (use the existing
shutdown signal/state available on DefaultCtx such as a shutdown flag, shutdown
channel or shutdown context) and only then return ErrRequestShutdown; reference
DefaultCtx.Err, c.Context().Err(), ErrRequestShutdown and RequestDeadline when
making the conditional mapping.

In `@docs/guide/shutdown.md`:
- Around line 50-54: The example reads telemetry too early: calling
app.LastShutdownTelemetry() before ShutdownWithConfig runs always returns nil;
move or remove that block so telemetry is only accessed after ShutdownWithConfig
completes (i.e., call LastShutdownTelemetry() after the ShutdownWithConfig call
and check for nil before logging), referencing the LastShutdownTelemetry() and
ShutdownWithConfig functions to locate and update the example.

🧹 Nitpick comments (6)

docs/guide/shutdown.md (3)
266-268: Add language specifier to fenced code block.

The static analysis tool flagged this code block as missing a language specifier (MD040). Consider adding text or an appropriate language identifier.
Suggested fix
-```
+```text
 OnShutdownStart → pre-hooks → GracefulCloseTyped → drain → force-close → post-hooks → return
</details>

---

`487-495`: **Add language specifier to fenced code block.**

This ASCII diagram code block is missing a language specifier (MD040). Consider adding `text` for plain text content.



<details>
<summary>Suggested fix</summary>

```diff
-```
+```text
 ┌─────────────┐   stop routing   ┌────────────────┐  drain timeout  ┌──────────┐
 │ Load Balancer│ ─────────────>   │  Fiber Instance │ ──────────────> │  Done    │
560-562: Add language specifier to fenced code block.

This code block showing the timeout relationship is missing a language specifier (MD040).
Suggested fix
-```
+```text
 terminationGracePeriodSeconds  <  preStop + ShutdownTimeout + hook_time
</details>

</blockquote></details>
<details>
<summary>app.go (1)</summary><blockquote>

`1376-1394`: **Minor: Double atomic load of `lastTelemetry`.**

The handler loads `lastTelemetry` twice (lines 1380 and 1390). While functionally correct due to atomic guarantees, a single load would be slightly more efficient.



<details>
<summary>Suggested optimization</summary>

```diff
 func (app *App) ShutdownDebugHandler() Handler {
 	return func(c Ctx) error {
 		status := "running"
+		tel := app.lastTelemetry.Load()
 		if app.IsShuttingDown() {
-			if app.lastTelemetry.Load() != nil {
+			if tel != nil {
 				status = "shutdown"
 			} else {
 				status = "shutting_down"
 			}
 		}
 
 		resp := shutdownDebugResponse{
 			Status:            status,
 			ActiveConnections: app.ActiveConnections(),
-			LastShutdown:      telemetryToJSON(app.lastTelemetry.Load()),
+			LastShutdown:      telemetryToJSON(tel),
 		}
 		return c.JSON(resp)
 	}
 }
listen.go (2)
380-389: Close() may call underlying Conn.Close() multiple times.

When CloseAll() wins the CAS race, it calls tc.Conn.Close(). If the handler subsequently calls tc.Close(), the CAS fails but c.Conn.Close() is still called (line 388). While double-close on net.Conn is generally safe (returns an error on second call), this behavior should be documented or guarded.
Optional: Guard the underlying close
 func (c *connTrackingConn) Close() error {
 	if atomic.CompareAndSwapInt32(&c.closed, 0, 1) {
 		atomic.AddInt64(c.activeConns, -1)
 		c.registry.Delete(c.id)
+		return c.Conn.Close()
 	}
-	return c.Conn.Close()
+	return nil // Already closed by CloseAll or previous Close call
 }
420-467: WebSocket close handshake reads only 2 bytes, which may not fully consume the client's close frame.

A WebSocket close frame from the client is at minimum 2 bytes (opcode + length) but typically 4+ bytes (with status code). Reading only 2 bytes leaves data in the buffer. While this doesn't affect correctness (the connection is about to close), reading a larger buffer would be more robust.
Suggested improvement
-	var buf [2]byte
+	var buf [128]byte // Accommodate full close frame with reason
 	_, readErr := tc.Conn.Read(buf[:])

coderabbitai · 2026-01-28T09:15:22Z

+	// --- phase 3c: swap request-level parent context ---------------------------
+	// If the caller supplied a RequestContext, replace the app's shutdown
+	// context so that every subsequent c.Context() derivation inherits the
+	// caller's deadline.  This lets handlers observe a concrete deadline via
+	// c.Deadline() in addition to the immediate Done() signal.
+	if cfg.RequestContext != nil {
+		app.shutdownCtx = cfg.RequestContext
+	}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # First, let's check the current shutdownCtx usage in ctx.go rg -n 'shutdownCtx' ctx.go -C 5

Repository: gofiber/fiber

Length of output: 1044

🏁 Script executed:

#!/bin/bash # Check how Context() method is implemented in ctx.go rg -n -A 10 'func.*Context\(\)' ctx.go

Repository: gofiber/fiber

Length of output: 481

🏁 Script executed:

#!/bin/bash # Search for all shutdownCtx accesses across the codebase rg -n 'shutdownCtx' --type go

Repository: gofiber/fiber

Length of output: 1493

🏁 Script executed:

#!/bin/bash # Check mutex usage patterns around shutdownCtx rg -n 'mu.*Lock\|shutdownCtx' app.go | head -50

Repository: gofiber/fiber

Length of output: 39

🏁 Script executed:

#!/bin/bash # Get more context around line 1200-1207 in app.go to see the function and mutex usage sed -n '1180,1220p' app.go

Repository: gofiber/fiber

Length of output: 1850

🏁 Script executed:

#!/bin/bash # Look for the function containing line 1206 and check for mutex locks rg -n -B 30 'app.shutdownCtx = cfg.RequestContext' app.go | head -60

Repository: gofiber/fiber

Length of output: 1638

🏁 Script executed:

#!/bin/bash # Find the function definition containing line 1206 sed -n '1140,1210p' app.go | head -80

Repository: gofiber/fiber

Length of output: 3073

🏁 Script executed:

#!/bin/bash # Search for the function name around line 1206 rg -n -B 50 'app.shutdownCtx = cfg.RequestContext' app.go | grep 'func ' | tail -1

Repository: gofiber/fiber

Length of output: 145

Use atomic.Pointer[context.Context] for shutdownCtx to prevent data race.

The assignment at line 1206 is protected by app.mutex.Lock(), but the read in ctx.go line 151 (if parent := c.app.shutdownCtx; parent != nil) accesses shutdownCtx without acquiring the mutex. This creates a data race under Go's memory model: a guarded write paired with an unguarded read. Although the unsynchronized access is intentional (to make the swap immediately visible during shutdown), it should be protected using atomic.Pointer[context.Context] to ensure safe concurrent access.

🤖 Prompt for AI Agents

In `@app.go` around lines 1200 - 1207, The write/read of app.shutdownCtx must be made atomic: change the field type from context.Context to atomic.Pointer[context.Context] (importing sync/atomic and unsafe), update the writer in the init/swap site (where cfg.RequestContext is assigned under app.mutex.Lock()) to use (*atomic.Pointer[context.Context]).Store((*context.Context)(unsafe.Pointer(&cfg.RequestContext))) or equivalent atomic.StorePointer wrapping the cfg.RequestContext value, and update reads (e.g., in ctx.go where code does "if parent := c.app.shutdownCtx; parent != nil") to use atomic.LoadPointer and convert back to *context.Context for nil-checking and use; ensure all places that previously accessed app.shutdownCtx directly are replaced with atomic loads/stores so concurrent access is race-free.

coderabbitai · 2026-01-28T09:15:22Z

+	// Return the live shutdown context without caching so that a
+	// RequestContext swap during shutdown is immediately visible.
+	if parent := c.app.shutdownCtx; parent != nil {
+		return parent
+	}
+	return context.Background()


⚠️ Potential issue | 🟠 Major

Potential data race reading app.shutdownCtx without synchronization.

This read of app.shutdownCtx is unsynchronized with the write in app.go line 1206 (ShutdownWithConfig where RequestContext replaces it). While the mutex is held during the write, this read does not acquire the mutex.

This relates to the issue flagged in app.go — consider using an atomic.Pointer for shutdownCtx.

🤖 Prompt for AI Agents

In `@ctx.go` around lines 149 - 154, The read of c.app.shutdownCtx in ctx.go is a potential data race against the write in ShutdownWithConfig (where RequestContext replaces it); make shutdownCtx an atomic pointer (e.g., atomic.Pointer[context.Context] or atomic.Value) on the App struct and update all accesses: replace direct reads of c.app.shutdownCtx with an atomic load (e.g., shutdownCtx.Load()) and replace the write in ShutdownWithConfig with an atomic store (shutdownCtx.Store(newCtx)), keeping the same nil-check semantics so the function returns the loaded context if non-nil or context.Background() otherwise.

coderabbitai · 2026-01-28T09:15:22Z

+// Err returns nil while the request context is still active.  Once shutdown
+// begins and the context is cancelled, Err returns ErrRequestShutdown so
+// handlers can distinguish a shutdown cancellation from other errors.
+func (c *DefaultCtx) Err() error {
+	if err := c.Context().Err(); err != nil {
+		return ErrRequestShutdown
+	}
 	return nil
 }


⚠️ Potential issue | 🟡 Minor

Err() masks the original context error, losing deadline information.

When Context().Err() returns context.DeadlineExceeded (e.g., from RequestDeadline), this method still returns ErrRequestShutdown. Handlers cannot distinguish between shutdown cancellation and per-request deadline expiration.

Consider preserving the original error or checking specifically for shutdown:

Suggested approach

func (c *DefaultCtx) Err() error { - if err := c.Context().Err(); err != nil { - return ErrRequestShutdown - } + err := c.Context().Err() + if err == nil { + return nil + } + // If the app is shutting down, return the shutdown sentinel. + // Otherwise, return the actual context error (e.g., DeadlineExceeded). + if c.app.IsShuttingDown() { + return ErrRequestShutdown + } + return err return nil }

🤖 Prompt for AI Agents

In `@ctx.go` around lines 177 - 185, DefaultCtx.Err currently replaces any context error with ErrRequestShutdown, losing deadline/cancellation details; change Err to fetch ctxErr := c.Context().Err() and return ctxErr unless the cancellation is specifically due to server shutdown—i.e., detect shutdown (use the existing shutdown signal/state available on DefaultCtx such as a shutdown flag, shutdown channel or shutdown context) and only then return ErrRequestShutdown; reference DefaultCtx.Err, c.Context().Err(), ErrRequestShutdown and RequestDeadline when making the conditional mapping.

coderabbitai · 2026-01-28T09:15:22Z

+        tel := app.LastShutdownTelemetry()
+        if tel != nil {
+            log.Printf("Shutdown completed in %s (drained=%d, forced=%d)",
+                tel.TotalDuration, tel.DrainedConns, tel.ForcedConns)
+        }


⚠️ Potential issue | 🟡 Minor

Telemetry access before shutdown completes will always return nil.

The quick start example accesses app.LastShutdownTelemetry() before ShutdownWithConfig is called. At this point, telemetry is always nil because it's only populated after shutdown completes. This code block should be removed or moved after the shutdown call.

Suggested fix

go func() { <-quit - tel := app.LastShutdownTelemetry() - if tel != nil { - log.Printf("Shutdown completed in %s (drained=%d, forced=%d)", - tel.TotalDuration, tel.DrainedConns, tel.ForcedConns) - } - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) defer cancel()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

tel := app.LastShutdownTelemetry()

if tel != nil {

log.Printf("Shutdown completed in %s (drained=%d, forced=%d)",

tel.TotalDuration, tel.DrainedConns, tel.ForcedConns)

}

go func() {

<-quit

ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)

defer cancel()

🤖 Prompt for AI Agents

In `@docs/guide/shutdown.md` around lines 50 - 54, The example reads telemetry too early: calling app.LastShutdownTelemetry() before ShutdownWithConfig runs always returns nil; move or remove that block so telemetry is only accessed after ShutdownWithConfig completes (i.e., call LastShutdownTelemetry() after the ShutdownWithConfig call and check for nil before logging), referencing the LastShutdownTelemetry() and ShutdownWithConfig functions to locate and update the example.

gemini-code-assist

Code Review

This pull request introduces a comprehensive and well-designed graceful shutdown mechanism, complete with detailed telemetry, a debug endpoint, and extensive documentation. While the implementation aims to improve server robustness, critical security and stability issues were identified. Specifically, there is a potential Denial of Service (DoS) in the shutdown process due to sequential connection closing in GracefulCloseTyped, and concurrency issues in ShutdownWithConfig that allow for multiple hook executions and a data race on the application's shutdown context. Additionally, there are a few suggestions to enhance API consistency and code clarity.

gemini-code-assist · 2026-01-28T09:19:03Z

+func (l *connTrackingListener) GracefulCloseTyped(ctx context.Context, cfg *ShutdownConfig, tel *ShutdownTelemetry) {
+	l.conns.Range(func(_, value any) bool {
+		tc, ok := value.(*connTrackingConn)
+		if !ok {
+			return true
+		}
+
+		switch tc.ConnType() {
+		case ConnTypeWebSocket:
+			l.closeWebSocket(ctx, tc, cfg)
+			tel.WebSocketsClosed++
+		case ConnTypeSSE:
+			l.closeSSE(ctx, tc, cfg)
+			tel.SSEsClosed++
+		default:
+			// Plain HTTP — run cleanup hook if registered, but no
+			// protocol-specific close logic.
+			_ = tc.runCleanup()
+		}
+		return true
+	})
+}


The GracefulCloseTyped function iterates over all tracked connections sequentially using sync.Map.Range. For each WebSocket or SSE connection, it performs network I/O and waits for a client response or a timeout (defaulting to 5s for WebSockets and 2s for SSE). If a server has a large number of such connections, the shutdown process will be extremely slow, potentially exceeding the system's termination grace period (e.g., in Kubernetes) and preventing post-shutdown hooks from running. An attacker can exploit this by opening many long-lived connections to delay or disrupt the shutdown process, leading to a Denial of Service (DoS) on the shutdown lifecycle.

gemini-code-assist · 2026-01-28T09:19:03Z

+	app.mutex.Lock()

 	if app.server == nil {
+		app.mutex.Unlock()
 		return ErrNotRunning
 	}

-	// Execute the Shutdown hook
+	tel := &ShutdownTelemetry{}
+	tel.StartedAt = time.Now()
+	tel.InitialConns = int(atomic.LoadInt64(&app.activeConns))
+
+	// --- phase 1: mark state & signal in-flight requests ----------------------
+	atomic.StoreInt32(&app.shuttingDown, 1)
+	if app.shutdownCancel != nil {
+		app.shutdownCancel()
+	}
+
+	// --- phase 2: evict idle keepalives ----------------------------------------
+	// A very small IdleTimeout causes fasthttp to close connections that are
+	// waiting for a new request on the next internal tick, without waiting for
+	// the (potentially unbounded) user-configured IdleTimeout.
+	app.server.IdleTimeout = time.Nanosecond
+
+	// --- phase 3: stop accepting new connections --------------------------------
+	// Closing the tracking listener immediately rejects every subsequent Dial /
+	// Accept.  Connections already accepted continue to be served.
+	if app.trackingListener != nil {
+		_ = app.trackingListener.Close()
+	}
+
+	// --- phase 3b: apply per-request deadline -----------------------------------
+	// RequestDeadline adds an additional bound on top of the caller's context.
+	// The effective deadline is whichever expires first.
+	if cfg.RequestDeadline > 0 {
+		var cancel context.CancelFunc
+		ctx, cancel = context.WithTimeout(ctx, cfg.RequestDeadline)
+		defer cancel()
+	}
+
+	// --- phase 3c: swap request-level parent context ---------------------------
+	// If the caller supplied a RequestContext, replace the app's shutdown
+	// context so that every subsequent c.Context() derivation inherits the
+	// caller's deadline.  This lets handlers observe a concrete deadline via
+	// c.Deadline() in addition to the immediate Done() signal.
+	if cfg.RequestContext != nil {
+		app.shutdownCtx = cfg.RequestContext
+	}
+
+	// --- phase 4: report initial state -----------------------------------------
+	if cfg.OnShutdownStart != nil {
+		cfg.OnShutdownStart(int(atomic.LoadInt64(&app.activeConns)))
+	}
+
+	// --- phase 5: pre-shutdown hooks --------------------------------------------
+	preStart := time.Now()
 	app.hooks.executeOnPreShutdownHooks()
-	defer app.hooks.executeOnPostShutdownHooks(err)
+	tel.PreHooksDuration = time.Since(preStart)
+
+	// --- phase 5b: graceful close of typed (WS/SSE) connections ----------------
+	// Only connections explicitly marked as WebSocket or SSE are touched.
+	// Plain HTTP connections proceed to the normal drain / force-close path.
+	gcStart := time.Now()
+	if app.trackingListener != nil {
+		app.trackingListener.GracefulCloseTyped(ctx, &cfg, tel)
+	}
+	tel.GracefulCloseDuration = time.Since(gcStart)
+
+	// --- phase 6: start drain monitor ------------------------------------------
+	interval := cfg.DrainInterval
+	if interval <= 0 {
+		interval = 500 * time.Millisecond
+	}
+
+	drainDone := make(chan struct{})
+	startedAt := time.Now()
+
+	go func() {
+		ticker := time.NewTicker(interval)
+		defer ticker.Stop()
+		for {
+			select {
+			case <-drainDone:
+				return
+			case <-ticker.C:
+				if cfg.OnDrainProgress != nil {
+					cfg.OnDrainProgress(int(atomic.LoadInt64(&app.activeConns)), time.Since(startedAt))
+				}
+			}
+		}
+	}()
+
+	// Release the mutex before blocking so ActiveConnections / IsShuttingDown
+	// remain callable from other goroutines (e.g. health checks).
+	app.mutex.Unlock()


The ShutdownWithConfig function releases the app.mutex before the connection draining process is complete. This allows multiple concurrent calls to ShutdownWithConfig if not externally synchronized. Concurrent calls will result in multiple executions of pre-shutdown hooks, multiple drain monitor goroutines, and multiple executions of post-shutdown hooks. This violates the "exactly once" guarantee for hooks and can lead to unexpected behavior or crashes if hooks are not idempotent, potentially causing a Denial of Service during shutdown.

gemini-code-assist · 2026-01-28T09:19:03Z

+	if cfg.RequestContext != nil {
+		app.shutdownCtx = cfg.RequestContext
+	}


Swapping the application's shutdownCtx mid-shutdown in ShutdownWithConfig creates a data race with request handlers that access this field via DefaultCtx.Context(). Since shutdownCtx is a standard interface and not protected by synchronization primitives during these accesses, concurrent read/write operations can lead to "torn reads," resulting in application panics and Denial of Service.

gemini-code-assist · 2026-01-28T09:19:03Z

+	DrainedConns          int       `json:"drainedConns"`
+	ForcedConns           int       `json:"forcedConns"`
+	WebSocketsClosed      int       `json:"webSocketsClosed"`
+	SSEsClosed            int       `json:"sseClosed"`


The JSON tag sseClosed is inconsistent with the Go field name SSEsClosed and the pattern used for WebSocketsClosed (webSocketsClosed). For consistency, it would be better to change the JSON tag to ssesClosed.

Suggested change

SSEsClosed int `json:"sseClosed"`

SSEsClosed int `json:"ssesClosed"`

gemini-code-assist · 2026-01-28T09:19:03Z

+	return func(c Ctx) error {
+		status := "running"
+		if app.IsShuttingDown() {
+			if app.lastTelemetry.Load() != nil {
+				status = "shutdown"
+			} else {
+				status = "shutting_down"
+			}
+		}
+
+		resp := shutdownDebugResponse{
+			Status:            status,
+			ActiveConnections: app.ActiveConnections(),
+			LastShutdown:      telemetryToJSON(app.lastTelemetry.Load()),
+		}
+		return c.JSON(resp)
+	}


The app.lastTelemetry.Load() call is performed twice: once to determine the status and again to populate the response. This is slightly inefficient and could lead to a subtle race condition where the telemetry data changes between the two calls. It's better to load the value once into a local variable and reuse it.

return func(c Ctx) error { status := "running" lastTel := app.lastTelemetry.Load() if app.IsShuttingDown() { if lastTel != nil { status = "shutdown" } else { status = "shutting_down" } } resp := shutdownDebugResponse{ Status: status, ActiveConnections: app.ActiveConnections(), LastShutdown: telemetryToJSON(lastTel), } return c.JSON(resp) }

gemini-code-assist · 2026-01-28T09:19:03Z

+    "drainedConns": 12,
+    "forcedConns": 3,
+    "webSocketsClosed": 4,
+    "sseClosed": 1,


To align with the suggested code change for API consistency, please update the JSON key in this example from sseClosed to ssesClosed.

Suggested change

"sseClosed": 1,

"ssesClosed": 1,

ReneWerner87 · 2026-01-28T12:02:00Z

The code is too dependent on telemetry.
The core should be free of external tools and run with all tools.

Can you make the adjustment so that it requires few adjustments to the core and is generic, so that other external tools, plugins, etc. can use it?

gaby · 2026-01-28T12:28:26Z

@zishanjawed What issue is this related to? I'm confused as to why this PR was created without first creating an issue and discussing it with the team.

zishanjawed requested a review from a team as a code owner January 28, 2026 09:10

zishanjawed requested review from ReneWerner87, efectn, gaby and sixcolors January 28, 2026 09:10

ReneWerner87 added v3 ☢️ Bug ✏️ Feature 📒 Documentation 🧹 Updates labels Jan 28, 2026

ReneWerner87 added this to v3 Jan 28, 2026

ReneWerner87 added this to the v3 milestone Jan 28, 2026

coderabbitai Bot reviewed Jan 28, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jan 28, 2026

View reviewed changes

gaby closed this Jan 31, 2026

github-project-automation Bot moved this to Done in v3 Jan 31, 2026

ReneWerner87 removed ☢️ Bug 📒 Documentation ✏️ Feature 🧹 Updates v3 labels Feb 1, 2026

ReneWerner87 removed this from v3 Feb 1, 2026

ReneWerner87 removed this from the v3 milestone Feb 1, 2026

ReneWerner87 added the 🧹 Invalid label Feb 1, 2026

ReneWerner87 mentioned this pull request Feb 1, 2026

feat: tag invalidation, conditional requests, and Cache-Control for cache middleware #4041

Closed

8 tasks

	SSEsClosed int `json:"sseClosed"`
	SSEsClosed int `json:"ssesClosed"`

Uh oh!

Conversation

zishanjawed commented Jan 28, 2026

Summary

Tests (9 new)

Documentation (docs/guide/shutdown.md)

Race-detector note

Test plan

Uh oh!

coderabbitai Bot commented Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

gemini-code-assist Bot commented Jan 28, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

ReneWerner87 commented Jan 28, 2026

Uh oh!

gaby commented Jan 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Documentation (`docs/guide/shutdown.md`)

coderabbitai Bot commented Jan 28, 2026 •

edited

Loading