Make mcp serve non-blocking under concurrent clients#52
Merged
Conversation
fixed: #51 The proxy used to serialize everything through a single ProxyServer mutex held across backend I/O, so one slow backend or dead client could freeze every other session. Under opencode it also leaked child processes (13 chroma-mcp, 4 roam-tui, etc) because shutdown timeouts would silently give up and Command wasn't reaped on drop. Rewrote the hot path around Arc<McpClient> and a three-phase dispatch that never holds the proxy lock across await: resolve routing under a brief lock, connect (if needed) with the lock released, then call_tool fully outside the lock. Transport trait is now &self + Sync so a single client instance is shared across concurrent tasks; StdioTransport runs a writer task + a reader task with a pending-id map, so multiple in-flight requests on the same backend multiplex through one pipe. Discovery moved out of the proxy lock entirely with its own discovery_lock to dedupe concurrent batches without blocking request handlers for already-discovered backends. Reaper runs in parallel via JoinSet, respects a warm-up grace so fresh backends aren't killed before first use, and force-kills via kill_on_drop when shutdown stalls. Every tokio Command has kill_on_drop(true) so orphans are impossible by construction. SSE legacy path bounds tx.send with a 5s timeout and evicts wedged sessions instead of pinning handlers forever; ping task uses try_send. HTTP listener binds with TCP keepalive 30s/10s so dead clients are detected in ~60s instead of 2h, and a per-request proxy timeout (MCP_PROXY_REQUEST_TIMEOUT, default 120s) is the final bound. Signed-off-by: Avelino <[email protected]>
There was a problem hiding this comment.
Pull request overview
This PR refactors the MCP proxy/server execution path to avoid holding global locks across backend I/O, enabling concurrent clients to make progress independently while also tightening bounds on hung requests and improving backend process cleanup.
Changes:
- Reworks the
TransportAPI to be&self + Send + Syncand updatesMcpClientto be shareable (Arc<dyn Transport>,AtomicU64ids). - Implements a multiplexed
StdioTransport(writer/reader tasks + pending-id map) to allow multiple in-flight requests on one backend process. - Refactors
mcp servedispatch/discovery/reaping for non-blocking concurrency (separate discovery lock, parallel shutdown, TCP keepalive, per-request proxy timeout) and updates docs/config references accordingly.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| src/transport/stdio.rs | Adds multiplexed stdio transport with concurrent in-flight request support. |
| src/transport/mod.rs | Updates Transport trait to Send + Sync with &self methods. |
| src/transport/http.rs | Moves mutable HTTP transport state behind mutexes for shared use. |
| src/transport/cli.rs | Makes CLI transport discovery/request handling concurrency-safe and adds kill_on_drop for tool invocations. |
| src/serve.rs | Major non-blocking proxy refactor: 3-phase dispatch, discovery lock, parallel reaper, keepalive listener, per-request timeout, updated health. |
| src/client.rs | Makes McpClient shareable and thread-safe with Arc transport + atomic ids. |
| src/main.rs | Adjusts CLI connection code for the new shared client API. |
| README.md | Updates proxy-mode feature claims and guarantees. |
| docs/reference/environment-variables.md | Documents MCP_PROXY_REQUEST_TIMEOUT. |
| docs/reference/architecture.md | Updates architecture/concurrency model documentation for new transport + proxy model. |
| docs/mcp-servers-are-draining-your-hardware.md | Adds April 2026 update describing multi-client orchestration. |
| docs/guides/proxy-mode.md | Documents the new concurrency model, health metrics, and reaping behavior. |
| Cargo.toml | Adds socket2 dependency for TCP keepalive listener binding. |
| Cargo.lock | Records socket2 addition (now multiple versions present). |
Comments suppressed due to low confidence (1)
src/transport/cli.rs:118
cli_discovery::discover_tools(...)ultimately runsCommand::output()under a timeout, butsrc/cli_discovery.rscurrently spawns thoseCommands withoutkill_on_drop(true). If the timeout fires (or the task is cancelled), those discovery children can still leak/orphan, which undermines the PR’s “no orphans/kill_on_drop everywhere” guarantee. Consider addingkill_on_drop(true)in the discovery helper(s) as well.
let results = cli_discovery::discover_tools(
&self.command,
&self.base_args,
&self.env,
&self.help_flag,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Avelino <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fixed: #51
The proxy used to serialize everything through a single ProxyServer mutex held across backend I/O, so one slow backend or dead client could freeze every other session. Under opencode it also leaked child processes (13 chroma-mcp, 4 roam-tui, etc) because shutdown timeouts would silently give up and Command wasn't reaped on drop.
Rewrote the hot path around Arc and a three-phase dispatch that never holds the proxy lock across await: resolve routing under a brief lock, connect (if needed) with the lock released, then call_tool fully outside the lock. Transport trait is now &self + Sync so a single client instance is shared across concurrent tasks; StdioTransport runs a writer task + a reader task with a pending-id map, so multiple in-flight requests on the same backend multiplex through one pipe. Discovery moved out of the proxy lock entirely with its own discovery_lock to dedupe concurrent batches without blocking request handlers for already-discovered backends. Reaper runs in parallel via JoinSet, respects a warm-up grace so fresh backends aren't killed before first use, and force-kills via kill_on_drop when shutdown stalls. Every tokio Command has kill_on_drop(true) so orphans are impossible by construction. SSE legacy path bounds tx.send with a 5s timeout and evicts wedged sessions instead of pinning handlers forever; ping task uses try_send. HTTP listener binds with TCP keepalive 30s/10s so dead clients are detected in ~60s instead of 2h, and a per-request proxy timeout (MCP_PROXY_REQUEST_TIMEOUT, default 120s) is the final bound.