
stale Colab workspace folders can remain after mixed server-removal batches #568

@dhunganapramod9

Description


Summary

I think there is a small bug in JupyterConnectionManager.revoke() when a removal batch contains a mix of:

  • endpoints that are currently connected, and
  • endpoints that were never connected locally

In that case, cleanup can stop too early, which means a connected Colab server may not be fully revoked. One visible result is that a stale colab: workspace folder can remain mounted even after the underlying Colab server assignment has already been removed.

What seems to be happening

Right now revoke() exits as soon as it hits an endpoint that does not have an active connection:

for (const endpoint of endpoints) {
  const promise = this.connections.get(endpoint);
  if (!promise) {
    return;
  }
  bestEffortDisposeConnection(promise);
  this.connections.delete(endpoint);
}
if (!silent) {
  this.revokeConnectionEmitter.fire(endpoints);
}

That early return looks harmless at first, but in a mixed batch it can cause two different problems.

Case 1

If an unconnected endpoint appears first, revoke() returns immediately and never reaches the connected endpoints later in the same batch, so those connections are never disposed at all.

Case 2

If a connected endpoint appears first, it does get disposed. But if a later endpoint in the same batch is unconnected, the function still returns early before reaching the emitter.

In that case, onDidRevokeConnections never fires, so downstream cleanup does not happen. That seems to be why ContentsFileSystemProvider can leave the matching colab: workspace folder behind.
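For concreteness, both cases can be reproduced with a stripped-down version of the loop (a hypothetical sketch, not the real class: the Map mirrors this.connections from the snippet above, and the boolean return stands in for whether the emitter would fire):

```typescript
// Stripped-down version of the current loop, with the early return intact.
// The boolean return stands in for whether the emitter would fire.
function buggyRevoke(
  connections: Map<string, unknown>,
  endpoints: string[]
): boolean {
  for (const endpoint of endpoints) {
    if (!connections.get(endpoint)) {
      return false; // early return skips later endpoints AND the emitter
    }
    connections.delete(endpoint);
  }
  return true;
}

// Case 1: unconnected "a" comes first, so connected "b" is never disposed
const caseOne = new Map<string, unknown>([["b", {}]]);
buggyRevoke(caseOne, ["a", "b"]); // caseOne still contains "b"

// Case 2: connected "a" comes first: it is disposed, but no event ever fires
const caseTwo = new Map<string, unknown>([["a", {}]]);
const eventFired = buggyRevoke(caseTwo, ["a", "b"]); // eventFired === false
```

In case 2 the per-connection disposal happens, but because the function bails out on "b", the caller never learns that "a" was revoked.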

Why this seems realistic

This looks possible during assignment reconciliation, where multiple removed servers can be emitted together in one event.

JupyterConnectionManager only tracks servers that were actually opened or mounted on the local machine, so it is normal for some removed endpoints to have no active local connection.

Because the removal list is not ordered by connection state, a batch can naturally contain both connected and unconnected endpoints in any order.

Expected behavior

What I would expect here is:

  • unconnected endpoints are skipped
  • connected endpoints in the batch are still disposed
  • the revoke event fires only if at least one connection was actually revoked
  • the event payload includes only the endpoints that were actually revoked

Suggested fix

A simple fix would be to keep going instead of returning early.

In other words:

  1. collect the endpoints that were actually revoked
  2. skip endpoints that are missing from this.connections
  3. emit only the successfully revoked endpoints
  4. only fire the event when that list is non-empty
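Put together, the fixed method could look roughly like this (a sketch only: the class shape, the Endpoint type, and the array standing in for revokeConnectionEmitter are my assumptions, and bestEffortDisposeConnection is stubbed out):

```typescript
type Endpoint = string;

// Minimal sketch of the fix; the real JupyterConnectionManager has more
// state than this, and firedBatches stands in for revokeConnectionEmitter.
class ConnectionManagerSketch {
  private connections = new Map<Endpoint, Promise<unknown>>();
  readonly firedBatches: Endpoint[][] = [];

  connect(endpoint: Endpoint): void {
    this.connections.set(endpoint, Promise.resolve());
  }

  revoke(endpoints: Endpoint[], silent = false): void {
    const revoked: Endpoint[] = [];
    for (const endpoint of endpoints) {
      const promise = this.connections.get(endpoint);
      if (!promise) {
        continue; // step 2: skip unconnected endpoints, keep going
      }
      // bestEffortDisposeConnection(promise) would run here
      this.connections.delete(endpoint);
      revoked.push(endpoint); // step 1: collect actual revocations
    }
    // steps 3 and 4: fire only the revoked subset, and only if non-empty
    if (!silent && revoked.length > 0) {
      this.firedBatches.push(revoked);
    }
  }
}
```

With this shape, a batch like ["a", "b"] where only "b" is connected disposes "b" and fires the event with a payload of ["b"] only.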

Test coverage that would help

It would probably be worth adding unit tests for both mixed-batch orders:

  • unconnected endpoint before connected endpoint
  • connected endpoint before unconnected endpoint

That should cover both the missed-disposal path and the missed-event path.
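A rough shape for those tests (a standalone sketch: revokeBatch inlines the fixed loop rather than calling the project's real class, which is an assumption on my part):

```typescript
// Standalone sketch of the two mixed-batch tests; revokeBatch inlines the
// fixed loop instead of using the real JupyterConnectionManager.
function revokeBatch(
  connections: Map<string, unknown>,
  endpoints: string[]
): string[] {
  const revoked: string[] = [];
  for (const endpoint of endpoints) {
    if (!connections.has(endpoint)) continue; // skip instead of returning
    connections.delete(endpoint);
    revoked.push(endpoint);
  }
  return revoked; // the event should fire only if this list is non-empty
}

// unconnected before connected: "b" must still be disposed
const unconnectedFirst = new Map<string, unknown>([["b", {}]]);
if (revokeBatch(unconnectedFirst, ["a", "b"]).join(",") !== "b") {
  throw new Error("missed-disposal path: b was not revoked");
}

// connected before unconnected: the reported payload must be exactly ["a"]
const connectedFirst = new Map<string, unknown>([["a", {}]]);
if (revokeBatch(connectedFirst, ["a", "b"]).join(",") !== "a") {
  throw new Error("missed-event path: a was not reported as revoked");
}
```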

Closing

If this matches the intended lifecycle behavior, I’d be happy to open a small PR with the revoke() change and a couple of focused unit tests.
