
feat(controller:diskless): enable immediate partition reassignment #537

Merged

giuseppelillo merged 2 commits into main from jeqo/pod-2122-enable-partition-reassign on Mar 16, 2026

feat(controller:diskless): enable immediate partition reassignment#537
giuseppelillo merged 2 commits intomainfrom
jeqo/pod-2122-enable-partition-reassign

Conversation

@jeqo
Contributor

@jeqo jeqo commented Mar 16, 2026

For diskless topics, partition reassignment now completes immediately without the staged process (addingReplicas/removingReplicas). Since data is stored in object storage rather than local disk, there is nothing to sync between replicas.

Changes:

  • Skip setting targetRemoving/targetAdding for diskless topics
  • Use target replicas directly instead of merged replica list
  • Update tests to expect immediate completion (no ongoing reassignment)
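The one-step path described above can be sketched as follows. This is a minimal illustration, not the actual ReplicationControlManager code: the `Plan` record, the `plan` method, and the `diskless` flag are hypothetical names used only to contrast the two paths.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: models the difference between the staged
// (classic) reassignment plan and the immediate diskless plan.
class ReassignmentSketch {
    // Hypothetical holder for a reassignment plan.
    record Plan(List<Integer> replicas, List<Integer> adding, List<Integer> removing) {}

    static Plan plan(List<Integer> current, List<Integer> target, boolean diskless) {
        if (diskless) {
            // Data lives in object storage: nothing to sync between replicas,
            // so apply the target replica set in one step with no
            // adding/removing stages.
            return new Plan(target, List.of(), List.of());
        }
        // Classic staged path: the merged replica list is target plus the
        // remaining current replicas, with adding = target minus current and
        // removing = current minus target.
        List<Integer> adding = target.stream().filter(b -> !current.contains(b)).toList();
        List<Integer> removing = current.stream().filter(b -> !target.contains(b)).toList();
        List<Integer> merged = new ArrayList<>(target);
        for (Integer b : current) {
            if (!merged.contains(b)) merged.add(b);
        }
        return new Plan(merged, adding, removing);
    }
}
```

For example, reassigning [1,2,3] to [4,5,6] on a diskless topic yields replicas [4,5,6] immediately with empty adding/removing lists, while the classic path would carry the merged list until the replica sync completes.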

Contributor

Copilot AI left a comment


Pull request overview

This PR enables immediate partition reassignment for diskless topics in Kafka's controller. Since diskless topics store data in object storage rather than on local disks, replica syncing is unnecessary, and reassignment can complete in a single step rather than going through the staged addingReplicas/removingReplicas process.

Changes:

  • Skip the staged reassignment process for diskless topics by applying target replicas directly, with ISR containing only active brokers
  • Exclude diskless topics from preferred-leader imbalance tracking and periodic leader balancing
  • Add new tests covering fenced broker rejection, partial ISR with fenced brokers, and leader balancing skip behavior

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files reviewed:
  • metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java - Core logic: immediate reassignment for diskless, skip imbalance tracking, add isDisklessTopic helper
  • metadata/src/test/java/org/apache/kafka/controller/ReplicationControlManagerTest.java - Updated existing tests to expect immediate completion; added tests for fenced broker edge cases and leader balancing skip


@jeqo jeqo force-pushed the jeqo/pod-2122-enable-partition-reassign branch from 270533b to 31a563a Compare March 16, 2026 13:02
@jeqo jeqo requested a review from Copilot March 16, 2026 13:02
Contributor

Copilot AI left a comment


Pull request overview

This PR enables immediate partition reassignment for diskless topics in the Kafka controller. Since diskless topics store data in object storage rather than local disk, there's no need for the staged replica sync process (addingReplicas/removingReplicas). The PR also excludes diskless topics from the controller's preferred leader balancing, as a metadata transformer handles leader routing for these topics.

Changes:

  • Skip staged reassignment for diskless topics: apply target replicas and ISR directly, rejecting reassignments where all target replicas are fenced
  • Exclude diskless topics from imbalancedPartitions tracking and periodic leader balancing
  • Extract isDisklessTopic() helper method and refactor existing inline checks to use it
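The leader-balancing exclusion can be pictured with a minimal sketch. The class and method names here are assumptions for illustration, not the real controller internals; only the behavior (skip diskless topics when tracking preferred-leader imbalance) comes from the PR.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of excluding diskless topics from preferred-leader
// imbalance tracking; all names are hypothetical.
class ImbalanceTrackerSketch {
    private final Set<String> imbalanced = new HashSet<>();
    private final Set<String> disklessTopics;

    ImbalanceTrackerSketch(Set<String> disklessTopics) {
        this.disklessTopics = disklessTopics;
    }

    boolean isDisklessTopic(String topic) {
        return disklessTopics.contains(topic);
    }

    // Track a partition whose current leader differs from its preferred
    // leader, unless the topic is diskless: per the PR description, leader
    // routing for diskless topics is handled by a metadata transformer, so
    // the periodic balancer should skip them entirely.
    void trackIfImbalanced(String topic, int leader, int preferredLeader) {
        if (isDisklessTopic(topic)) return;
        if (leader != preferredLeader) imbalanced.add(topic);
    }

    Set<String> imbalanced() {
        return imbalanced;
    }
}
```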

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Files reviewed:
  • metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java - Core logic: immediate diskless reassignment, leader balancing skip, isDisklessTopic helper, imbalanced partition exclusion
  • metadata/src/test/java/org/apache/kafka/controller/ReplicationControlManagerTest.java - Updated existing tests for immediate completion; added tests for fenced broker rejection, partial ISR, and leader balancing skip


@jeqo jeqo marked this pull request as ready for review March 16, 2026 13:57
Contributor

@viktorsomogyi viktorsomogyi left a comment


I wonder whether these changes could cause a momentary offline state for the reassigned partitions. For instance, when reassigning a partition's replicas from [1,2,3] to [4,5,6], taking the union of the two sets was useful because it ensured the partition never went offline.
You state that since everything is in object storage we don't actually need to sync, which is true, but it may take some time to warm up the cache and handle cache misses on fetch requests. This may also put extra load on the brokers and on object storage. So my questions are:

  • is it possible that partitions become offline in some cases during the reassignment and should we worry about it?
  • should we worry about the performance impact of cache misses during reassignment and somehow pre-warm the target set instead?

@jeqo
Contributor Author

jeqo commented Mar 16, 2026

it may take some time to warm up the cache and deal with the cache misses for fetch requests. This may also cause some extra load on the brokers and object storage.

Yes, but currently there is no proactive cache warm-up. It happens on-demand as new requests are received.

is it possible that partitions become offline in some cases during the reassignment and should we worry about it?

Partitions won't go offline during reassignment. The controller rejects the reassignment if none of the target brokers are active (unfenced and not in controlled shutdown). As long as at least one target broker is active, it will be elected leader immediately and can serve requests even with an empty cache.
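A sketch of that acceptance rule, under the assumption that "active" means unfenced and not in controlled shutdown. The class and method names are illustrative, not the actual controller code:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.IntPredicate;

// Illustrative sketch: the ISR for a diskless reassignment is the active
// subset of the target replicas; an empty subset means the reassignment
// is rejected rather than leaving the partition offline.
class DisklessIsrSketch {
    static Optional<List<Integer>> isrFor(List<Integer> target, IntPredicate isActive) {
        List<Integer> isr = target.stream().filter(isActive::test).toList();
        // No active target broker: reject instead of producing an empty ISR.
        return isr.isEmpty() ? Optional.empty() : Optional.of(isr);
    }
}
```

With brokers 4 and 6 active but 5 fenced, reassigning to [4,5,6] is accepted with ISR [4,6]; reassigning to [5] alone is rejected.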

should we worry about the performance impact of cache misses during reassignment and somehow pre-warm the target set instead?

Yes, there will be a transient increase in fetch latency and object storage reads until the new leader's cache is populated; the cache fills on demand, as there is no proactive warm-up today.
We can look into pre-warming the cache when working on the Diskless log implementation and assess then whether we can use metadata to signal cache readiness.

I've added some additional comments to clarify these points.

@jeqo jeqo requested a review from viktorsomogyi March 16, 2026 15:26
Contributor

@viktorsomogyi viktorsomogyi left a comment


Thanks for the explanation, it's good to go then on my side.

@giuseppelillo giuseppelillo merged commit 1982f0e into main Mar 16, 2026
6 checks passed
@giuseppelillo giuseppelillo deleted the jeqo/pod-2122-enable-partition-reassign branch March 16, 2026 16:14
AnatolyPopov pushed a commit that referenced this pull request Mar 23, 2026
* feat(controller:diskless): enable immediate partition reassignment

(cherry picked from commit 1982f0e)

# Conflicts:
#	metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java
jeqo added a commit that referenced this pull request Mar 23, 2026
* feat(controller:diskless): enable immediate partition reassignment
jeqo added a commit that referenced this pull request Mar 23, 2026
* feat(controller:diskless): enable immediate partition reassignment

(cherry picked from commit 1982f0e)

# Conflicts:
#	metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java