Skip to content

chore(inkless:sync): upstream Apache Kafka 2025-11-21 (before 4.3.0-SNAPSHOT)#499

Merged
giuseppelillo merged 758 commits intomainfrom
sync/upstream-20251121
Feb 6, 2026
Merged

chore(inkless:sync): upstream Apache Kafka 2025-11-21 (before 4.3.0-SNAPSHOT)#499
giuseppelillo merged 758 commits intomainfrom
sync/upstream-20251121

Conversation

@jeqo
Copy link
Copy Markdown
Contributor

@jeqo jeqo commented Feb 5, 2026

Summary

Sync inkless with Apache Kafka trunk at commit 8b119e5105 (2025-11-21, version 4.2.0-SNAPSHOT before 4.3.0 bump).

Key Changes

  • 32 merge conflicts resolved across config, core, and test files
  • Inkless version: 4.2.0-inkless-SNAPSHOT
  • Aligned with manual sync branch upstream-sync-before-4.3 for consistency

Conflict Resolution Highlights

  • Core files: BrokerServer, ControllerServer, DelayedFetch, KafkaApis, ReplicaManager - preserved inkless additions while adopting upstream API changes
  • Restored files: FlattenedIterator.java + test (removed upstream, needed for inkless ConcatenatedRecords)
  • Test fixes: NoOpRemoteLogMetadataManager config for InklessConfigsTest

Post-Sync Refinements

  • Aligned util.LinkedHashMap usage with upstream pattern (vs Seq)
  • PlaintextConsumerTest reduced to upstream tests only (Scala tests migrated to Java by upstream)

Follow-up Items

  • Java consumer tests (clients/clients-integration-tests/) need diskless topic support (documented in SESSION)

Commits

  • 60725ea3f7 merge: apache/kafka trunk 8b119e5
  • 0ea657a3c5 fix(sync): fix compilation errors after upstream merge
  • 42e9b136b8 fix(test): configure NoOpRemoteLogMetadataManager
  • d316a3c5d6 fix(sync): restore FlattenedIteratorTest
  • 92a1194131 refactor(sync): use util.LinkedHashMap to match upstream
  • df2ec2db53 refactor(sync): align core files with upstream-sync-before-4.3

Test plan

  • make build passes
  • make test passes
  • Inkless-specific tests pass (InklessClusterTest, InklessConfigsTest, etc.)
  • Compared with manual sync branch upstream-sync-before-4.3

dejan2609 and others added 30 commits October 7, 2025 18:20
List of changes:
- prerequisite Jira ticket:
  - [KAFKA-19591](https://issues.apache.org/jira/browse/KAFKA-19591)
- mandatory version upgrades:
  - Gradle version: 8.14.3 -->> 9.1.0
  - Gradle Shadow plugin: 8.3.6 -->> 8.3.9
  - Gradle dependencycheck plugin: 8.2.1 -->> 12.1.3
  - Gradle spotbugs plugin: 6.2.3 -->> 6.2.5
  - Gradle spotless plugin: 6.25.0 -->> 7.2.1
- build logic will be refactored to accommodate Gradle 9 breaking
changes
- (optional): a dozen of Gradle plugins versions will also be upgraded
- other JIRA tickets that had to be solved all along:
  - [KAFKA-16801](https://issues.apache.org/jira/browse/KAFKA-16801)
  - [KAFKA-19654](https://issues.apache.org/jira/browse/KAFKA-19654)

 **Related links:**
- https://gradle.org/whats-new/gradle-9
- https://github.com/gradle/gradle/releases/tag/v9.0.0
- https://docs.gradle.org/9.0.0/release-notes.html
- https://docs.gradle.org/9.0.0/userguide/upgrading_major_version_9.html
- https://docs.gradle.org/9.1.0/release-notes.html

Notes:
- new Gradle version brings up some breaking changes, as always 😃
- Kafka build with Gradle 9 has same issues as other projects:
  - redhat-developer/vscode-java#4018
  - gradle/gradle#32597

Reviewers: Chia-Ping Tsai <[email protected]>
Since that 4.0 branch does not include
[KAFKA-18748](https://issues.apache.org/jira/browse/KAFKA-18748), it is
unable to find the related scan reports, but the ci-complete workflow is
still being triggered on the 4.0 branch. Disable this on the 4.0 branch,
as its reports can be safely ignored.

See apache/kafka#20616 (comment).

Reviewers: Chia-Ping Tsai <[email protected]>
…#19955)

In Kafka Streams, configuration classes typically follow a fluent API
pattern to ensure a consistent and intuitive developer experience.
However, the current implementation of
`org.apache.kafka.streams.KafkaStreams$CloseOptions` deviates from this
convention by exposing a public constructor, breaking the uniformity
expected across the API.

To address this inconsistency, we propose introducing a new
`CloseOptions` class that adheres to the fluent API style, replacing the
existing implementation. The new class will retain the existing
`timeout(Duration)` and `leaveGroup(boolean)` methods but will enforce
fluent instantiation and configuration. Given the design shift, we will
not maintain backward compatibility with the current class.

This change aligns with the goal of standardizing configuration objects
across Kafka Streams, offering developers a more cohesive and
predictable API.

Reviewers: Bill Bejeck<[email protected]>
…equest (#20632)

Test the `StreamsGroupDescribeRequest` RPC and corresponding responses
for situations where
- `streams.version` not upgraded to 1
- `streams.version` enabled, multiple groups listening to the same
topic.

Reviewers: Lucas Brutschy <[email protected]>
…erver module (#20636)

It moves the `ReconfigurableQuorumIntegrationTest` class to the
`org.apache.kafka.server` package and consolidates two related tests,
`RemoveAndAddVoterWithValidClusterId` and
`RemoveAndAddVoterWithInconsistentClusterId`, into a single file. This
improves code organization and reduces redundancy.

Reviewers: Chia-Ping Tsai <[email protected]>
…ome issues with Gradle 9+ are solved) (#20652)

Extends from: #19513

Note: in Gradle 9+ we have to add a switch like this:
```
./gradlew dependencyUpdates --no-parallel
```
Related link:
https://github.com/ben-manes/gradle-versions-plugin/releases/tag/v0.53.0

Reviewers: Chia-Ping Tsai <[email protected]>
This patch updates the Apache Kafka project's build, test, and
dependency configurations.

- Java Version Update: The build and test processes have been upgraded
from Java 24 to Java 25.
- Scala Version Update: The Scala library version has been updated from
2.13.16 to 2.13.17.
-  Dependency Upgrades: Several dependencies have been updated to newer
versions, including mockito (5.14.2 to 5.20.0), zinc (1.10.8 to 1.11.0),
and scala-library/reflect (2.13.16 to 2.13.17).
- Code and Configuration Changes: The patch modifies build.gradle to
exclude certain Spotbugs tasks for Java 25 compatibility. It also
changes the default signing algorithm in TestSslUtils.java from
SHA1withRSA to SHA256withRSA, enhancing security.
- Documentation: The README.md file has been updated to reflect the new
Java 25 requirement.

Reviewers: Ismael Juma <[email protected]>, Liam Clarke-Hutchinson
 <[email protected]>, Gaurav Narula <[email protected]>,
 Chia-Ping Tsai <[email protected]>
Updates all GitHub Actions to their latest versions.

----
**Upgraded Actions:**

* **Gradle setup**:
     * `gradle/actions/setup-gradle` **v4.4.4 → v5.0.0**
* **Trivy security scanner**:
     * `aquasecurity/trivy-action` **v0.24.0 → v0.33.1**
* **Docker build tools:**

    * `docker/setup-qemu-action` **v3.2.0 → v3.6.0**
    * `docker/setup-buildx-action` **v3.6.1 → v3.11.1**
    * `docker/login-action` **v3.3.0 → v3.6.0**
* **GitHub utilities:**

    * `actions/github-script` **v7 → v8**

    * `actions/stale` **v9 → v10**

Reviewers: Chia-Ping Tsai <[email protected]>
…0399)

the constructor is error-prone when migrating code, since metrics could
get unintentionally changed. We should remove the constructor and use
constant strings instead to avoid issues like KAFKA-17876, KAFKA-19150,
and others.

Reviewers: Ken Huang <[email protected]>, Jhen-Yung Hsu
 <[email protected]>, KuoChe <[email protected]>, Chia-Ping
 Tsai <[email protected]>
Clear pendingTasksToInit on tasks clear.  It matters in situations when
we shutting down a thread in PARTITIONS_ASSIGNED state. In this case we
may have locked some unassigned task directories (see
TaskManager#tryToLockAllNonEmptyTaskDirectories). Then we may have
gotten assigned to one or multiple of those tasks. In this scenario,  we
will not release the locks for the unassigned task directories (see
TaskManager#releaseLockedUnassignedTaskDirectories), because
TaskManager#allTasks includes pendingTasksToInit, but it hasn't been
cleared.

Reviewers: Matthias J. Sax <[email protected]>, Lucas Brutschy
 <[email protected]>
from: apache/kafka#20637 (comment)

Previously, the method used LOGGER.info() instead of LOGGER.trace().
This patch corrects the logging level used in the trace method of
StateChangeLogger.

Reviewers: Manikumar Reddy <[email protected]>, TengYao Chi
 <[email protected]>, Ken Huang <[email protected]>, Chia-Ping Tsai
 <[email protected]>

Co-authored-by: Ubuntu <[email protected]>
Fix the incorrect package name in metrics and revert the comment

see:   apache/kafka#20399 (comment)
apache/kafka#20399 (comment)

Reviewers: Ken Huang <[email protected]>, Manikumar Reddy
 <[email protected]>, Chia-Ping Tsai <[email protected]>
Stores the existing values for both the fields in a local variable for
logging.

Reviewers: Omnia Ibrahim <[email protected]>
…he flakiness of the test. (#20664)

MINOR: changed the condition to only check the test topic to reduce the
flakiness of the test.

Reviewers: Matthias J. Sax <[email protected]>, Lianet Magrans
 <[email protected]>
…ling (#20661)

When a failure occurs with a push telemetry request, any exception is
treated as fatal, increasing the time interval to `Integer.MAX_VALUE`
effectively turning telemetry off.  This PR updates the error handling
to check if the exception is a transient one with expected recovery and
keeps the telemetry interval value the same in those cases since a
recovery is expected.

Reviewers: Apoorv Mittal <[email protected]>, Matthias
 Sax<[email protected]>
…rTest [2/N] (#20544)

Changes made
- Additional `setUpTaskManager()` overloaded method -- Created this
temporarily to pass the CI pipelines so that I can work on the failing
tests incrementally
- Rewrote 3 tests to use stateUpdater thread

Reviewers: Lucas Brutschy <[email protected]>
The auto-topic creation in` AutoTopicCreationManager` currently retries
creating internal topics with every heartbeat.

A simple back-off mechanism was implemented: if there is error in the
errorcache and it's not expired or it's already in the inflight topics,
then not send the topic creation request.

Unit tests are added as well.

Reviewers: Lucas Brutschy <[email protected]>
We missed a branch in #20671.

This PR handles the else branch where we log about skipping the follower
state change.

Also updated the doc for the method as it was out of date.

Reviewers: Chia-Ping Tsai <[email protected]>
The comment stating "Additionally adds a license header to the wrapper
while editing the file contents" is no longer accurate. This
functionality was removed in PR #7742 (commit 45842a3) since Gradle
5.6+ automatically handles license headers in wrapper files.

Reviewers: Chia-Ping Tsai <[email protected]>
…ded with a `--no-parallel` switch (#20683)

Prologue:
apache/kafka#19513 (comment)

Also related to: #20652

@chia7712 I double-checked custom gradle commands in `README.md` and
AFAIK we should be fine now.

Reviewers: Chia-Ping Tsai <[email protected]>
…-->> 9 (#20684)

from: apache/kafka#19513 (comment)

1. Fix the task `unitTest` and `integrationTest`;
2. Change the `name` to a method call `name()` for `KeepAliveMode`.

Reviewers: Ken Huang <[email protected]>, dejan2609
 <[email protected]>, Chia-Ping Tsai <[email protected]>
Clarify the Javadoc for `Node#isFenced` to align with KIP-1073, which
introduced the “fenced” field in `DescribeCluster` for brokers. The
“fenced” flag applies only to broker nodes returned by
`DescribeCluster`. For controller quorum nodes, it doesn’t apply and is
always `false`. This clarifies that not every node has a meaningful
fenced state.

Reviewers: TaiJuWu <[email protected]>, Chia-Ping Tsai
 <[email protected]>
This PR is a follow-up to KAFKA-18193. It addresses the need for a null
check and an improved error message. Please refer to the previous
comments and the review of
apache/kafka#19955 (review)
for more context.

Reviewers: Chia-Ping Tsai <[email protected]>
There are some calls to `TestUtils::waitForCondition` with the actual
value of the condition being evaluated. These calls must use the
overload with a `Supplier` for conditionDetails so that the actual value
is lazily evaluated at the time of the condition failing.

Reviewers: Chia-Ping Tsai <[email protected]>
…eMode (#20695)

In Scala/Java joint compilation, builds that run with
`keepAliveMode=SESSION` can fail in the `:core:compileScala` task. This
happens when `javac` is given a non-existent Java output directory on
its classpath.

```
Unexpected javac output: warning: [path] bad path element
".../core/build/classes/java/main": no such file or directory
error: warnings found and -Werror specified
```

This issue occurs after a clean build because
`core/build/classes/java/main` does not exist, as Java classes are
written to `classes/scala/main` during joint compilation. With `-Werror`
enabled, the resulting `[path]` warning is treated as an error and stops
the build.

This PR adds a workaround scoped to **SESSION** builds, while continuing
to investigate the underlying classpath assembly logic during joint
compilation.

Related discussion:
apache/kafka#20684 (comment)
Tracked by:
[KAFKA-19786](https://issues.apache.org/jira/browse/KAFKA-19786)

Reviewers: Ken Huang <[email protected]>, Chia-Ping Tsai
 <[email protected]>
This PR improves error handling for feature argument parsing in the
`StorageTool`. The main change is stricter validation of the expected
format for feature arguments, ensuring users provide them in the correct
`feature=version` format.

**Before:**
* If user using argument like: `./bin/kafka-storage.sh
feature-dependencies --feature group.version`, the following error will
occur, which is a bit confusing:
<img width="970" height="162" alt="image"

src="https://github.com/user-attachments/assets/aa4341f9-6eb8-488e-b88e-f5244560184b"
/>

**After:**
<img width="812" height="55" alt="image"

src="https://github.com/user-attachments/assets/9aedecdb-e657-4bd3-8b9b-22d6282a1dc1"
/>

Reviewers: Chia-Ping Tsai <[email protected]>
…ucer (KIP-1147) (#20673)

*What*
https://issues.apache.org/jira/browse/KAFKA-19725

- Implement KIP-1147 for ConsoleProducer where we replace --property
with --reader-property.
- Currently the previous names for the options are still usable but
there will be warning message stating those are deprecated and will be
removed in a future version.
- I have added unit tests and also manually verified using the console
tools that things are working as expected.

Reviewers: Andrew Schofield <[email protected]>, Jhen-Yung Hsu
 <[email protected]>
…(#20692)

Cleanup and rewrote more tests in `TaskManagerTest.java`

Reviewers: Lucas Brutschy <[email protected]>
…LE (#20600)

Streams groups sometimes describe as NOT_READY when STABLE. That is, the
group is configured and all topics exist, but when you use LIST_GROUP
and STREAMS_GROUP_DESCRIBE, the group will show up as not ready.

The root cause seems to be that
apache/kafka#19802 moved the creation of the
soft state configured topology from the replay path to the heartbeat.
This way, LIST_GROUP and STREAMS_GROUP_DESCRIBE, may not show the
configured topology, because the configured topology that is created in
the heartbeat is "thrown away", and the new group is recreated on the
replay-path.

To reflect a consistent view of the topology via LIST_GROUP and
STREAMS_GROUP_DESCRIBE, we need to store additional information in the
consumer offset topic. In particular, we need to store at least whether
a topology was validated against the current topic metadata, as this
defines whether a group is in STABLE and not in NOT_READY.

This change adds a new field `validatedTopologyEpoch` to the metadata of
the group, which stores precisely this information.

Reviewers: Matthias J. Sax <[email protected]>
Make some internal methods `public` for third-party users who want to
hook into this API directly (even if they are not advised to do this;
using internal API is at their own risk).

Reviewers: Matthias J. Sax <[email protected]>
dajac and others added 24 commits November 20, 2025 13:41
This patch marks the OffsetCommit and OffsetFetch APIs as stable.

Reviewers: Lianet Magrans <[email protected]>
Rewrote more tests in `TaskManagerTest.java`

Reviewers: Lucas Brutschy <[email protected]>
* The field `AcquisitionLockTimeoutMs` in
`ShareAcknowledgeResponse.json` should be marked `ignorable` for
compatibility with older client versions.

Reviewers: Andrew Schofield <[email protected]>, Shivsundar R
 <[email protected]>
This PR is part of

[KIP-1226](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1226%3A+Introducing+Share+Partition+Lag+Persistence+and+Retrieval).

This PR adds integration tests to ShareConsumerTest.java file to verify
the share partition lag is reported successfully in various scenarios.

Reviewers: Andrew Schofield <[email protected]>
We only updated docs version in `4.1` branch, but not in `trunk`.

Cf
apache/kafka@abeebb3

Reviewers: Matthias J. Sax <[email protected]>
Sorted the names of the packages for which javadoc is generated. Removed
`org/apache/kafka/common/security/oauthbearer/secured` which no longer
exists.

Reviewers: Lianet Magrans <[email protected]>
Updating the docs for the client state metric (KIP-1221).

Reviewers: Matthias Sax<[email protected]>, Bill
 Bejeck<[email protected]>, Chia-Ping Tsai <[email protected]>
*What*
PR fixes some typos/nit in `ShareConsumerTest` and fixes a test
assertion in `ConsumerNetworkThreadTest`.

Reviewers: Andrew Schofield <[email protected]>, Chia-Ping Tsai
 <[email protected]>
For records with a delivery count exceeding 2, there is suspicion that
delivery failures   stem from underlying issues rather than natural
retry scenarios. The batching of such   records should be reduced.

Solution:  Determining which offset is bad is not possible at broker's
end. But broker can restrict the acquired records to a subset so only
bad record is skipped. We can do the following:

- If delivery count of a batch is >= 3 then only acquire 1/2 of the
batch records i.e for a batch of 0-499 (500 records) if batch delivery
count is 3 then start offset tracking and acquire 0-249 (250 records)
- If delivery count is again bumped then keeping acquring 1/2 of
previously acquired offsets until last delivery attempt i.e. 0-124 (125
records)
- For last delivery attempt, acquire only 1 offset. Then only the bad
record will be skipped.

Reviewers: Apoorv Mittal <[email protected]>, Andrew Schofield
 <[email protected]>, Abhinav Dixit <[email protected]>

---------

Co-authored-by: d00791190 <[email protected]>
…0942)

Changing log level from ERROR to WARN, as it's expected that this can
happen, and we should not incorrectly make users worry about it.

Reviewers: PoAn Yang <[email protected]>, Colt McNealy
 <[email protected]>, Lucas Brutschy <[email protected]>,
 Chia-Ping Tsai <[email protected]>
…(#20937)

jacksonDatabindYaml does not exist, it should be jacksonDataformatYaml.
I was a bit confused when I first saw the mention, I imagine others
might have the same.  Also moved the entry in the depencencies one row
down, so the order of the dependencies is more alphabetical.

Reviewers: Chia-Ping Tsai <[email protected]>
Sync inkless with upstream Apache Kafka trunk at commit
8b119e5 (2025-11-21), the last
commit before the version bump to 4.3.0-SNAPSHOT.

Version: 4.2.0-inkless-SNAPSHOT

Key merge decisions:
- Config/Version: Updated to 4.2.0-inkless-SNAPSHOT, scala 2.13.17
- Core files: Preserved inkless diskless fetch handling, API changes
- CI: Java 25 (jooq classes now in source)
- Deleted by upstream: 3 test files removed

See .sync/SESSION.md for detailed resolution documentation.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Main source fixes:
- Upgrade pitest plugin from 1.15.0 to 1.19.0-rc.3 (Gradle 9.2.1 compat)
- Restore FlattenedIterator.java with INKLESS NOTE (removed upstream)
- Update KafkaMetricsGroup constructor calls (Class -> String, String)
- Fix ValidationResult accessor methods (Java record)
- Update ReplicaManager remote fetch handling for new plural API
- Fix FetchPartitionStatus instantiation (Java record)
- Fix KafkaApis Set conversion for authorizedPartitions

Test source fixes:
- Add missing imports to PlaintextConsumerTest.scala
- Fix QuotaType package import
- Fix FetchPartitionStatus instantiation in DelayedFetchTest
- Update LogReadResult constructor calls with new parameters
- Remove validateWithMetadataVersion test (API removed upstream)
- Remove unused imports and methods

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Rename SESSION.md to SESSION-2025-11-21.md for date-based tracking
- Document all main source compilation fixes (11 items)
- Document all test source compilation fixes (14 items)
- Document restored FlattenedIterator.java with INKLESS NOTE
- Update phase status and next steps

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- All inkless-specific tests pass after clean build
- Document known pre-existing issue with InklessConfigsTest
- Document transient ClassCastException resolved by clean build

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The test was crashing because TopicBasedRemoteLogMetadataManager (the
default RLMM) requires bootstrap.servers to create internal topics.
Configure NoOpRemoteLogMetadataManager to avoid this requirement, which
is the standard pattern used by other tests in the codebase.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- All inkless tests now pass
- Document InklessConfigsTest fix (NoOpRemoteLogMetadataManager)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Phase 5 verification checklist complete:
- make build passes
- make test passes
- Inkless module builds
- Key inkless files exist
- Version: 4.2.0-inkless-SNAPSHOT

Co-Authored-By: Claude Opus 4.5 <[email protected]>
When restoring a class removed by upstream for inkless functionality,
the associated tests should also be restored to maintain test coverage.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Change DelayedFetch from Seq to util.LinkedHashMap for classicFetchPartitionStatus
and disklessFetchPartitionStatus to align with upstream pattern and reduce future
merge conflicts.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Align DelayedFetch.scala indentation and style with upstream
- Align KafkaApis.scala with upstream (mutable.Set vs util.HashSet)
- Align ReplicaManager.scala with upstream structure
- Align test files with upstream versions
- PlaintextConsumerTest.scala now contains only upstream tests
  (inkless tests migrated to Java in clients-integration-tests)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
PlaintextConsumerTest was migrated from Scala to Java by upstream.
The Java tests don't yet cover diskless topics - documented as follow-up.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The .sync directory contains sync session documentation which doesn't
need Apache license headers.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Copy link
Copy Markdown
Contributor

@giuseppelillo giuseppelillo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jeqo jeqo marked this pull request as ready for review February 6, 2026 10:08
@jeqo jeqo changed the title sync(inkless): upstream Apache Kafka 2025-11-21 (before 4.3.0-SNAPSHOT) chore(inkless:sync): upstream Apache Kafka 2025-11-21 (before 4.3.0-SNAPSHOT) Feb 6, 2026
@giuseppelillo giuseppelillo merged commit 72ff1c8 into main Feb 6, 2026
7 checks passed
@giuseppelillo giuseppelillo deleted the sync/upstream-20251121 branch February 6, 2026 16:51
@jeqo jeqo mentioned this pull request Feb 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.