Skip to content

Upstream sync before 4.3#491

Closed
jeqo wants to merge 758 commits intomainfrom
upstream-sync-before-4.3
Closed

Upstream sync before 4.3#491
jeqo wants to merge 758 commits intomainfrom
upstream-sync-before-4.3

Conversation

@jeqo
Copy link
Copy Markdown
Contributor

@jeqo jeqo commented Jan 22, 2026

No description provided.

dejan2609 and others added 30 commits October 7, 2025 18:20
List of changes:
- prerequisite Jira ticket:
  - [KAFKA-19591](https://issues.apache.org/jira/browse/KAFKA-19591)
- mandatory version upgrades:
  - Gradle version: 8.14.3 -->> 9.1.0
  - Gradle Shadow plugin: 8.3.6 -->> 8.3.9
  - Gradle dependencycheck plugin: 8.2.1 -->> 12.1.3
  - Gradle spotbugs plugin: 6.2.3 -->> 6.2.5
  - Gradle spotless plugin: 6.25.0 -->> 7.2.1
- build logic will be refactored to accommodate Gradle 9 breaking
changes
- (optional): a dozen of Gradle plugins versions will also be upgraded
- other JIRA tickets that had to be solved all along:
  - [KAFKA-16801](https://issues.apache.org/jira/browse/KAFKA-16801)
  - [KAFKA-19654](https://issues.apache.org/jira/browse/KAFKA-19654)

 **Related links:**
- https://gradle.org/whats-new/gradle-9
- https://github.com/gradle/gradle/releases/tag/v9.0.0
- https://docs.gradle.org/9.0.0/release-notes.html
- https://docs.gradle.org/9.0.0/userguide/upgrading_major_version_9.html
- https://docs.gradle.org/9.1.0/release-notes.html

Notes:
- new Gradle version brings up some breaking changes, as always 😃
- Kafka build with Gradle 9 has same issues as other projects:
  - redhat-developer/vscode-java#4018
  - gradle/gradle#32597

Reviewers: Chia-Ping Tsai <[email protected]>
Since that 4.0 branch does not include
[KAFKA-18748](https://issues.apache.org/jira/browse/KAFKA-18748), it is
unable to find the related scan reports, but the ci-complete workflow is
still being triggered on the 4.0 branch. Disable this on the 4.0 branch,
as its reports can be safely ignored.

See apache/kafka#20616 (comment).

Reviewers: Chia-Ping Tsai <[email protected]>
…#19955)

In Kafka Streams, configuration classes typically follow a fluent API
pattern to ensure a consistent and intuitive developer experience.
However, the current implementation of
`org.apache.kafka.streams.KafkaStreams$CloseOptions` deviates from this
convention by exposing a public constructor, breaking the uniformity
expected across the API.

To address this inconsistency, we propose introducing a new
`CloseOptions` class that adheres to the fluent API style, replacing the
existing implementation. The new class will retain the existing
`timeout(Duration)` and `leaveGroup(boolean)` methods but will enforce
fluent instantiation and configuration. Given the design shift, we will
not maintain backward compatibility with the current class.

This change aligns with the goal of standardizing configuration objects
across Kafka Streams, offering developers a more cohesive and
predictable API.

Reviewers: Bill Bejeck<[email protected]>
…equest (#20632)

Test the `StreamsGroupDescribeRequest` RPC and corresponding responses
for situations where
- `streams.version` not upgraded to 1
- `streams.version` enabled, multiple groups listening to the same
topic.

Reviewers: Lucas Brutschy <[email protected]>
…erver module (#20636)

It moves the `ReconfigurableQuorumIntegrationTest` class to the
`org.apache.kafka.server` package and consolidates two related tests,
`RemoveAndAddVoterWithValidClusterId` and
`RemoveAndAddVoterWithInconsistentClusterId`, into a single file. This
improves code organization and reduces redundancy.

Reviewers: Chia-Ping Tsai <[email protected]>
…ome issues with Gradle 9+ are solved) (#20652)

Extends from: #19513

Note: in Gradle 9+ we have to add a switch like this:
```
./gradlew dependencyUpdates --no-parallel
```
Related link:
https://github.com/ben-manes/gradle-versions-plugin/releases/tag/v0.53.0

Reviewers: Chia-Ping Tsai <[email protected]>
This patch updates the Apache Kafka project's build, test, and
dependency configurations.

- Java Version Update: The build and test processes have been upgraded
from Java 24 to Java 25.
- Scala Version Update: The Scala library version has been updated from
2.13.16 to 2.13.17.
-  Dependency Upgrades: Several dependencies have been updated to newer
versions, including mockito (5.14.2 to 5.20.0), zinc (1.10.8 to 1.11.0),
and scala-library/reflect (2.13.16 to 2.13.17).
- Code and Configuration Changes: The patch modifies build.gradle to
exclude certain Spotbugs tasks for Java 25 compatibility. It also
changes the default signing algorithm in TestSslUtils.java from
SHA1withRSA to SHA256withRSA, enhancing security.
- Documentation: The README.md file has been updated to reflect the new
Java 25 requirement.

Reviewers: Ismael Juma <[email protected]>, Liam Clarke-Hutchinson
 <[email protected]>, Gaurav Narula <[email protected]>,
 Chia-Ping Tsai <[email protected]>
Updates all GitHub Actions to their latest versions.

----
**Upgraded Actions:**

* **Gradle setup**:
     * `gradle/actions/setup-gradle` **v4.4.4 → v5.0.0**
* **Trivy security scanner**:
     * `aquasecurity/trivy-action` **v0.24.0 → v0.33.1**
* **Docker build tools:**

    * `docker/setup-qemu-action` **v3.2.0 → v3.6.0**
    * `docker/setup-buildx-action` **v3.6.1 → v3.11.1**
    * `docker/login-action` **v3.3.0 → v3.6.0**
* **GitHub utilities:**

    * `actions/github-script` **v7 → v8**

    * `actions/stale` **v9 → v10**

Reviewers: Chia-Ping Tsai <[email protected]>
…0399)

the constructor is error-prone when migrating code, since metrics could
get unintentionally changed. We should remove the constructor and use
constant strings instead to avoid issues like KAFKA-17876, KAFKA-19150,
and others.

Reviewers: Ken Huang <[email protected]>, Jhen-Yung Hsu
 <[email protected]>, KuoChe <[email protected]>, Chia-Ping
 Tsai <[email protected]>
Clear pendingTasksToInit on tasks clear.  It matters in situations when
we shutting down a thread in PARTITIONS_ASSIGNED state. In this case we
may have locked some unassigned task directories (see
TaskManager#tryToLockAllNonEmptyTaskDirectories). Then we may have
gotten assigned to one or multiple of those tasks. In this scenario,  we
will not release the locks for the unassigned task directories (see
TaskManager#releaseLockedUnassignedTaskDirectories), because
TaskManager#allTasks includes pendingTasksToInit, but it hasn't been
cleared.

Reviewers: Matthias J. Sax <[email protected]>, Lucas Brutschy
 <[email protected]>
from: apache/kafka#20637 (comment)

Previously, the method used LOGGER.info() instead of LOGGER.trace().
This patch corrects the logging level used in the trace method of
StateChangeLogger.

Reviewers: Manikumar Reddy <[email protected]>, TengYao Chi
 <[email protected]>, Ken Huang <[email protected]>, Chia-Ping Tsai
 <[email protected]>

Co-authored-by: Ubuntu <[email protected]>
Fix the incorrect package name in metrics and revert the comment

see:   apache/kafka#20399 (comment)
apache/kafka#20399 (comment)

Reviewers: Ken Huang <[email protected]>, Manikumar Reddy
 <[email protected]>, Chia-Ping Tsai <[email protected]>
Stores the existing values for both the fields in a local variable for
logging.

Reviewers: Omnia Ibrahim <[email protected]>
…he flakiness of the test. (#20664)

MINOR: changed the condition to only check the test topic to reduce the
flakiness of the test.

Reviewers: Matthias J. Sax <[email protected]>, Lianet Magrans
 <[email protected]>
…ling (#20661)

When a failure occurs with a push telemetry request, any exception is
treated as fatal, increasing the time interval to `Integer.MAX_VALUE`
effectively turning telemetry off.  This PR updates the error handling
to check if the exception is a transient one with expected recovery and
keeps the telemetry interval value the same in those cases since a
recovery is expected.

Reviewers: Apoorv Mittal <[email protected]>, Matthias
 Sax<[email protected]>
…rTest [2/N] (#20544)

Changes made
- Additional `setUpTaskManager()` overloaded method -- Created this
temporarily to pass the CI pipelines so that I can work on the failing
tests incrementally
- Rewrote 3 tests to use stateUpdater thread

Reviewers: Lucas Brutschy <[email protected]>
The auto-topic creation in` AutoTopicCreationManager` currently retries
creating internal topics with every heartbeat.

A simple back-off mechanism was implemented: if there is error in the
errorcache and it's not expired or it's already in the inflight topics,
then not send the topic creation request.

Unit tests are added as well.

Reviewers: Lucas Brutschy <[email protected]>
We missed a branch in #20671.

This PR handles the else branch where we log about skipping the follower
state change.

Also updated the doc for the method as it was out of date.

Reviewers: Chia-Ping Tsai <[email protected]>
The comment stating "Additionally adds a license header to the wrapper
while editing the file contents" is no longer accurate. This
functionality was removed in PR #7742 (commit 45842a3) since Gradle
5.6+ automatically handles license headers in wrapper files.

Reviewers: Chia-Ping Tsai <[email protected]>
…ded with a `--no-parallel` switch (#20683)

Prologue:
apache/kafka#19513 (comment)

Also related to: #20652

@chia7712 I double-checked custom gradle commands in `README.md` and
AFAIK we should be fine now.

Reviewers: Chia-Ping Tsai <[email protected]>
…-->> 9 (#20684)

from: apache/kafka#19513 (comment)

1. Fix the task `unitTest` and `integrationTest`;
2. Change the `name` to a method call `name()` for `KeepAliveMode`.

Reviewers: Ken Huang <[email protected]>, dejan2609
 <[email protected]>, Chia-Ping Tsai <[email protected]>
Clarify the Javadoc for `Node#isFenced` to align with KIP-1073, which
introduced the “fenced” field in `DescribeCluster` for brokers. The
“fenced” flag applies only to broker nodes returned by
`DescribeCluster`. For controller quorum nodes, it doesn’t apply and is
always `false`. This clarifies that not every node has a meaningful
fenced state.

Reviewers: TaiJuWu <[email protected]>, Chia-Ping Tsai
 <[email protected]>
This PR is a follow-up to KAFKA-18193. It addresses the need for a null
check and an improved error message. Please refer to the previous
comments and the review of
apache/kafka#19955 (review)
for more context.

Reviewers: Chia-Ping Tsai <[email protected]>
There are some calls to `TestUtils::waitForCondition` with the actual
value of the condition being evaluated. These calls must use the
overload with a `Supplier` for conditionDetails so that the actual value
is lazily evaluated at the time of the condition failing.

Reviewers: Chia-Ping Tsai <[email protected]>
…eMode (#20695)

In Scala/Java joint compilation, builds that run with
`keepAliveMode=SESSION` can fail in the `:core:compileScala` task. This
happens when `javac` is given a non-existent Java output directory on
its classpath.

```
Unexpected javac output: warning: [path] bad path element
".../core/build/classes/java/main": no such file or directory
error: warnings found and -Werror specified
```

This issue occurs after a clean build because
`core/build/classes/java/main` does not exist, as Java classes are
written to `classes/scala/main` during joint compilation. With `-Werror`
enabled, the resulting `[path]` warning is treated as an error and stops
the build.

This PR adds a workaround scoped to **SESSION** builds, while continuing
to investigate the underlying classpath assembly logic during joint
compilation.

Related discussion:
apache/kafka#20684 (comment)
Tracked by:
[KAFKA-19786](https://issues.apache.org/jira/browse/KAFKA-19786)

Reviewers: Ken Huang <[email protected]>, Chia-Ping Tsai
 <[email protected]>
This PR improves error handling for feature argument parsing in the
`StorageTool`. The main change is stricter validation of the expected
format for feature arguments, ensuring users provide them in the correct
`feature=version` format.

**Before:**
* If user using argument like: `./bin/kafka-storage.sh
feature-dependencies --feature group.version`, the following error will
occur, which is a bit confusing:
<img width="970" height="162" alt="image"

src="https://github.com/user-attachments/assets/aa4341f9-6eb8-488e-b88e-f5244560184b"
/>

**After:**
<img width="812" height="55" alt="image"

src="https://github.com/user-attachments/assets/9aedecdb-e657-4bd3-8b9b-22d6282a1dc1"
/>

Reviewers: Chia-Ping Tsai <[email protected]>
…ucer (KIP-1147) (#20673)

*What*
https://issues.apache.org/jira/browse/KAFKA-19725

- Implement KIP-1147 for ConsoleProducer where we replace --property
with --reader-property.
- Currently the previous names for the options are still usable but
there will be warning message stating those are deprecated and will be
removed in a future version.
- I have added unit tests and also manually verified using the console
tools that things are working as expected.

Reviewers: Andrew Schofield <[email protected]>, Jhen-Yung Hsu
 <[email protected]>
…(#20692)

Cleanup and rewrote more tests in `TaskManagerTest.java`

Reviewers: Lucas Brutschy <[email protected]>
…LE (#20600)

Streams groups sometimes describe as NOT_READY when STABLE. That is, the
group is configured and all topics exist, but when you use LIST_GROUP
and STREAMS_GROUP_DESCRIBE, the group will show up as not ready.

The root cause seems to be that
apache/kafka#19802 moved the creation of the
soft state configured topology from the replay path to the heartbeat.
This way, LIST_GROUP and STREAMS_GROUP_DESCRIBE, may not show the
configured topology, because the configured topology that is created in
the heartbeat is "thrown away", and the new group is recreated on the
replay-path.

To reflect a consistent view of the topology via LIST_GROUP and
STREAMS_GROUP_DESCRIBE, we need to store additional information in the
consumer offset topic. In particular, we need to store at least whether
a topology was validated against the current topic metadata, as this
defines whether a group is in STABLE and not in NOT_READY.

This change adds a new field `validatedTopologyEpoch` to the metadata of
the group, which stores precisely this information.

Reviewers: Matthias J. Sax <[email protected]>
Make some internal methods `public` for third-party users who want to
hook into this API directly (even if they are not advised to do this;
using internal API is at their own risk).

Reviewers: Matthias J. Sax <[email protected]>
jim0987795064 and others added 15 commits November 20, 2025 20:16
This PR bumps the versions of docker/setup-qemu-action from 3.6 to 3.7.

Reviewers: Chia-Ping Tsai <[email protected]>
This patch marks the OffsetCommit and OffsetFetch APIs as stable.

Reviewers: Lianet Magrans <[email protected]>
Rewrote more tests in `TaskManagerTest.java`

Reviewers: Lucas Brutschy <[email protected]>
* The field `AcquisitionLockTimeoutMs` in
`ShareAcknowledgeResponse.json` should be marked `ignorable` for
compatibility with older client versions.

Reviewers: Andrew Schofield <[email protected]>, Shivsundar R
 <[email protected]>
This PR is part of

[KIP-1226](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1226%3A+Introducing+Share+Partition+Lag+Persistence+and+Retrieval).

This PR adds integration tests to ShareConsumerTest.java file to verify
the share partition lag is reported successfully in various scenarios.

Reviewers: Andrew Schofield <[email protected]>
We only updated docs version in `4.1` branch, but not in `trunk`.

Cf
apache/kafka@abeebb3

Reviewers: Matthias J. Sax <[email protected]>
Sorted the names of the packages for which javadoc is generated. Removed
`org/apache/kafka/common/security/oauthbearer/secured` which no longer
exists.

Reviewers: Lianet Magrans <[email protected]>
Updating the docs for the client state metric (KIP-1221).

Reviewers: Matthias Sax<[email protected]>, Bill
 Bejeck<[email protected]>, Chia-Ping Tsai <[email protected]>
*What*
PR fixes some typos/nit in `ShareConsumerTest` and fixes a test
assertion in `ConsumerNetworkThreadTest`.

Reviewers: Andrew Schofield <[email protected]>, Chia-Ping Tsai
 <[email protected]>
For records with a delivery count exceeding 2, there is suspicion that
delivery failures   stem from underlying issues rather than natural
retry scenarios. The batching of such   records should be reduced.

Solution:  Determining which offset is bad is not possible at broker's
end. But broker can restrict the acquired records to a subset so only
bad record is skipped. We can do the following:

- If delivery count of a batch is >= 3 then only acquire 1/2 of the
batch records i.e for a batch of 0-499 (500 records) if batch delivery
count is 3 then start offset tracking and acquire 0-249 (250 records)
- If delivery count is again bumped then keeping acquring 1/2 of
previously acquired offsets until last delivery attempt i.e. 0-124 (125
records)
- For last delivery attempt, acquire only 1 offset. Then only the bad
record will be skipped.

Reviewers: Apoorv Mittal <[email protected]>, Andrew Schofield
 <[email protected]>, Abhinav Dixit <[email protected]>

---------

Co-authored-by: d00791190 <[email protected]>
…0942)

Changing log level from ERROR to WARN, as it's expected that this can
happen, and we should not incorrectly make users worry about it.

Reviewers: PoAn Yang <[email protected]>, Colt McNealy
 <[email protected]>, Lucas Brutschy <[email protected]>,
 Chia-Ping Tsai <[email protected]>
…(#20937)

jacksonDatabindYaml does not exist, it should be jacksonDataformatYaml.
I was a bit confused when I first saw the mention, I imagine others
might have the same.  Also moved the entry in the depencencies one row
down, so the order of the dependencies is more alphabetical.

Reviewers: Chia-Ping Tsai <[email protected]>
@jeqo jeqo force-pushed the upstream-sync-before-4.3 branch from 554e55d to 67b06d9 Compare February 3, 2026 08:36
@jeqo jeqo force-pushed the upstream-sync-before-4.3 branch from 67b06d9 to 140948b Compare February 3, 2026 08:37
Tests are failing in GitHub workflows, try to reduce parallelism
to check if it's related to resource exhaustion.
@giuseppelillo giuseppelillo force-pushed the upstream-sync-before-4.3 branch from b1ad7ea to 587a161 Compare February 5, 2026 13:11
@jeqo
Copy link
Copy Markdown
Contributor Author

jeqo commented Feb 9, 2026

Close in favor of #499

@jeqo jeqo closed this Feb 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.