Validation of Keycloak clustering deployment on Azure Container Apps #47099
Replies: 3 comments 3 replies
Hello again,
Following up on this: since we haven't received feedback yet, we wanted to simplify the question. Given the setup described in the original post:
Is there any known limitation or architectural reason why this setup (Keycloak on Azure Container Apps with embedded clustering) would not be considered reliable in production? In other words, are there known edge cases (e.g. related to container lifecycle, networking, or JGroups behavior) that might not be visible in our current testing scenarios and validations? Any guidance would be greatly appreciated.
Hello @roisanchezriveira, Thank you for your response. In our setup, we are using the Workload Profiles (Consumption_v2) model in Azure Container Apps. Could you please clarify whether your setup was used in a production environment before you encountered the issue? Additionally, it would be helpful to know whether you have tested the -Djgroups.bind.address=GLOBAL configuration in your setup, and if so, whether it had any impact. Thank you in advance for your insights.
Hi there, We had the same problem with replicas failing to communicate with each other for caches on ACA. After many failed attempts, I found this post by accident. I have tried the settings above and can confirm that they work well in our setup. I scaled the replica count up to 10 and back down to 1, and the replicas had no internal communication issues. Logs and server info showed the correct cluster membership as well. I looked further into the changes to understand which flag makes the difference, and I will continue watching the behavior. From my tests, the solution looks promising. Thanks for sharing the snippet, it helps a lot!
Hello,
We would like to ask for feedback regarding a deployment architecture we are currently using for Keycloak clustering on Azure Container Apps, in order to achieve high availability. Since this platform is not commonly referenced in the official documentation, we would like confirmation from the community and maintainers that our approach is considered technically supported.
Environment
Keycloak version: 26.5.0
Hosting platform: Azure Container Apps
Number of instances: 2 to 5
Database: PostgreSQL
Deployment approach
Keycloak is deployed to Azure Container Apps using the following approach:
Keycloak is deployed as a single Azure Container App (azurerm_container_app) named "keycloak", running in Single revision mode with 2–5 replicas for HA.
Container Image: quay.io/keycloak/keycloak:26.5.0
Cluster configuration
To enable clustering, we configured the following values:
KC_CACHE=ispn
KC_CACHE_STACK=jdbc-ping
(These are the defaults for this version of Keycloak.)
All the following configuration is passed via environment variables:
KC_DB=postgres
KC_DB_URL=...
KC_DB_USERNAME=...
KC_DB_PASSWORD=...
KC_HTTP_ENABLED=true
KC_PROXY_HEADERS=xforwarded
KC_HTTP_PORT=8080
KC_HOSTNAME=...
KC_HOSTNAME_STRICT=... (we update this after the Keycloak FQDN has been created by ACA; see phase 1 -> phase 2 in the README)
JAVA_OPTS_APPEND with -Djgroups.* system properties (e.g. "-Djgroups.jdbc.connection_url={...} -Djgroups.jdbc.connection_username={...} -Djgroups.jdbc.connection_password={...} -Djgroups.jdbc.driver_name=postgresql -Djgroups.bind.address=GLOBAL -Djgroups.bind.port=7800 -Djava.net.preferIPv4Stack=true -Djgroups.use.mcast_addr=false")
KC_BOOTSTRAP_ADMIN_USERNAME=...
KC_BOOTSTRAP_ADMIN_PASSWORD=...
KC_HEALTH_ENABLED=true
KC_METRICS_ENABLED=true
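For readers following along, the settings above map to env blocks inside the azurerm_container_app container template in Terraform. A trimmed sketch of the pattern (resource references and the secret name here are illustrative, not the exact ones from our repository):

```hcl
resource "azurerm_container_app" "keycloak" {
  name                         = "keycloak"
  resource_group_name          = azurerm_resource_group.main.name           # illustrative reference
  container_app_environment_id = azurerm_container_app_environment.main.id  # illustrative reference
  revision_mode                = "Single"

  template {
    # 2-5 replicas for HA, as described above
    min_replicas = 2
    max_replicas = 5

    container {
      name   = "keycloak"
      image  = "quay.io/keycloak/keycloak:26.5.0"
      cpu    = 1.0
      memory = "2Gi"

      env {
        name  = "KC_DB"
        value = "postgres"
      }
      env {
        name        = "KC_DB_PASSWORD"
        secret_name = "kc-db-password" # illustrative secret reference
      }
      env {
        name  = "JAVA_OPTS_APPEND"
        value = "-Djgroups.bind.address=GLOBAL -Djgroups.bind.port=7800 ..." # remaining -Djgroups.* flags as listed above
      }
      # ... remaining KC_* variables as listed above
    }
  }
}
```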
Other configured options:
Plain start (not start --optimized) is used, meaning Keycloak performs its auto-build/augmentation phase at startup.
Resource Allocation:
CPU: 1.0 vCPU per replica
Memory: 2 GiB per replica
Ingress:
External HTTPS ingress is enabled via ACA's built-in Envoy load balancer, terminating TLS and forwarding plain HTTP to port 8080 internally.
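In Terraform terms, the ingress described above corresponds roughly to the following block on the azurerm_container_app resource (a sketch; for external ingress, ACA's built-in Envoy terminates TLS before traffic reaches the container):

```hcl
ingress {
  external_enabled = true
  target_port      = 8080   # plain HTTP internally; TLS is terminated by ACA's Envoy
  transport        = "auto"

  traffic_weight {
    latest_revision = true
    percentage      = 100
  }
}
```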
Observed behavior
After deploying multiple instances, our observations indicate that the nodes appear to successfully form a cluster.
Specifically:
Verify HA Cluster
Some results for the HA testing we performed and verified can be found here:
https://github.com/hoolser/terraform-azurerm-keycloak-aca/blob/master/testing-ha-results.md
(Additional detailed tests we performed are documented here:
https://github.com/hoolser/terraform-azurerm-keycloak-aca/blob/master/testing-ha.md
)
Question
Based on the above setup and behavior:
Is this deployment architecture considered a valid and supported Keycloak clustering configuration, or are there potential limitations or risks we should be aware of when running Keycloak on Azure Container Apps in this way?
We have added our working Terraform implementation for Keycloak on ACA to the following GitHub repository (with a detailed README) to help others who are facing similar issues or concerns:
https://github.com/hoolser/terraform-azurerm-keycloak-aca
We would appreciate any feedback or confirmation from the community or maintainers.
Thank you.