Updated Traefik passthrough README with health check and configuration details#748
Updated Traefik passthrough README with health check and configuration details#748ruchikajha95 wants to merge 4 commits intokeycloak:mainfrom
Conversation
… details Signed-off-by: Ruchika <[email protected]>
|
We forgot about CI. Please add a job to https://github.com/keycloak/keycloak-quickstarts/blob/1b60f49d7d809599c3cf0d4ca9916dc71c4be715/.github/workflows/workflow-ci.yml |
Signed-off-by: Ruchika <[email protected]>
ahus1
left a comment
There was a problem hiding this comment.
Thank you for the PR. I suppose the traefik file is missing the services and routers entries. Can you please check?
Signed-off-by: Ruchika <[email protected]>
ahus1
left a comment
There was a problem hiding this comment.
I see you previously experimented with a TCP check. What was the result with that?
BTW, one can set KC_HTTP_MANAGEMENT_SCHEME: http so the management endpoint doesn't run HTTPS which might open the door for the TCP based check. Still I wasn't able to get it to work, and traefik didn't provide me with the response it received from Keycloak, so it was hard to debug.
When in debug mode, I figured out that "\n" was taken as a literal string, and not as a line break. I then use multi-line YAML, which looked better, but then didn't succeed.
If we go down the route of not having a readiness probe, this should be reflected in the README, possibly with a link to the open issue.
Without a probe, we would also need to disable server-async-bootstrap (this is probably why the tests are failing).
Without a probe, there is also not much benefit for a graceful shutdown as it is scripted here. We would actually need to have a different playbook for a graceful shutdown where one would first need to update the keycloak.yaml to remove one of the servers before shutting it down.
|
@ahus1 I didn't experiment with HTTP, but the probe in Traefik compares with all the server response (looking at the source code). +1 to disable async-bootstrap and +1 to set shutdown-delay to zero. |
Oh, that gives me an idea! See the next push with a suggested fix that works for me locally. |
Signed-off-by: Alexander Schwartz <[email protected]>
|
@pruivo / @ruchikajha95 - I've pushed the changes, and the CI is happy :D Now I wonder if you think this check is good and you want to continue with it and update the docs, or if you think this check is too fragile. I'd argue in favor of keeping it, as having a readiness probe gives you lots of benefits like split-brain detection, a node that is too overloaded, can't talk to the database, better shutdown behavior, etc. |
|
@ahus1 It looks fragile (and ugly), mainly because of the charset that may change by the user. But I'm ok to have it in the quickstarts. |
Thank you @ahus1 , @pruivo .I was experimenting with something similar yesterday but couldn't get it to work on my end !glad it worked now! I'm happy to keep the changes in the quickstart and proceed with the corresponding documentation updates in the Keycloak repo.! |
Thanks. Yes, please proceed. |
ahus1
left a comment
There was a problem hiding this comment.
Thank you for the PR. After my latest push with the health check, the README is unfortunately out of date. Can you please update it?
Please review if you think any other items would require a longer explanation of "why" and possible alternatives.
| Since Traefik operates at the TCP layer for TLS passthrough, it performs TCP-level health checks on each backend server. | ||
|
|
||
| ```yaml | ||
| healthCheck: | ||
| interval: "5s" | ||
| timeout: "3s" |
There was a problem hiding this comment.
After my change to keycloak.yaml this is not outdated. Can you please update the README that it matches the configuration?
It would be good to have a paragraph about the why the health check is chosen, what possible downsides are (fragile, no TLS), and what the alternatives are (simple TCP, different shutdown procedure, no readiness check, no split brain detection).
|
@ruchikajha95, the README in |
|
|
||
| ## Architecture | ||
|
|
||
|  |
There was a problem hiding this comment.
The image says TCP passthrough. Did you mean TLS passthrough?
| postgres: | ||
| condition: service_healthy | ||
| networks: | ||
| - frontend |
There was a problem hiding this comment.
This is wrong; Keycloak must use only the backend network.
| **Traefik health check :** | ||
|
|
||
| **Traefik health check:** |
| api: | ||
| insecure: true | ||
| ``` | ||
| - `insecure: true` exposes the Traefik dashboard without any authentication. This is intended for local development and debugging only and must never be used in production. |
There was a problem hiding this comment.
It says the same thing twice:
exposes the Traefik dashboard without any authenticationandEnables the Traefik dashboard on port 8080 without authenticationThis is intended for local development and debuggingandfor local testing only.
| serversTransports: | ||
| kc-passthrough: | ||
| proxyProtocol: | ||
| version: 2 |
There was a problem hiding this comment.
Indentation is off
| serversTransports: | |
| kc-passthrough: | |
| proxyProtocol: | |
| version: 2 | |
| serversTransports: | |
| kc-passthrough: | |
| proxyProtocol: | |
| version: 2 |
| timeout: "3s" | ||
| ``` | ||
|
|
||
| - Traefik checks each backend every 5s, marking a server as unhealthy if it does not respond within 3s. Unhealthy backends are automatically removed from the load balancer rotation and re-added once they recover |
There was a problem hiding this comment.
To avoid duplicating the configuration, I would remove this section and move this line after line 167
Don't forget to mention CLI option KC_HTTP_MANAGEMENT_SCHEME.
| - address: "keycloak1:8443" | ||
| - address: "keycloak2:8443" | ||
| ``` | ||
| - Defines the two Keycloak backend servers. Traefik distributes incoming TCP connections across both instances using round-robin load balancing. |
| ``` | ||
| - Defines the two Keycloak backend servers. Traefik distributes incoming TCP connections across both instances using round-robin load balancing. | ||
|
|
||
| - `docker run --rm --network passthrough_backend alpine nc -zv keycloak1 9000` verify the management port is reachable within the internal Docker network (as seen in the nc test) |
| @@ -0,0 +1,14 @@ | |||
|
|
|||
There was a problem hiding this comment.
Compress the image or convert to SVG. 5MB is way too big 😃
Updated Traefik passthrough README with health check and configuration details
Signed-off-by: Ruchika [email protected]