Skip to content

KEYCLOAK-11019 Initial support for lazy offline user-session loading#7722

Closed
thomasdarimont wants to merge 6 commits intokeycloak:masterfrom
thomasdarimont:issue/KEYCLOAK-11019-lazy-offline-session-loading
Closed

KEYCLOAK-11019 Initial support for lazy offline user-session loading#7722
thomasdarimont wants to merge 6 commits intokeycloak:masterfrom
thomasdarimont:issue/KEYCLOAK-11019-lazy-offline-session-loading

Conversation

@thomasdarimont
Copy link
Copy Markdown
Contributor

@thomasdarimont thomasdarimont commented Jan 15, 2021

This PR introduces initial support for lazy-loading of offline user-sessions.
See the discussion on the keycloak-dev group

As a start, we allow to disable the pre-loading of offline sessions from the database. At a later stage we can also disable the pre-loading from a remote-cache storage.

If an offline user-session is missing from the infinispan sessions or clientSession cache, we try to find the offline user-session in the database. If the session is found, we dynamically import it into the appropriate infinispan cache.

Additionally we now allow to disable loading of offline user-sessions during startup via the new preloadOfflineSessionsFromDatabase property of the infinispan userSessions provider.

patch-standalone-ha.cli:

embed-server --server-config=standalone-ha.xml --std-out=echo

/subsystem=logging/console-handler=CONSOLE:write-attribute(name=level,value=DEBUG)

if (outcome == failed) of /subsystem=logging/logger=org.keycloak.models.sessions.infinispan/:read-resource
  /subsystem=logging/logger=org.keycloak.models.sessions.infinispan:add(level=DEBUG)
end-if

echo SETUP: Customize Keycloak UserSessions SPI configuration
# Add dedicated userSession config element to allow configuring elements.
if (outcome == failed) of /subsystem=keycloak-server/spi=userSessions/:read-resource
  echo SETUP: Add missing userSessions SPI
  /subsystem=keycloak-server/spi=userSessions:add()
  echo
end-if

echo SETUP: Configure built-in "infinispan"  UserSessions loader
if (outcome == failed) of /subsystem=keycloak-server/spi=userSessions/provider=infinispan/:read-resource
  echo SETUP: Add missing "infinispan" provider
  /subsystem=keycloak-server/spi=userSessions/provider=infinispan:add(enabled=true)
  /subsystem=keycloak-server/spi=userSessions/provider=infinispan:write-attribute(name=properties.preloadOfflineSessionsFromDatabase,value=${env.KEYCLOAK_INFINISPAN_SESSIONS_PRELOAD_DATABASE:false})
  /subsystem=keycloak-server/spi=userSessions:write-attribute(name=default-provider,value=infinispan)
  echo
end-if

build the keycloak distribution from source:

mvn -Pdistribution -pl distribution/server-dist -am -Dmaven.test.skip clean install

once this is finished, you can find your Keycloak distribution in distribution/server-dist/target as .zip or .tar.gz.

Apply patch-standalone-ha.cli to standalone-ha.xml

echo "yes" | bin/jboss-cli.sh --file patch-standalone-ha.cli

Start Keycloak:

bin/standalone.sh -c standalone-ha.xml --debug -Dwildfly.statistics-enabled=true

You should now see a DEBUG log message like:

...
13:09:30,850 DEBUG [org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory] (ServerService Thread Pool -- 65) Skipping pre-loading of userSessions from persistent storage
...

If you try to obtain a new access token for a given (offline) refresh-token you should see the following in the log:

...
13:09:37,538 DEBUG [org.keycloak.models.sessions.infinispan.InfinispanUserSessionProvider] (default task-1) Offline user-session not found in infinispan, attempting UserSessionPersisterProvider lookup for sessionId=9c5aa5fa-1611-4c70-a4ec-10bb0edf5690
13:09:37,555 DEBUG [org.keycloak.models.sessions.infinispan.InfinispanUserSessionProvider] (default task-1) Offline user-session found in UserSessionPersisterProvider, attempting to reimport user-session for sessionId=9c5aa5fa-1611-4c70-a4ec-10bb0edf5690
13:09:37,573 DEBUG [org.keycloak.models.sessions.infinispan.InfinispanUserSessionProvider] (default task-1) Offline user-session imported, trying another lookup for sessionId=9c5aa5fa-1611-4c70-a4ec-10bb0edf5690
13:09:37,574 DEBUG [org.keycloak.models.sessions.infinispan.InfinispanUserSessionProvider] (default task-1) Offline user-session found after import for sessionId=9c5aa5fa-1611-4c70-a4ec-10bb0edf5690
...

@hmlnarik hmlnarik added area/server area/storage Indicates an issue that touches storage (change in data layout or data manipulation) area/core kind/enhancement Categorizes a PR related to an enhancement labels Jan 18, 2021
Copy link
Copy Markdown
Contributor

@pedroigor pedroigor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good to me.

It should not break existing behavior while making possible to lazy load offline sessions. The impact on sessions list should be expected as we discussed in the mailing list.

Let's see what @mposolda thinks about it.

@thomasdarimont thomasdarimont force-pushed the issue/KEYCLOAK-11019-lazy-offline-session-loading branch from f3e36a0 to 1256be1 Compare January 18, 2021 18:56
@thomasdarimont
Copy link
Copy Markdown
Contributor Author

Rebased on current master

Copy link
Copy Markdown
Contributor

@martin-kanis martin-kanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasdarimont Thank you for the PR.
Is this feature somehow tested? If not, maybe it's worth to add some tests.

Next the JpaUserSessionPersisterProvider.java should use Java Streams as we recently changed collections to streams where ever that make sense. See the suggestions inline.

Finally I would add javadocs to newly introduced methods in UserSessionPersisterProvider.java interface.

@thomasdarimont
Copy link
Copy Markdown
Contributor Author

thomasdarimont commented Jan 19, 2021

@martin-kanis Thanks or your suggestions. I'd prefer to leave the code as is for now, to keep the style in this class consistent.
A "streamification" should IMHO be applied afterwards in a holistic refactoring.

Btw. I think the JPA query result stream in org.keycloak.models.jpa.session.JpaUserSessionPersisterProvider#loadUserSessions should also be wrapped in org.keycloak.utils.StreamsUtil#closing.

I'm fine with adding the javadoc comments to the new API.

Regarding tests, how can I test offline-sessions with a custom InfinispanUserSessionProviderFactory configuration? I could not find any examples that test offline-token support appart from OfflineTokenTest. Would it be enough to run the OfflineTokenTest with a custom InfinispanUserSessionProviderFactory configuration? Would that be enough?

@thomasdarimont thomasdarimont force-pushed the issue/KEYCLOAK-11019-lazy-offline-session-loading branch from 1256be1 to 8c8ee34 Compare January 19, 2021 21:44
@martin-kanis
Copy link
Copy Markdown
Contributor

martin-kanis commented Jan 21, 2021

@thomasdarimont Actually the streamification epic was finished by this commit which also covers the JpaUserSessionPersisterProvider.java class. For this reason, we need all new code where it makes sense, to be using Java streams.

@thomasdarimont thomasdarimont force-pushed the issue/KEYCLOAK-11019-lazy-offline-session-loading branch 2 times, most recently from fe3571a to 50467ac Compare January 25, 2021 18:04
@thomasdarimont
Copy link
Copy Markdown
Contributor Author

@martin-kanis Thanks for the hint. I just rebased the PR to the current master and revised the Streams API usage as you suggested.

Do you have an idea how I could address the testing issues I mentioned above?

@pedroigor
Copy link
Copy Markdown
Contributor

@thomasdarimont Actually the streamification epic was finished by this commit which also covers the JpaUserSessionPersisterProvider.java class. For this reason, we need all new code where it makes sense, to be using Java streams.

I also think "streamification" should not be a design principle, but used when (after testing and profiling) and where it makes sense.

Copy link
Copy Markdown
Contributor

@stianst stianst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know what was agreed on in the thread as it was a lot of back and forth, mixing short-term and long-term. This is kinda why we have the design proposals in the first place so we can have a written down agreed upon plan, as mailing lists are okay for discussions, but terrible way of capturing the output of a discussion.

So as lazy loading from the DB is disabled by default, I assume that's because it's breaking something, as otherwise it would just make no sense to have this as an option and just always lazy load offline sessions from the DB. What specifically is being broken?

Copy link
Copy Markdown
Contributor

@stianst stianst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to request at least some changes here:

  • Describe what the side-effects are. It seems like to me that this may break anything that views/lists offline sessions in cases where the sessions hasn't been lazy loaded yet, including the ability for users and admins to view/manage offline sessions. If that's the case I don't really like the idea of merging this PR
  • If it's an option it needs documenting. If there are going to be difference in behaviour it has to be documented. As it stands this is a "hidden" config option that only a select few will know about, and anyone that stumbles across it won't know what benefits/drawbacks the option provides

@pedroigor
Copy link
Copy Markdown
Contributor

pedroigor commented Jan 26, 2021

I would say that loading data and blocking at boot time is by itself a broken design. And for this particular case, it has a huge impact on use cases using offline sessions.

We have quite a few workarounds already to overcome this problem (e.g.: sessionsPerSegment, DB indexes, not sure if documented) that works until a certain point. Oracle, for instance, has a serious limitation on how much you can set sessionsPerSegment.

IMO, the proposed changes provide a good balance for those facing the problem we are fixing while sacrificing some functionality that is not used or important. It should also help to pave the road for further improvements and in a proper fix that not necessarily impacts existing functionality.

@thomasdarimont
Copy link
Copy Markdown
Contributor Author

thomasdarimont commented Jan 27, 2021

Hello @stianst, @pedroigor, @martin-kanis,

Thanks a lot for the great discussion and useful feedback!

"Describe what the side-effects are."

  1. With the changes from this PR and preloadOfflineSessionsFromDatabase=true (current default)
    a) Keycloak will pre-load offline sessions at startup as usual - which might block the server start for a large number of offline-sessions for multiple minutes
    b) The number of offline-sessions per client in the admin-console is based on the offline-user session count in memory
    c) (new) If an offline user session is not found in memory, a database lookup is performed - if successful the offline-session is imported into infinispan. This useful in Kubernetes rolling upgrade, which's prone to "loose" some offline-sessions due to incomplete infinispan cache synchronizations.
    d) Everything else (session listing in admin-console etc.) works as before
    e) Admin-Console / Manage / Session -> will show the number of Active Sessions from memory and number of Offline-sessions from memory

  2. With the changes from this PR and preloadOfflineSessionsFromDatabase=false
    a) Keycloak will NOT pre-load offline sessions at startup -> user-session / client-session caches will be empty at (server/cluster) start
    b) The number of offline-sessions per client in the admin-console is based on the offline-user session count in the database
    c) If an offline user session is not found in memory, a database lookup is performed - if successful the offline-session is imported into infinispan. This is useful with a rolling upgrade in Kubernetes, which's prone to "loose" some offline-sessions due to incomplete infinispan cache synchronizations.
    d) The client session listing in the admin-console might not show all offline-sessions or even be empty if no offline session were loaded yet. As there might be hundreds of millions of offline sessions, it is IMHO not useful at all to be able to list all via the admin-console. The warning "Warning, this is a potentially expensive operation depending on the number of offline tokens." gives a hint at this as well.
    e) Admin-Console / Manage / Session -> will show the number of Active Sessions from memory and the number of Offline-sessions from the database

"If it's an option, it needs documenting"

I'd document this alongside the mentioned sessionsPerSegment. As I also couldn't find a section about this in the docs,
I'd recommend adding a section "Performance Tuning" to Server Installation and Configuration Guide.

@stianst This PR is an IMHO workaround that solves a current flaw in Keycloak's offline-session handling that affects a large number of Keycloak deployments. This helps to solve scalability problems for IMHO a large number of users today.

Some useful enhancements

  • Offline user-sessions caches could be configured with expiration to remove unused cache items after some time to save memory
  • A user search could improve the listing of offline-sessions for a client in the admin-console, by allowing explicit user lookups.

This is, of course, not a replacement for the mid-long term storage architecture redesign :)

@stianst
Copy link
Copy Markdown
Contributor

stianst commented Jan 27, 2021

@thomasdarimont Thanks for the update, that's very useful information. Main thing I'd want is that there is no limitations that prevents people from being able to use it. For the account console will it load offline sessions from the DB for the given user so the user can see offline sessions even though they haven't been pre-loaded yet? For the admin console if I look at sessions for a given user does it show them then?

Reminding myself how the sessions management looks like in the admin console there's options to logout sessions for a given user, or all sessions, not for an individual application. So as long as we can logout for a given user I think we're covered here as a "working workaround" ;)

@thomasdarimont
Copy link
Copy Markdown
Contributor Author

thomasdarimont commented Jan 27, 2021

@stianst thanks for the quick reply.

For the account console will it load offline sessions from the DB for the given user so the user can see offline sessions even though they haven't been pre-loaded yet?

Good catch. If the offline user-session pre-loading is disabled and the offline user-session is not loaded yet, then the "revoke grant" button is missing in the applications page of the account-console. The current implementation of the ApplicationsBean seems to load the offline-sessions only from the cache.
I think this should be changed to try a database lookup as well, in case the offline session was not found in the cache.
We could either import the offline session or just extract the metadata from the database, which might be the proper way if we say that only a token refresh request should trigger offline user-session loading.

For the admin console if I look at sessions for a given user does it show them then?

If the offline user-session is not loaded yet, then user/consents will not list the offline_access grant.
Solution could be the same as above, we could load only the from the database without importing the offline-session.

I give this a quick try now.

@thomasdarimont thomasdarimont force-pushed the issue/KEYCLOAK-11019-lazy-offline-session-loading branch from 3b1b0e8 to 4bdfbe8 Compare January 27, 2021 15:21
@thomasdarimont
Copy link
Copy Markdown
Contributor Author

@stianst @pedroigor I just updated the PR with the changes to fix the issues mentioned by stian

  • The application page in the (old) account-console now triggers offline user-session loading and shows the "revoke access" button.
  • The consents listing of the user page in the admin-console now triggers offline user-session loading and shows the offline-access consent

I think the only caveat rest is that admin-console -> clients -> client -> Offline Access will only show offline-sessions that are loaded in memory. The (total) number of offline sessions will be the correct count of offline-sessions in the database.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is good to have pagination support here, but I wasn't sure about default limits.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO pagination support may not be so urgent for the users? I suppose that it will be very rare that single user has for example 1000 offline sessions? Pagination makes sense for the clients IMO and it is already available in the admin console (however not yet supported in the PR as mentioned in the comment).

@thomasdarimont thomasdarimont force-pushed the issue/KEYCLOAK-11019-lazy-offline-session-loading branch from 02b6e62 to c508329 Compare March 3, 2021 15:35
@thomasdarimont
Copy link
Copy Markdown
Contributor Author

Just rebased on current master

Copy link
Copy Markdown
Contributor

@hmlnarik hmlnarik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @thomasdarimont Thank you for patience - it took a while for me to get to this PR as I'm still in a catch-up mode after an unexpected long term leave.
Overall it looks good to me, only the indices could be improved as you have also mentioned. I've done some analysis of the queries here. Based on the analysis, I am suggesting change the indices, please see inline.

Comment on lines 47 to 71
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that IDX_OFFLINE_USS_BY_USER and IDX_OFFLINE_USS_BY_USERSESS would be more helpful if structured differently:

  • IDX_OFFLINE_USS_BY_USERSESS as index on (REALM_ID, OFFLINE_FLAG, USER_SESSION_ID) (different field order)
  • IDX_OFFLINE_USS_BY_USER as index on (USER_ID, REALM_ID, OFFLINE)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll adjust the indices as you suggested.

Comment on lines 43 to 58
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reversing the two fields, ie. index on (CLIENT_ID, OFFLINE_FLAG) would be more helpful. OFFLINE_FLAG selectivity is rather bad, so it roughly corresponds to seq scan, while client_id is rather very specific. Sole CLIENT_ID is used in deleteClientSessionsByClient query.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indices could be improved, see below. The analysis of how to improve the indices is here

@stianst stianst modified the milestones: 13.0.0, 14.0.0 Mar 24, 2021
@hmlnarik
Copy link
Copy Markdown
Contributor

Hi @thomasdarimont, would you have time to move this PR forward?

@hmlnarik
Copy link
Copy Markdown
Contributor

hmlnarik commented Apr 1, 2021

Hi @thomasdarimont This PR has been inactive for almost two weeks. Will you be able to move this PR forward?

@thomasdarimont
Copy link
Copy Markdown
Contributor Author

@hmlnarik yes, I have a few days vacation ahead and I'm going to use some of my time to work on this.

As I've been also involved in the user-profile / validation API discussion, I had to adjust some priorities ;-)

…ase query if in-memory lookup fails for offline sessions

Allow to configure pre-loading of offline user-sessions

We now allow to configure whether offline user-sessions should be
preloaded on startup from the database.

We now trigger offline user-sessions loading when
UserSessionProvider#getOfflineUserSessionsStream(realm,user) is called.

This ensures that the user can revoke the offline_access consent
via the account console. Additionally the (offline) user consents
are now displayed correctly for a user in the admin-console.

Offline user-sessions are now correctly display if offline user-sessions
are loaded lazily.

Add note about unsupported lazy offline session loading by broker info:
Currently we don't support lazy offline user-session lookup by
brokerSessionId and brokerUserId.

See: keycloak#7722 (review)

Keep behavior of 'logout all sessions' and don't delete dangling offline sessions.
As discussed in keycloak#7722 (comment)

Throw on unsupported dynamic offline user-session lookups from database
Removed redundant condition checking.

We use the `coalesce` function to make the queries work for a
specific realm / client or all, e.g.:
```
clientSess.clientId = coalesce(CAST(:clientId as text), clientSess.clientId)
```
Note that the explicit cast is required to treat null values as text to avoid
confusing databases.
@thomasdarimont thomasdarimont force-pushed the issue/KEYCLOAK-11019-lazy-offline-session-loading branch from c508329 to ea19d79 Compare April 2, 2021 22:26
@thomasdarimont
Copy link
Copy Markdown
Contributor Author

@hmlnarik I updated the indices and rebased the branch again.

Btw. did you see #7902? I think that and this PR would work well together.

There is also the idea of expiring offline session cache entries to keep memory consumption low mentioned here: #7902 (comment)

@hmlnarik
Copy link
Copy Markdown
Contributor

hmlnarik commented Apr 7, 2021

@thomasdarimont Apologies I did not respond. Thank you for the updates, the PR looks much better. Could you please have a look at the failures of GitHub actions? They are related to this PR.

Yup, #7902 looks interesting though I've not yet have a chance to go through it in detail.

@Flintholm
Copy link
Copy Markdown

@thomasdarimont We (me and @nigtrifork) have been experimenting with lazy loading of the offlineClientSessions as an extension of this PR. The idea was to avoid the synchonization issues mentionen in the discoussion in #7902 if the offline session caches are configured to expire.
Do you think this could be viable solution for basically keeping the offline sessions in db, and only use the caches to most recent sessions ?

martin-kanis pushed a commit to martin-kanis/keycloak that referenced this pull request May 11, 2021
…ase query if in-memory lookup fails for offline sessions

Allow to configure pre-loading of offline user-sessions

We now allow to configure whether offline user-sessions should be
preloaded on startup from the database.

We now trigger offline user-sessions loading when
UserSessionProvider#getOfflineUserSessionsStream(realm,user) is called.

This ensures that the user can revoke the offline_access consent
via the account console. Additionally the (offline) user consents
are now displayed correctly for a user in the admin-console.

Offline user-sessions are now correctly display if offline user-sessions
are loaded lazily.

Add note about unsupported lazy offline session loading by broker info:
Currently we don't support lazy offline user-session lookup by
brokerSessionId and brokerUserId.

See: keycloak#7722 (review)

Keep behavior of 'logout all sessions' and don't delete dangling offline sessions.
As discussed in keycloak#7722 (comment)

Throw on unsupported dynamic offline user-session lookups from database
@martin-kanis
Copy link
Copy Markdown
Contributor

@thomasdarimont I opened a new PR #8033 which includes this PR plus rebase to current master plus adds some model tests. Let us please know whether you wish pull those changes here or we can close this PR and proceed with #8033. Thank you.

@thomasdarimont
Copy link
Copy Markdown
Contributor Author

@martin-kanis thanks for the update! I have no problem closing this PR and continue in the combined efforts in the new one.

@hmlnarik
Copy link
Copy Markdown
Contributor

Thank you @thomasdarimont and @martin-kanis. Closing this one in favor of #8033.

@hmlnarik hmlnarik closed this May 13, 2021
martin-kanis pushed a commit to martin-kanis/keycloak that referenced this pull request May 24, 2021
…ase query if in-memory lookup fails for offline sessions

Allow to configure pre-loading of offline user-sessions

We now allow to configure whether offline user-sessions should be
preloaded on startup from the database.

We now trigger offline user-sessions loading when
UserSessionProvider#getOfflineUserSessionsStream(realm,user) is called.

This ensures that the user can revoke the offline_access consent
via the account console. Additionally the (offline) user consents
are now displayed correctly for a user in the admin-console.

Offline user-sessions are now correctly display if offline user-sessions
are loaded lazily.

Add note about unsupported lazy offline session loading by broker info:
Currently we don't support lazy offline user-session lookup by
brokerSessionId and brokerUserId.

See: keycloak#7722 (review)

Keep behavior of 'logout all sessions' and don't delete dangling offline sessions.
As discussed in keycloak#7722 (comment)

Throw on unsupported dynamic offline user-session lookups from database
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/core area/storage Indicates an issue that touches storage (change in data layout or data manipulation) kind/enhancement Categorizes a PR related to an enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants