Skip to content

KEYCLOAK-16802 Improve performance of removed expired sessions in Jpa…#7764

Closed
mposolda wants to merge 2 commits intokeycloak:mainfrom
mposolda:KEYCLOAK-16802-offline-sessions-expiration
Closed

KEYCLOAK-16802 Improve performance of removed expired sessions in Jpa…#7764
mposolda wants to merge 2 commits intokeycloak:mainfrom
mposolda:KEYCLOAK-16802-offline-sessions-expiration

Conversation

@mposolda
Copy link
Copy Markdown
Contributor

@mposolda mposolda commented Feb 5, 2021

…UserSessionPersisterProvider

@mposolda mposolda self-assigned this Feb 5, 2021
@mposolda mposolda added area/storage Indicates an issue that touches storage (change in data layout or data manipulation) impact/medium kind/enhancement Categorizes a PR related to an enhancement labels Feb 5, 2021
@hmlnarik hmlnarik requested review from hmlnarik and martin-kanis and removed request for martin-kanis February 5, 2021 11:57
Copy link
Copy Markdown
Contributor

@hmlnarik hmlnarik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mposolda. I didn't go through the code much, just noted this comment which might be related to this work.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ID in case of map storage implementation needs to be a UUID. Can this be changed accordingly?
Disclaimer: Realm storage is right now being implemented, that's why the map storage tests did not catch this.
cc: @vramik

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hmlnarik Thanks for the feedback! In this case, I rather changed this test to use realm names instead of realmId, so that test work for both JPA and mapStorage. Also I removed the method "id" from the RealmBuilder class, which I introduced in this PR (in the first version of this PR before you spot this).

@mposolda mposolda force-pushed the KEYCLOAK-16802-offline-sessions-expiration branch from d761c57 to 31c29a1 Compare February 8, 2021 08:34
Copy link
Copy Markdown
Contributor

@martin-kanis martin-kanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mposolda Thanks for the PR, overall it looks good to me. Just one small thing, see inline.

Also I'm wondering if you have some performance metrics? You used similar approach like in #7718 so are there similar performance gains?

@NamedQuery(name="deleteUserSessionsByUser", query="delete from PersistentUserSessionEntity sess where sess.userId = :userId"),
@NamedQuery(name="deleteExpiredUserSessions", query="delete from PersistentUserSessionEntity sess where sess.realmId = :realmId AND sess.offline = :offline AND sess.lastSessionRefresh < :lastSessionRefresh"),
@NamedQuery(name="deleteExpiredUserSessionsCrossRealm", query="delete from PersistentUserSessionEntity sess where"
+ " sess.realmId in (select realm.id from RealmEntity realm where realm.offlineSessionIdleTimeout = :offlineSessionIdle)"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"nitpicking" just for consistency

Suggested change
+ " sess.realmId in (select realm.id from RealmEntity realm where realm.offlineSessionIdleTimeout = :offlineSessionIdle)"
+ " sess.realmId IN (select realm.id from RealmEntity realm where realm.offlineSessionIdleTimeout = :offlineSessionIdle)"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated

@mposolda mposolda force-pushed the KEYCLOAK-16802-offline-sessions-expiration branch from 4f2e41d to b5ba4d0 Compare February 9, 2021 10:48
@mposolda
Copy link
Copy Markdown
Contributor Author

mposolda commented Feb 9, 2021

@hmlnarik @martin-kanis @vramik I've added another commit with the alternative solution. I've used separate commit for it just to show the changes, but I can squash it afterwards. The previous approach did not perform well for the many offline sessions in the DB as the SQL with "NOT IN" looks like a bottleneck.

Details of performance testing: The use-case was expiration of big number of offline sessions with many realms deployments. Test was with:

  • PostgreSQL database
  • 500 realms with 10 different values of "Offline Session Idle Timeout" from 5 to 50 minutes
  • 500K of offline user sessions and offline client sessions in the DB.
  • The scheduled-task-interval was 5 minutes, so effectively each iteration should remove around 50K of offline sessions and after 10 iterations (around 50 minutes), there were no offline sessions or offline client sessions in the tables
  1. For the test before patch (current master), the times of background threads were: Transaction timeout (More than 300 seconds), 58424 ms, 37836 ms, 36355 ms, 34747 ms, 31714 ms, 29135 ms, 33032 ms, 22151 ms, 17496 ms, 12935 ms, 7259 ms, 2554 ms, 287 ms
    Note that the 1st iteration needed to cache the realms and hence very long time (This change is before the changes from the PR [KEYCLOAK-16600] - Improvements when loading many realms #7712, which is not merged yet. With that change, the time for 1st iteration will likely be smaller and no timeout will probably happen). The subsequent iterations were with all realms already cached, but still relatively slow due the fact that 1000 SQL queries needed for each iteration of background threads (2 SQL for each realm).

  2. The test after patch, the time for the background threads cleanup for similar conditions were: 1274 ms, 1379 ms, 1322 ms, 1321 ms, 1253 ms, 1095 ms, 999 ms, 880 ms, 677 ms, 609 ms, 468 ms, 45 ms.
    There is no caching of realms anymore and there are 20 SQL queries for each iteration due the 10 different offline idle timeouts (In reality, the case when all realms have same offlineIdleTimeout will be likely more realistic, so difference will be probably even bigger in comparison to the test 1. For the cases when each realm have different offlineSessionIdleTimeout, the times will be more close to the test 1 as the amount of SQL will be similar. However that is not very realistic in production to have many realms and each realm different times).

@martin-kanis
Copy link
Copy Markdown
Contributor

@mposolda Thank you for the performance statistics. It's good to see such an improvement.

Your second test is performed with "NOT IN" variant or with the second commit which you pushed here? I'm wondering how much more the second commit performs better.
I agree that "NOT IN" can perform poorly but at the same time I'm curious if the following part of the SQL query
u.realmId in (select realm.id from RealmEntity realm where realm.offlineSessionIdleTimeout = :offlineSessionIdle)"
which will be executed twice per iteration (once from deleteExpiredClientSessionsCrossRealm and once from deleteExpiredUserSessionsCrossRealm) won't make things worse.

@mposolda
Copy link
Copy Markdown
Contributor Author

@martin-kanis That is good point. The test (1) was done with the current Keycloak master, the test (2) was done with exactly what is available in this PR.

I've tried to measure performance also with the "NOT IN" variant, but performance was very bad for the bigger amount of records and for 50K offlineUserSessions in the DB (which is not very big amount), it failed entirely with the transaction timeout. For example, there is also this warning in the PostgreSQL docs for usage of "NOT IN" and the performance limitations: https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN .

I've tried also the variant with "NOT EXISTS", but this doesn't work (at least) with MySQL when used in JPQL together with DELETE command.

So finally ended with the version in the current PR, which seems to work with all databases and have better performance than current master. Similar approach is used also for removeExpired(realm) where you can see same approach for deleteExpiredClientSessions and deleteExpiredUserSessions instead of an attempt to delete detached client sessions.

The performance with this PR is significantly better than current master, at least in my test, which uses 10 different timeouts for various realms. It can be possibly worse than current master in case that you would have N realms with N completely different offlineIdleTimeouts in each realm, but as I mentioned above, IMO in reality, it is rather much more probable that count of different timeouts will be likely smaller (usually 1) in production environments like for example MAS SSO.

@mposolda
Copy link
Copy Markdown
Contributor Author

@hmlnarik Thanks for pointing the comment from the other PR, I somehow missed it before. I've created https://issues.redhat.com/browse/KEYCLOAK-17111 for this. I see that before this PR, the InfinispanUserSessionProvider.removeExpired(realm) also ignored client-overriden timeouts, so https://issues.redhat.com/browse/KEYCLOAK-17111 is not a regression of https://issues.redhat.com/browse/KEYCLOAK-16755 . However it worth fixing, so I triaged and planned to be fixed.

@eldarj
Copy link
Copy Markdown

eldarj commented Aug 15, 2022

What's with this? Was this done?

@mposolda
Copy link
Copy Markdown
Contributor Author

@eldarj Sorry, this is "on hold" for now due the team focus on the new store and this PR is related to the old store. I am not sure if we restore the work on this PR due the other priorities (and due testing this change is not so trivial as it requires further testing and performance verifications etc...)

@pedroigor
Copy link
Copy Markdown
Contributor

@mposolda I'm closing it for now but we can re-open anytime if the plans change.

@pedroigor pedroigor closed this Aug 29, 2022
@mposolda
Copy link
Copy Markdown
Contributor Author

@pedroigor Sure, Thanks for closing this one.

@yelhouti
Copy link
Copy Markdown
Contributor

@pedroigor @mposolda I was doing some profiling and found kecloak spending a lot of time on this.
This comment suggest is not yet fixed:

// TODO: Avoid iteration over all realms here (Details in the KEYCLOAK-16802)
session.realms().getRealmsStream().forEach(this::removeExpired);

Could this please be reopened ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/storage Indicates an issue that touches storage (change in data layout or data manipulation) kind/enhancement Categorizes a PR related to an enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants