
KEYCLOAK-17503: Optimise offline session load on startup #7902

Closed

Flintholm wants to merge 1 commit into keycloak:master from trifork:optimise_offline_session_loading

Conversation

@Flintholm

This PR optimises startup time when there is a large number of offline sessions.

Loading offline sessions is slow because of an ineffective "in-memory" join between offline_user_session and offline_client_session. The current implementation generates a set of user_session_ids from a segment of loaded offline_user_sessions and passes this set as a comma-separated string to the query for offline_client_sessions.

My proposed implementation instead sorts both offline_user_session and offline_client_session on user_session_id, which makes a more efficient "join" possible.
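
For illustration only (this is a sketch, not the PR's actual code): once both result sets arrive ordered by user_session_id, a single linear pass can attach each client session to its user session, instead of issuing an IN (...) query per segment. The row types and the attach step below are hypothetical placeholders.

import java.util.Iterator;
import java.util.List;

class SortedJoinSketch {

    static class UserSessionRow { String userSessionId; /* ... */ }
    static class ClientSessionRow { String userSessionId; /* ... */ }

    // Both lists are assumed sorted ascending by userSessionId.
    static void join(List<UserSessionRow> userSessions, List<ClientSessionRow> clientSessions) {
        Iterator<ClientSessionRow> clients = clientSessions.iterator();
        ClientSessionRow client = clients.hasNext() ? clients.next() : null;
        for (UserSessionRow user : userSessions) {
            // Consume client sessions up to and including the current user session id;
            // rows without a matching user session are simply skipped.
            while (client != null && client.userSessionId.compareTo(user.userSessionId) <= 0) {
                if (client.userSessionId.equals(user.userSessionId)) {
                    attach(user, client); // hypothetical: link client session to its user session
                }
                client = clients.hasNext() ? clients.next() : null;
            }
        }
    }

    static void attach(UserSessionRow user, ClientSessionRow client) { /* ... */ }
}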

Since the "lastCreatedOn" is no longer needed for segmentation, I have removed it (primarily in tests)

Also the "abc" used for lowest possible user_session_id have been replaced with "00000000-0000-0000-0000-000000000000".

@thomasdarimont
Contributor

@Flintholm great work! I wonder if this can be combined with #7722

@Flintholm
Author

> @Flintholm great work! I wonder if this can be combined with #7722

If lazy loading remains optional, I think both PRs are relevant.
But if lazy loading in #7722 replaces the current eager loading, I actually don't think my PR has much value.

#7722 seems to be a much better solution to the problem. I just noticed it too late...
(Ideally with a limit on cache size to prevent runaway memory use.)

Has there been a decision on whether lazy loading will replace eager loading, or become an option?

@thomasdarimont
Contributor

thomasdarimont commented Apr 2, 2021

@Flintholm thanks for the feedback.

Yes, I considered limiting cache sizes, but IMHO this is not so easy: you'd need to coordinate the offlineSession removal with the corresponding offlineClientSessions. This might be possible with the Infinispan cache configuration in standalone-ha.xml via expiration and max-count/memory settings on the offlineSession caches, plus some additional explicit cleanup when creating a new session, e.g. evicting the oldest / least recently used offlineSession and its offlineClientSessions.
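
For reference, a minimal sketch of such a bound using Infinispan's programmatic API (assuming the Infinispan 11+ memory/eviction builder; in Keycloak the equivalent would be set in standalone-ha.xml, and the limits here are purely illustrative):

import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.eviction.EvictionStrategy;

import java.util.concurrent.TimeUnit;

class OfflineSessionCacheLimitsSketch {

    static Configuration boundedOfflineSessions() {
        return new ConfigurationBuilder()
                .memory()
                    .maxCount(10_000)                  // illustrative cap on cached entries
                    .whenFull(EvictionStrategy.REMOVE) // evict instead of rejecting writes
                .expiration()
                    .maxIdle(30, TimeUnit.DAYS)        // illustrative idle timeout
                .build();
    }
}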

Btw., if an offline session is not found in the cache (because of eviction, or cluster sync problems during rolling upgrades) but is still in the database, it will be reloaded on demand from the database.

I think the plan is to keep the eager offline-session loading as default, but provide the lazy loading as an option.

Nevertheless, I'll give your PR a try and see how both can work together. :)

@hmlnarik hmlnarik self-assigned this Apr 6, 2021
@hmlnarik hmlnarik added area/storage Indicates an issue that touches storage (change in data layout or data manipulation) kind/enhancement Categorizes a PR related to an enhancement labels Apr 6, 2021
@hmlnarik hmlnarik requested review from hmlnarik and mposolda April 6, 2021 12:33
@nigtrifork

> Yes, I considered limiting cache sizes, but IMHO this is not so easy: you'd need to coordinate the offlineSession removal with the corresponding offlineClientSessions. […]

Hi @thomasdarimont

I think it would be possible to use an Infinispan Listener to coordinate offline client session removal. I have a proof of concept here:

package org.keycloak.models.sessions.infinispan;

import org.infinispan.Cache;
import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachelistener.annotation.CacheEntriesEvicted;
import org.infinispan.notifications.cachelistener.event.CacheEntriesEvictedEvent;
import org.jboss.logging.Logger;
import org.keycloak.models.sessions.infinispan.changes.SessionEntityWrapper;
import org.keycloak.models.sessions.infinispan.entities.AuthenticatedClientSessionEntity;
import org.keycloak.models.sessions.infinispan.entities.UserSessionEntity;

import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

/**
 * Evicts the cached offline client sessions belonging to an offline user
 * session whenever that user session is evicted from its own cache.
 */
@Listener
public class OfflineUserSessionRemovalListener {

    private static final Logger log = Logger.getLogger(OfflineUserSessionRemovalListener.class);

    private final Cache<UUID, SessionEntityWrapper<AuthenticatedClientSessionEntity>> offlineClientSessionCache;

    public OfflineUserSessionRemovalListener(Cache<UUID, SessionEntityWrapper<AuthenticatedClientSessionEntity>> offlineClientSessionCache) {
        this.offlineClientSessionCache = offlineClientSessionCache;
    }

    // Handles batch evictions from the offline user session cache; runs
    // asynchronously so the eviction thread is not blocked.
    @CacheEntriesEvicted
    public CompletionStage<Void> onEvent(CacheEntriesEvictedEvent<String, SessionEntityWrapper<UserSessionEntity>> event) {
        return CompletableFuture.runAsync(() ->
                event.getEntries().forEach((id, wrapper) -> {
                    log.debugf("Removing cached offline client sessions for offline user session with id = '%s'", id);
                    wrapper.getEntity().getAuthenticatedClientSessions().forEach((clientId, sessionId) -> {
                        log.debugf("Removing cached offline client session with id = '%s'", sessionId);
                        offlineClientSessionCache.remove(sessionId);
                    });
                })
        );
    }
}

The listener is added to the offline user session cache like this:

    private void registerOfflineSessionRemovalListener(KeycloakSession session) {
        log.debug("Registering Offline User Session Removal Listener");
        InfinispanConnectionProvider connections = session.getProvider(InfinispanConnectionProvider.class);
        Cache<String, SessionEntityWrapper<UserSessionEntity>> offlineSessionsCache = connections.getCache(InfinispanConnectionProvider.OFFLINE_USER_SESSION_CACHE_NAME);
        Cache<UUID, SessionEntityWrapper<AuthenticatedClientSessionEntity>> offlineClientSessionsCache = connections.getCache(InfinispanConnectionProvider.OFFLINE_CLIENT_SESSION_CACHE_NAME);
        offlineSessionsCache.addListener(new OfflineUserSessionRemovalListener(offlineClientSessionsCache));
    }

This method could be called from the org.keycloak.provider.ProviderEventListener#onEvent listener defined in org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory#postInit, to ensure the listener is only added once.
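
A hedged sketch of what that registration could look like inside the factory class (PostMigrationEvent is the event Keycloak fires once startup completes; whether it is the right hook here is an assumption on my part):

    @Override
    public void postInit(final KeycloakSessionFactory factory) {
        factory.register(event -> {
            if (event instanceof PostMigrationEvent) {
                // Runs once after startup, when the Infinispan caches are available.
                KeycloakModelUtils.runJobInTransaction(factory,
                        session -> registerOfflineSessionRemovalListener(session));
            }
        });
    }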

As you mention, eviction can be configured in patch-standalone-ha.cli (from #7722), like this:

/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineSessions/memory=heap:add(size=10000)

I tested this in combination with #7722 in a single-instance setup, with an H2 DB and a limit of 10 cached offline user sessions, and it seems promising.

Do you think this approach is viable?

@thomasdarimont
Contributor

Thanks, this looks interesting - I'll take a look.

@hmlnarik
Contributor

hmlnarik commented Apr 7, 2021

This may work in a single-cluster scenario, but I am not sure it is viable in the case of cross-DC deployments; there are other listeners needed as well (search for @ClientListener, e.g. in RemoteCacheSessionListener). I have not checked the approach in depth, though.

@hmlnarik
Contributor

hmlnarik commented May 7, 2021

@Flintholm Thank you for this PR! This is going to be merged, but it needed additional care. Since the conflicts here were mostly due to my own work, I've taken the liberty of rebasing and updating the tests of this PR in #8012. I am now closing this one in favor of #8012 and will be happy to get your comments there.

@hmlnarik hmlnarik closed this May 7, 2021