Skip to content

Enlist JPA transaction in JpaMapStorageProvider.getStorage#11328

Merged
hmlnarik merged 1 commit intokeycloak:mainfrom
sguilhen:11230-rework-jpa-tx
Apr 23, 2022
Merged

Enlist JPA transaction in JpaMapStorageProvider.getStorage#11328
hmlnarik merged 1 commit intokeycloak:mainfrom
sguilhen:11230-rework-jpa-tx

Conversation

@sguilhen
Copy link
Copy Markdown
Contributor

Closes #11230

@sguilhen sguilhen added area/storage Indicates an issue that touches storage (change in data layout or data manipulation) team/storage-sig labels Apr 14, 2022
@sguilhen sguilhen marked this pull request as draft April 14, 2022 19:32
Copy link
Copy Markdown
Contributor

@hmlnarik hmlnarik Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the full inline class form, MapStorage is intentionally not marked as @FunctionalInterface as there might be methods added in the future

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just followed what we have in LDAP:

return session -> {
MapKeycloakTransaction<V, M> sessionTx = session.getAttribute(sessionTxPrefix + modelType.hashCode(), MapKeycloakTransaction.class);
if (sessionTx == null) {
sessionTx = factory.createTransaction(session, modelType);
session.setAttribute(sessionTxPrefix + modelType.hashCode(), sessionTx);
}
return sessionTx;
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting this one, I missed it there. Could you please file an issue for the LDAP and fix only the JPA code in this PR?
cc: @ahus1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: #11373

@sguilhen
Copy link
Copy Markdown
Contributor Author

sguilhen commented Apr 14, 2022

Summary of changes

  • JPA transaction (EntityTransaction) is now wrapped as KeycloakTransactionand enlisted the first time getStorage is called on a JpaMapStorageProvider instance; area-specific transactions are now "logical" transactions that implement begin, commit and rollback as no-ops, relying on the underlying JPA transaction instead.
  • AbstractMapStorageFactory.getStorage now uses the session.getComponentProvider method to obtain a reference to the MapStorageProvider, benefiting from the caching performed by the session and ensuring the same provider instance will be returned for a particular componentId for the duration of the session. This is important to ensure that all area-specific providers obtain a transaction object that uses the same entity manager that was used to create the JPA transaction and thus participate in the right transaction.
  • JpaMapStorageProvider now gets a unique key upon creation and uses the key to put the enlisted JPA transaction in the session. This is needed to enable support for multiple profiles of the same storage factory later on (e.g. two different DB configs where part of the areas such as realm, clients, roles, etc go to one DB and users, groups, etc to another DB). In this multi-profile setting, we want to ensure that all areas are participating in the JPA transaction that targets the right DB for them, and for that we need one storage provider instance per profile. Assigning different keys to each instance is what keeps providers from re-using the wrong JPA transaction.

@sguilhen
Copy link
Copy Markdown
Contributor Author

Testsuite results

The base testsuite was run with all currently available Map JPA providers configured. Only one test is currently failing in this setting: OIDCClientRegistrationTest#testClientWithScope. The test itself passes, but when AuthServerTestEnricher tries to remove the test realm at the end, it fails with

org.hibernate.TransientPropertyValueException: object references an unsaved transient instance - save the transient instance before flushing : org.keycloak.models.map.storage.jpa.client.entity.JpaClientAttributeEntity.root -> org.keycloak.models.map.storage.jpa.client.entity.JpaClientEntity
	at org.hibernate.engine.spi.CascadingActions$8.noCascade(CascadingActions.java:379)
	at org.hibernate.engine.internal.Cascade.cascade(Cascade.java:165)
	at org.hibernate.event.internal.AbstractFlushingEventListener.cascadeOnFlush(AbstractFlushingEventListener.java:159)
	at org.hibernate.event.internal.AbstractFlushingEventListener.prepareEntityFlushes(AbstractFlushingEventListener.java:150)
	at org.hibernate.event.internal.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:83)
	at org.hibernate.event.internal.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:46)
	at org.hibernate.internal.SessionImpl.autoFlushIfRequired(SessionImpl.java:1433)
	at org.hibernate.internal.SessionImpl.scroll(SessionImpl.java:1654)
	at org.hibernate.query.internal.AbstractProducedQuery.doScroll(AbstractProducedQuery.java:1537)
	at org.hibernate.query.internal.AbstractProducedQuery.scroll(AbstractProducedQuery.java:1523)
	at org.hibernate.query.internal.AbstractProducedQuery.stream(AbstractProducedQuery.java:1547)
	at org.hibernate.query.criteria.internal.compile.CriteriaQueryTypeQueryAdapter.stream(CriteriaQueryTypeQueryAdapter.java:89)
	at org.hibernate.query.Query.getResultStream(Query.java:1107)
	at org.keycloak.models.map.storage.jpa.JpaMapKeycloakTransaction.read(JpaMapKeycloakTransaction.java:115)
	at org.keycloak.models.map.role.MapRoleProvider.getClientRolesStream(MapRoleProvider.java:155)
	at org.keycloak.models.map.role.MapRoleProvider.removeRoles(MapRoleProvider.java:192)
	at org.keycloak.storage.RoleStorageManager.removeRoles(RoleStorageManager.java:191)
	at org.keycloak.models.cache.infinispan.RealmCacheSession.removeRoles(RealmCacheSession.java:814)
	at org.keycloak.models.map.client.MapClientProvider.removeClient(MapClientProvider.java:200)
	at org.keycloak.models.map.client.MapClientProvider.lambda$removeClients$4(MapClientProvider.java:185)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.keycloak.models.map.client.MapClientProvider.removeClients(MapClientProvider.java:185)
	at org.keycloak.storage.ClientStorageManager.removeClients(ClientStorageManager.java:247)
	at org.keycloak.models.cache.infinispan.RealmCacheSession.removeClients(RealmCacheSession.java:578)
	at org.keycloak.models.jpa.JpaRealmProvider.removeRealm(JpaRealmProvider.java:175)
	at org.keycloak.models.cache.infinispan.RealmCacheSession.removeRealm(RealmCacheSession.java:495)
	at org.keycloak.services.managers.RealmManager.removeRealm(RealmManager.java:257)
	at org.keycloak.services.resources.admin.RealmAdminResource.deleteRealm(RealmAdminResource.java:473)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)

All other tests from the same class run fine and have the test realm properly removed at the end. This is the only test that is currently failing, so a bit more investigation is needed to find out why this fails with this new transaction arrangement.

@sguilhen sguilhen force-pushed the 11230-rework-jpa-tx branch from eaf6bab to 861fce3 Compare April 14, 2022 20:07
@sguilhen
Copy link
Copy Markdown
Contributor Author

sguilhen commented Apr 18, 2022

Testsuite update:

I did previously run the testsuite with all JPA Map storages available in main - this means that the realms JPA Map storage was not enabled as that is still to be merged. As realms are arguably the main component of Keycloak, I've created a branch that includes the realms PR along with the changes here to be able to test the new tx arrangement with realms as well.

First I had to change things a little bit because I was hitting a StackOverflowException:

at org.keycloak.services.DefaultKeycloakSession.getComponentProvider(DefaultKeycloakSession.java:362)
at org.keycloak.models.utils.KeycloakModelUtils.getComponentProvider(KeycloakModelUtils.java:313)
at org.keycloak.models.map.common.AbstractMapProviderFactory.getStorage(AbstractMapProviderFactory.java:61)
at org.keycloak.models.map.realm.MapRealmProviderFactory.create(MapRealmProviderFactory.java:33)
at org.keycloak.models.map.realm.MapRealmProviderFactory.create(MapRealmProviderFactory.java:25)
at org.keycloak.services.DefaultKeycloakSession.getProvider(DefaultKeycloakSession.java:316)
at org.keycloak.services.DefaultKeycloakSession.realmLocalStorage(DefaultKeycloakSession.java:228)
at org.keycloak.models.cache.infinispan.RealmCacheSession.getRealmDelegate(RealmCacheSession.java:149)
at org.keycloak.models.cache.infinispan.RealmAdapter.isUpdated(RealmAdapter.java:89)
at org.keycloak.models.cache.infinispan.RealmAdapter.getId(RealmAdapter.java:97)
at org.keycloak.services.DefaultKeycloakSession.getComponentProvider(DefaultKeycloakSession.java:362)
at org.keycloak.models.utils.KeycloakModelUtils.getComponentProvider(KeycloakModelUtils.java:313)
at org.keycloak.models.map.common.AbstractMapProviderFactory.getStorage(AbstractMapProviderFactory.java:61)
at org.keycloak.models.map.realm.MapRealmProviderFactory.create(MapRealmProviderFactory.java:33)
at org.keycloak.models.map.realm.MapRealmProviderFactory.create(MapRealmProviderFactory.java:25)
at org.keycloak.services.DefaultKeycloakSession.getProvider(DefaultKeycloakSession.java:316)

Basically the code was trying to load the MapRealmProvider, which in turn calls the getStorage(), which now calls the session to obtain the component provider, and then, using the the realm found in the session context, it calls getId, which triggers a request to load the realm provider again, leading to the infinite loop.

I had to create a new version of getComponentProvider in the session that doesn't use the realm from the context but rather the realm specified as a parameter. I still don't quite understand how a realm was put in the context and its provider wasn't found in the session cache because you need a provider to get a realm in the first place, so it should be cached. If it was cached, the code would not try to create a new MapRealmProvider again. But I was able to work around it and I this allowed me to proceed with testing.

So the results are as follows:

  • OIDCClientRegistrationTest#testClientWithScope fails as described above.
  • ExportImportTest and FederatedStorageExportImportTest both fail in a very similar manner, but this time hibernate complains about a realm transient instance:
org.hibernate.TransientPropertyValueException: object references an unsaved transient instance - save the transient instance before flushing : org.keycloak.models.map.storage.jpa.realm.entity.JpaRealmAttributeEntity.root -> org.keycloak.models.map.storage.jpa.realm.entity.JpaRealmEntity
	at org.hibernate.engine.spi.CascadingActions$8.noCascade(CascadingActions.java:379)
	at org.hibernate.engine.internal.Cascade.cascade(Cascade.java:165)
	at org.hibernate.event.internal.AbstractFlushingEventListener.cascadeOnFlush(AbstractFlushingEventListener.java:159)
	at org.hibernate.event.internal.AbstractFlushingEventListener.prepareEntityFlushes(AbstractFlushingEventListener.java:150)
	at org.hibernate.event.internal.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:83)
	at org.hibernate.event.internal.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:46)
	at org.hibernate.internal.SessionImpl.autoFlushIfRequired(SessionImpl.java:1433)
	at org.hibernate.internal.SessionImpl.scroll(SessionImpl.java:1654)
	at org.hibernate.query.internal.AbstractProducedQuery.doScroll(AbstractProducedQuery.java:1537)
	at org.hibernate.query.internal.AbstractProducedQuery.scroll(AbstractProducedQuery.java:1523)
	at org.hibernate.query.internal.AbstractProducedQuery.stream(AbstractProducedQuery.java:1547)
	at org.hibernate.query.criteria.internal.compile.CriteriaQueryTypeQueryAdapter.stream(CriteriaQueryTypeQueryAdapter.java:89)
	at org.hibernate.query.Query.getResultStream(Query.java:1107)
	at org.keycloak.models.map.storage.jpa.JpaMapKeycloakTransaction.read(JpaMapKeycloakTransaction.java:116)
	at org.keycloak.models.map.role.MapRoleProvider.getClientRolesStream(MapRoleProvider.java:155)
	at org.keycloak.storage.RoleStorageManager.getClientRolesStream(RoleStorageManager.java:217)
	at org.keycloak.models.cache.infinispan.RealmCacheSession.getClientRolesStream(RealmCacheSession.java:669)
	at org.keycloak.models.cache.infinispan.ClientAdapter.getRolesStream(ClientAdapter.java:585)
	at org.keycloak.models.cache.infinispan.RealmCacheSession.removeClient(RealmCacheSession.java:593)
	at org.keycloak.services.managers.RealmManager.removeRealm(RealmManager.java:260)
	at org.keycloak.services.resources.admin.RealmAdminResource.deleteRealm(RealmAdminResource.java:473)
  • ClientPoliciesTest fails when running the whole testsuite, but passes when executed alone. I've noticed that this test fails only when ClientPoliciesImportExportTest has run first. I'm getting a 500 when it tries to remove the realm but couldn't see the undertow logs. I'm guessing it is a very similar issue but still needs confirmation.

@ahus1
Copy link
Copy Markdown
Member

ahus1 commented Apr 19, 2022

@sguilhen - as a workaround until the reason for the transient instance has been identified, please try the following workaround.

It allowed me to run mvn clean install -Dkeycloak.profile.feature.map_storage=enabled -Dtest=ClientPoliciesImportExportTest,ClientPoliciesTest in your test-branch without an error.

    // in: JpaMapKeycloakTransaction.java
    public boolean delete(String key) {
        if (key == null) return false;
        UUID uuid = UUIDKey.INSTANCE.fromStringSafe(key);
        if (uuid == null) return false;
        em.remove(em.getReference(entityType, uuid));
        logger.tracef("tx %d: delete entity %s", hashCode(), key);
        // START WORKAROUND
        em.flush();
        em.clear();
        // END WORKAROUND
        return true;
    }

I'll continue to analyze the reason. Another workaround would be to add cascade to @ManyToOne(fetch = FetchType.LAZY) in the JpaAttributeEntity, but I read that this shouldn't be done and the relation should be managed from the other side as it is now.

@sguilhen
Copy link
Copy Markdown
Contributor Author

sguilhen commented Apr 19, 2022

@ahus1 This does indeed fix the tests I've listed, but introduces regressions in other tests, such as SAMLFilterServletAdapterTest, that fails with:

org.hibernate.LazyInitializationException: Unable to perform requested lazy initialization [org.keycloak.models.map.storage.jpa.realm.entity.JpaRealmEntity.attributes] - no session and settings disallow loading outside the Session
	at org.hibernate.bytecode.enhance.spi.interceptor.EnhancementHelper.throwLazyInitializationException(EnhancementHelper.java:196)
	at org.hibernate.bytecode.enhance.spi.interceptor.EnhancementHelper.performWork(EnhancementHelper.java:86)
	at org.hibernate.bytecode.enhance.spi.interceptor.LazyAttributeLoadingInterceptor.loadAttribute(LazyAttributeLoadingInterceptor.java:76)
	at org.hibernate.bytecode.enhance.spi.interceptor.LazyAttributeLoadingInterceptor.fetchAttribute(LazyAttributeLoadingInterceptor.java:72)
	at org.hibernate.bytecode.enhance.spi.interceptor.LazyAttributeLoadingInterceptor.handleRead(LazyAttributeLoadingInterceptor.java:53)
	at org.hibernate.bytecode.enhance.spi.interceptor.AbstractInterceptor.readObject(AbstractInterceptor.java:153)
	at org.keycloak.models.map.storage.jpa.realm.entity.JpaRealmEntity.$$_hibernate_read_attributes(JpaRealmEntity.java)
	at org.keycloak.models.map.storage.jpa.realm.entity.JpaRealmEntity.getAttribute(JpaRealmEntity.java:993)
	at org.keycloak.models.map.realm.MapRealmAdapter.getAttribute(MapRealmAdapter.java:217)
	at org.keycloak.models.cache.infinispan.RealmAdapter.getAttribute(RealmAdapter.java:1672)
	at org.keycloak.url.DefaultHostnameProvider.getRealmUri(DefaultHostnameProvider.java:99)
	at org.keycloak.url.DefaultHostnameProvider.resolveUri(DefaultHostnameProvider.java:57)
	at org.keycloak.url.DefaultHostnameProvider.getScheme(DefaultHostnameProvider.java:38)
	at org.keycloak.models.KeycloakUriInfo.<init>(KeycloakUriInfo.java:46)
	at org.keycloak.services.DefaultKeycloakContext.getUri(DefaultKeycloakContext.java:78)
	at org.keycloak.services.DefaultKeycloakContext.getUri(DefaultKeycloakContext.java:85)
	at org.keycloak.services.resources.admin.GroupResource.deleteGroup(GroupResource.java:136)

This all happens when trying to remove a group, and then several of tests fail after this one becaue the group wasn't properly removed from the DB.

So I think we need to find a different way, unfortunately.

@sguilhen sguilhen force-pushed the 11230-rework-jpa-tx branch from 861fce3 to 523f4ce Compare April 19, 2022 17:48
@sguilhen
Copy link
Copy Markdown
Contributor Author

  • AbstractMapStorageFactory.getStorage now uses the session.getComponentProvider method to obtain a reference to the MapStorageProvider, benefiting from the caching performed by the session and ensuring the same provider instance will be returned for a particular componentId for the duration of the session. This is important to ensure that all area-specific providers obtain a transaction object that uses the same entity manager that was used to create the JPA transaction and thus participate in the right transaction.

Due to the StackOverflowException when we enable JPA Map provider for realms, and the fact we can't at the moment be changing the KeycloakSession interface, this had to be reverted. Now the JpaMapStorageProviderFactory caches the JpaMapStorageProvider in the session instead of going through session.getComponentProvider. The effect is the same: for a particular session only a single storage provider is created.

@sguilhen sguilhen force-pushed the 11230-rework-jpa-tx branch from 523f4ce to 86e5b23 Compare April 19, 2022 18:00
@ahus1
Copy link
Copy Markdown
Member

ahus1 commented Apr 19, 2022

I continued my analysis today. I manged to fix two issues, still some problems remain (e5bb5f6fcfe4d6c36c9b29dae08a70afa78fce3a):

  • ClientPoliciesImpmortExportTest didn't reset some system properties, leading to errors in the following tests - @sguilhen and me paired to find this.
  • Missing unique constraints on attributes to ensure that a name/value pair can appear only once per parent was hiding a problem that sometimes Hibernates adds two entries instead of overwriting a previous entry when the first entry was created in the current transaction. Adding a constraint makes this easier to test, still the reason is still unknown why this happens. Several variations of equal/hashcode and List vs. Set didn't make a change. @sguilhen found a test that always fails, so we'll hopefully narrow this down in the next days (Configuration: OIDCClientRegistrationTest.testClientWithScope).

More debugging ahead...

@ahus1
Copy link
Copy Markdown
Member

ahus1 commented Apr 20, 2022

Debugging continued until the results improved. See aece8e1.

Key changes compared to the previous commit:

  • No longer changing JpaAttributeEntity as this turned out to have no effect either way
  • Spotted the problem in "auto-flush" mode of Hibernate: in DefaultAutoFlushEventListener, when Hibernate evaluates if a flush is necessary, it might queue the inserts in the action queue. If it then determines that a flush is not necessary, the insert will still be in the queue. If such an element is later orphaned within the same transaction while it hasn't been flushed, Hibernate will execute the insert at the end of the transaction, leading to an orphan in the database. There are two possible solutions to that
    1. small: switch to FlushMode.ALWAYS in JpaMapStorageProviderFactory
    2. bigger: replace the DefaultAutoFlushEventListener with a modified JpaAutoFlushListener that will always flush when there is a queued insert independent of the query spaces

Let's discuss later today how to split these changes to different PRs or if you want to have separate commits in this PR for this.

@sguilhen
Copy link
Copy Markdown
Contributor Author

Debugging continued until the results improved. See aece8e1.

Key changes compared to the previous commit:

  • No longer changing JpaAttributeEntity as this turned out to have no effect either way

  • Spotted the problem in "auto-flush" mode of Hibernate: in DefaultAutoFlushEventListener, when Hibernate evaluates if a flush is necessary, it might queue the inserts in the action queue. If it then determines that a flush is not necessary, the insert will still be in the queue. If such an element is later orphaned within the same transaction while it hasn't been flushed, Hibernate will execute the insert at the end of the transaction, leading to an orphan in the database. There are two possible solutions to that

    1. small: switch to FlushMode.ALWAYS in JpaMapStorageProviderFactory
    2. bigger: replace the DefaultAutoFlushEventListener with a modified JpaAutoFlushListener that will always flush when there is a queued insert independent of the query spaces

Let's discuss later today how to split these changes to different PRs or if you want to have separate commits in this PR for this.

Ok, so as we've discussed today I've tried both approaches and # 1 falls short as some tests started to hang. I've then switched to # 2 and included your JpaAutoFlushListener. This allowed me to run the entire testsuite with no errors (except for the known failures when JPA Map storage is used for groups - see #11369)

@vramik
Copy link
Copy Markdown
Contributor

vramik commented Apr 21, 2022

This allowed me to run the entire testsuite with no errors (except for the known failures when JPA Map storage is used for groups - see #11369)

just FYI, there is PR to address the unrelated #11369: #11563

@hmlnarik
Copy link
Copy Markdown
Contributor

I agree with #2 - the JpaAutoFlushListener

@ahus1
Copy link
Copy Markdown
Member

ahus1 commented Apr 21, 2022

@sguilhen - to elaborate on Hynek's comment: we discussed it in today's team meeting, and agreed that the new flush listener is the way to go, also as the smaller solution didn't work.

With the LDAP test already addressed, I assume that after a rebase and with the new flush listener the tests should pass, and this PR is then ready for review & merge.

@sguilhen
Copy link
Copy Markdown
Contributor Author

@ahus1 Thanks! I will incorporate the JpaAutoFlushListener in this PR and add you as a co-author.

@sguilhen sguilhen force-pushed the 11230-rework-jpa-tx branch from 86e5b23 to 456b587 Compare April 21, 2022 15:16
@sguilhen sguilhen marked this pull request as ready for review April 21, 2022 15:16
@sguilhen sguilhen requested review from ahus1 and hmlnarik April 21, 2022 15:16
ahus1
ahus1 previously approved these changes Apr 21, 2022
Copy link
Copy Markdown
Member

@ahus1 ahus1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for preparing this PR, looks good to me!

@sguilhen sguilhen changed the title WIP - Enlist JPA transaction in JpaMapStorageProvider.getStorage Enlist JPA transaction in JpaMapStorageProvider.getStorage Apr 21, 2022
@sguilhen
Copy link
Copy Markdown
Contributor Author

Important note: with this change Keycloak doesn't start anymore with CockroachDB:

Exception in thread "main" java.lang.RuntimeException: RESTEASY003325: Failed to construct public org.keycloak.services.resources.KeycloakApplication()
	at org.jboss.resteasy.core.ConstructorInjectorImpl.construct(ConstructorInjectorImpl.java:164)
	at org.jboss.resteasy.spi.ResteasyProviderFactory.createProviderInstance(ResteasyProviderFactory.java:2835)
	at org.jboss.resteasy.spi.ResteasyDeployment.createApplication(ResteasyDeployment.java:376)
	at org.jboss.resteasy.spi.ResteasyDeployment.startInternal(ResteasyDeployment.java:288)
	at org.jboss.resteasy.spi.ResteasyDeployment.start(ResteasyDeployment.java:98)
	at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.init(ServletContainerDispatcher.java:140)
	at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.init(HttpServletDispatcher.java:42)
	at io.undertow.servlet.core.LifecyleInterceptorInvocation.proceed(LifecyleInterceptorInvocation.java:117)
	at io.undertow.servlet.core.ManagedServlet$DefaultInstanceStrategy.start(ManagedServlet.java:305)
	at io.undertow.servlet.core.ManagedServlet.createServlet(ManagedServlet.java:145)
	at io.undertow.servlet.core.DeploymentManagerImpl$2.call(DeploymentManagerImpl.java:588)
	at io.undertow.servlet.core.DeploymentManagerImpl$2.call(DeploymentManagerImpl.java:559)
	at io.undertow.servlet.core.ServletRequestContextThreadSetupAction$1.call(ServletRequestContextThreadSetupAction.java:42)
	at io.undertow.servlet.core.ContextClassLoaderSetupAction$1.call(ContextClassLoaderSetupAction.java:43)
	at io.undertow.servlet.core.DeploymentManagerImpl.start(DeploymentManagerImpl.java:601)
	at org.jboss.resteasy.plugins.server.undertow.UndertowJaxrsServer.deploy(UndertowJaxrsServer.java:269)
	at org.keycloak.testsuite.KeycloakServer.start(KeycloakServer.java:446)
	at org.keycloak.testsuite.KeycloakServer.bootstrapKeycloakServer(KeycloakServer.java:251)
	at org.keycloak.testsuite.KeycloakServer.main(KeycloakServer.java:154)
Caused by: java.lang.RuntimeException: org.hibernate.exception.SQLGrammarException: could not extract ResultSet
	at org.keycloak.testsuite.TestPlatform.exit(TestPlatform.java:44)
	at org.keycloak.services.resources.KeycloakApplication.<init>(KeycloakApplication.java:120)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at org.jboss.resteasy.core.ConstructorInjectorImpl.construct(ConstructorInjectorImpl.java:152)
	... 18 more
Caused by: org.hibernate.exception.SQLGrammarException: could not extract ResultSet
	at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:106)
	at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42)
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113)
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99)
	at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:69)
	at org.hibernate.loader.Loader.getResultSet(Loader.java:2265)
	at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:2028)
	at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1990)
	at org.hibernate.loader.Loader.scroll(Loader.java:2863)
	at org.hibernate.loader.hql.QueryLoader.scroll(QueryLoader.java:574)
	at org.hibernate.hql.internal.ast.QueryTranslatorImpl.scroll(QueryTranslatorImpl.java:447)
	at org.hibernate.engine.query.spi.HQLQueryPlan.performScroll(HQLQueryPlan.java:354)
	at org.hibernate.internal.SessionImpl.scroll(SessionImpl.java:1658)
	at org.hibernate.query.internal.AbstractProducedQuery.doScroll(AbstractProducedQuery.java:1537)
	at org.hibernate.query.internal.AbstractProducedQuery.scroll(AbstractProducedQuery.java:1523)
	at org.hibernate.query.internal.AbstractProducedQuery.stream(AbstractProducedQuery.java:1547)
	at org.hibernate.query.criteria.internal.compile.CriteriaQueryTypeQueryAdapter.stream(CriteriaQueryTypeQueryAdapter.java:89)
	at org.hibernate.query.Query.getResultStream(Query.java:1107)
	at org.keycloak.models.map.storage.jpa.JpaMapKeycloakTransaction.read(JpaMapKeycloakTransaction.java:116)
	at org.keycloak.models.map.role.MapRoleProvider.getRealmRole(MapRoleProvider.java:207)
	at org.keycloak.models.map.role.MapRoleProvider.addRealmRole(MapRoleProvider.java:63)
	at org.keycloak.storage.RoleStorageManager.addRealmRole(RoleStorageManager.java:122)
	at org.keycloak.models.cache.infinispan.RealmCacheSession.addRealmRole(RealmCacheSession.java:624)
	at org.keycloak.models.cache.infinispan.RealmCacheSession.addRealmRole(RealmCacheSession.java:619)
	at org.keycloak.models.map.realm.MapRealmAdapter.addRole(MapRealmAdapter.java:1510)
	at org.keycloak.models.utils.KeycloakModelUtils.setupDefaultRole(KeycloakModelUtils.java:449)
	at org.keycloak.services.managers.RealmManager.createRealm(RealmManager.java:113)
	at org.keycloak.services.managers.RealmManager.createRealm(RealmManager.java:96)
	at org.keycloak.services.managers.ApplianceBootstrap.createMasterRealm(ApplianceBootstrap.java:67)
	at org.keycloak.services.resources.KeycloakApplication$3.run(KeycloakApplication.java:203)
	at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:242)
	at org.keycloak.services.resources.KeycloakApplication.bootstrap(KeycloakApplication.java:174)
	at org.keycloak.services.resources.KeycloakApplication$1.run(KeycloakApplication.java:138)
	at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:242)
	at org.keycloak.services.resources.KeycloakApplication.startup(KeycloakApplication.java:130)
	at org.keycloak.testsuite.TestPlatform.onStartup(TestPlatform.java:35)
	at org.keycloak.services.resources.KeycloakApplication.<init>(KeycloakApplication.java:116)
	... 23 more
Caused by: org.postgresql.util.PSQLException: ERROR: relation "kc_role" does not exist
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
	at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
	at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:60)
	... 55 more

@ahus1
Copy link
Copy Markdown
Member

ahus1 commented Apr 22, 2022

@sguilhen - IMHO that shouldn't keep us from merging this. I'd be happy to tackle that in a separate ticket after this has been merged.

Copy link
Copy Markdown
Contributor

@hmlnarik hmlnarik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sguilhen Could you please rebase?

With this change Keycloak doesn't start anymore with CockroachDB

What is the reason for this?

Comment on lines 55 to 57
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be under synchronized guard, like e.g. here?

if (emf == null) {
synchronized (this) {
if (emf == null) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this one. I think this boils down to the question: can getStorage be called concurrently within the same session?

@sguilhen
Copy link
Copy Markdown
Contributor Author

@sguilhen Could you please rebase?

With this change Keycloak doesn't start anymore with CockroachDB

What is the reason for this?

No idea, to be honest. I can see the table is created by Liquibase and right after that we get the error.

@sguilhen
Copy link
Copy Markdown
Contributor Author

@hmlnarik So @ahus1 and I were investigating a bit and we think this is what is going on with CRDB:

  1. Server starts for the first time, master realm creation is triggered. This loads the MapRealmProvider, which loads the storage. Before anything happens the JpaMapStorageProviderFactory updates the DB by running the realm changelog.
  2. Once the storage is loaded via getStorage, the JPA transaction is created and enlisted.
  3. As part of master realm creation, roles are added. This, again, loads the MapRoleProvider, which also loads the storage for roles. Again, the storage factory updates the DB schema by running the roles changelog.
  4. When the MapRoleProvider is created, it ends up with a tx that is using the same JPA transaction enlisted before.

Now, what seems to be happening is this: once the JPA tx is started as part of the MapRealmProvider initialization, CRDB might be "snapshooting" the DB schema, that at that point only contains realm tables. When later on an operation on a different table (kc_role) is performed in the same tx, it raises the exception saying it can't locate the table. PostgreSQL seems to be able to refresh its view of the DB schema and is aware of the new tables that were created after that initial tx has started.

There seems to be some further evidence that this is happening: If I re-start the server after the first failure, CRDB no longer complains about the kc_role table, after all it has been created during the first attempt to start. It then fails when adding clients to the master realm, with the same message: relation kc_client does not exist, but the tables have been created by Liquibase. Re-starting again moves the starting process a little further, until it happens with kc_client_scope. After a few re-starts I was able to start the server and log into the admin console. Once there, as I clicked on Groups, same message appeared in the log, complaining about kc_group not being found. Clicking on it again worked as the table was now created.

So the bottom line is that for CRDB to work with these "longer" JPA transactions, the schema needs to be processed beforehand. Perhaps have some code that examines the config to determine which areas are using the JPA Map storage and pre-process all of those. Or require people to run the Liquibase changelogs manually before running Keycloak, but I'm not too keen on this one.

@sguilhen
Copy link
Copy Markdown
Contributor Author

@hmlnarik
Copy link
Copy Markdown
Contributor

So the bottom line is that for CRDB to work with these "longer" JPA transactions, the schema needs to be processed beforehand. Perhaps have some code that examines the config to determine which areas are using the JPA Map storage and pre-process all of those.

Thank you for the analysis, it sounds likely. Would it be then possible to introduce a changelog aggregating all changelog files (jpa-*-changelog.xml files) and running that in case it is cockroachdb, or perhaps unconditionally for the time being? It is a step back but at the same time the only practical solution that I can think of now.

@sguilhen
Copy link
Copy Markdown
Contributor Author

Thank you for the analysis, it sounds likely. Would it be then possible to introduce a changelog aggregating all changelog files (jpa-*-changelog.xml files) and running that in case it is cockroachdb, or perhaps unconditionally for the time being? It is a step back but at the same time the only practical solution that I can think of now.

Yep, I think that will be the way to workaround this for the moment. I will look into it.

@sguilhen sguilhen force-pushed the 11230-rework-jpa-tx branch from 5e97629 to 65b9c41 Compare April 22, 2022 19:19
@sguilhen
Copy link
Copy Markdown
Contributor Author

@hmlnarik @ahus1 Updated the PR, so now the JpaMapLiquibaseUpdaterProvider uses an aggregate changelog if the database is CockroachDB. I've tested and I was able to start Keycloak with CRDB. For PostgreSQL the processing remains the same - we update schemas on the fly.

Only downside of this approach is that we can end up creating tables that won't ever be used if some area is not configured to use the JPA Map storage, but I think it is the easiest solution for the moment. We can re-evaluate if we can do something smarter in the future.

Copy link
Copy Markdown
Contributor

@hmlnarik hmlnarik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sguilhen and @ahus1!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/storage Indicates an issue that touches storage (change in data layout or data manipulation)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Jpa Map Storage: break 1:1 relationship between JPA entity transactions and Jpa*MapKeycloakTransactions

4 participants