
Add Shared Signals Framework Transmitter capability #48256

Draft
thomasdarimont wants to merge 140 commits into keycloak:main from thomasdarimont:issue/gh-48254-ssf-tx-support-v1


@thomasdarimont
Contributor

@thomasdarimont thomasdarimont commented Apr 20, 2026

Adds Shared Signals Framework support to Keycloak in the SSF Transmitter role: Keycloak signs Security Event Tokens (SETs, RFC 8417) describing realm/user/session/credential events and delivers them to OAuth clients
registered as SSF Receivers, either by HTTP PUSH (RFC 8935) or HTTP POLL (RFC 8936).

Targets the OpenID Shared Signals Framework 1.0 (Final) specification plus the CAEP Interoperability Profile 1.0. Ships the legacy SSE CAEP profile alongside for Apple Business Manager / Apple School Manager interop, since Apple device-fleet enrolment is a concrete driving use case.

Gated behind Profile.Feature.SSF (experimental, opt-in).

Background

Issue #43614 originally proposed SSF Receiver support (Keycloak ingesting SETs from upstream IdPs / risk engines). After exploring both sides, we're shipping the Transmitter first (see #48254) because it covers the strongest community asks (federate Keycloak events to downstream SaaS, Apple device fleet revoke flow) and lets us validate the SSF data-plane against real receivers before designing the harder "action mapping" question on the Receiver side. Receiver support remains on the roadmap and is tracked separately via #43614.

Scope (experimental)

In:

  • SSF 1.0 stream management (CRUD, status, verification, subjects)
  • SET delivery via HTTP PUSH (RFC 8935) and HTTP POLL (RFC 8936) — POLL in a return-immediately form
  • Durable JPA outbox (SSF_PENDING_EVENT) with cluster-aware drainer and exponential backoff
  • CAEP credential-change / session-revoked / device-compliance-change event mapping from native Keycloak events
  • Subject selection (per-user / per-org ssf.notify.<clientId> attribute, default_subjects policy)
  • Synthetic event emit endpoint for non-Keycloak-native event sources
  • Per-receiver "manual-only events" gate to suppress auto-emit per event type per receiver
  • Legacy SSE CAEP profile for Apple Business / School Manager interop
  • Per-realm SSF admin REST + admin-console SSF tab on SSF-enabled clients (Receiver / Stream / Subjects / Pending Events sub-tabs)
  • Prometheus metrics binder (dispatcher, drainer, poll, verification, outbox depth, drainer-tick last-at)

Out (tracked as separate follow-up issues):

  • SSF Receiver role for Keycloak (ingestion of SETs)
  • POLL long-polling (returnImmediately=false honoured)
  • Dedicated SSF signing key (separate from realm OIDC signing key)
  • Chunked HELD release for very large backlogs
  • Performance characterization + security review
  • Formal interop matrix (caep.dev, ABM)

Tasks

  • All code gated behind Profile.Feature.SSF (experimental, off by default)
  • Per-realm ssf.transmitterEnabled toggle; per-client ssf.enabled toggle
  • SSF event listener registered as global (not user-toggleable per realm)
  • Receiver-facing endpoints conformant with SSF 1.0, section 8.1
  • CAEP credential-change / session-revoked / device-compliance mappings pass interop testing against caep.dev
  • SSE CAEP profile narrowed shape works with Apple Business Manager
  • Integration test coverage for the dispatch / outbox / push / poll pipeline (100+ tests)
  • Prometheus metrics exposed under keycloak_ssf_*
  • Design notes published

Documentation

A more detailed description of the design and behaviour can be found here: (Design Document)

Fixes #48254

Signed-off-by: Thomas Darimont [email protected]

This PR was partially co-authored with Claude AI

@thomasdarimont thomasdarimont added team/core-iam kind/feature Categorizes a PR related to a new feature labels Apr 20, 2026
@thomasdarimont thomasdarimont force-pushed the issue/gh-48254-ssf-tx-support-v1 branch from 6b46c3e to 279353a Compare April 20, 2026 07:53
@thomasdarimont
Contributor Author

thomasdarimont commented Apr 20, 2026

@pedroigor @sguilhen as discussed, here is the initial PR with a fully featured SSF Transmitter implementation, without SSF Receiver support.

The SSF feature is now aligned with the structure of the SCIM feature as multiple sub modules:
image

@thomasdarimont
Contributor Author

thomasdarimont commented Apr 20, 2026

Admin UI Screenshots

If the SSF feature is enabled for the server, the SSF Transmitter capability can be enabled for the realm.
If enabled, a link to the ssf-configuration metadata is added.
image

If the SSF feature is enabled for the server and for the current realm, the SSF Receiver capability can be enabled on an authenticated client.
image

The SSF tab of a Client shows details about the SSF Receiver configuration.
image

The Stream tab shows the current SSF stream configuration:
image

The Subjects tab lets you manage the subjects of interest for a stream.
image

The Pending Events tab lets you synthetically emit events for a user (in the context of this SSF Receiver) and look up the processing state of an event.
image

Example event lookup result:
image

Successfully delivered event:
image

@keycloak-github-bot

Unreported flaky test detected

If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.

org.keycloak.testsuite.model.singleUseObject.SingleUseObjectModelTest#testCluster

Keycloak CI - Store Model Tests

java.lang.AssertionError: 
threads didn't terminate in time: [main (RUNNABLE):
	at [email protected]/sun.management.ThreadImpl.dumpThreads0(Native Method)
	at [email protected]/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:505)
	at [email protected]/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:493)
...



@keycloak-github-bot keycloak-github-bot Bot left a comment


Unreported flaky test detected, please review

@thomasdarimont thomasdarimont force-pushed the issue/gh-48254-ssf-tx-support-v1 branch 6 times, most recently from 9be57a4 to df16cab Compare April 23, 2026 17:10

@keycloak-github-bot keycloak-github-bot Bot left a comment


Unreported flaky test detected, please review

thomasdarimont and others added 29 commits April 23, 2026 22:23
Free-form operator-facing notes describing the SSF receiver — what
downstream system it represents, who owns it, what events it consumes.
Stored as the ssf.description client attribute (255 char max, matching
the standard client description field). Placed directly above the
Audience field on the Receiver sub-tab.

UI-only — never surfaced on the receiver-facing wire and not consumed
by the dispatcher / metadata document. A getter on StreamConfig can be
added later if a need to expose it appears.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: Thomas Darimont <[email protected]>
The previous wording said "If empty the client ID is used as the
audience" — that was wrong. The fallback in StreamService#createAudience
generates a clientId/streamId pair, not just the clientId, so a receiver
that re-registers a stream gets a fresh audience instead of colliding
with the prior stream's SETs.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: Thomas Darimont <[email protected]>
New per-receiver client attribute that lists event aliases the native
event listener must NOT auto-emit. The events stay in the receiver's
supportedEvents (so a receiver can still accept them on the wire and
the synthetic-emit endpoint can still fire them) but Keycloak's
automatic mapping skips them.

Use case: an SSF receiver representing an Apple School Manager device
fleet supports CaepSessionRevoked but should not get a session-revoke
on every Keycloak app logout (which would force every kid back to the
login screen on their iPad). With CaepSessionRevoked listed in
manualOnlyEvents, the receiver only gets the event when an explicit
upstream signal fires it through the synthetic-emit endpoint.

- ClientStreamStore.SSF_MANUAL_ONLY_EVENTS_KEY = "ssf.manualOnlyEvents"
  (comma-separated alias list, same shape as ssf.supportedEvents).
- StreamConfig.manualOnlyEvents stores resolved canonical event-type
  URIs; ClientStreamStore resolves aliases through the registry on read.
- SsfTransmitterEventListener.isManualOnlyForStream gates the listener
  generate path. Synthetic emit (EventEmitterService) deliberately does
  not consult this set.
- Admin UI: new typeahead-multi on the Receiver sub-tab populated from
  the live value of supportedEvents (so removing a supported event
  removes it from the manual-only options too). Disabled when no
  supported events are selected. Added i18n keys.

Tests:
- Unit: SsfTransmitterEventListenerTest covers null/empty set, match,
  non-match, multi-event token, no-events token (6 cases).
- Integration: SsfTransmitterManualOnlyEventsTests covers (a) LOGOUT
  produces no push, (b) synthetic emit still delivers the same event,
  (c) credential-change (not in manual-only) still auto-emits.
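
The gating rule described in this commit can be sketched in plain Java. This is only an illustration: `ManualOnlyGate` and its method shape are hypothetical (the real `SsfTransmitterEventListener.isManualOnlyForStream` signature is not shown here), and the handling of multi-event tokens is an assumption, namely that a token is suppressed only when it carries at least one event and every event type it carries is in the stream's manual-only set.

```java
import java.util.Set;

// Hypothetical stand-in for the listener-side gate; not the PR's actual code.
final class ManualOnlyGate {

    /**
     * Returns true when the native listener should skip auto-emitting a token
     * for a stream. Assumption: suppress only if the token has events and all
     * of them are listed as manual-only for that stream.
     */
    static boolean isManualOnly(Set<String> manualOnlyEvents, Set<String> tokenEventTypes) {
        if (manualOnlyEvents == null || manualOnlyEvents.isEmpty()) {
            return false; // nothing configured: auto-emit as usual
        }
        if (tokenEventTypes == null || tokenEventTypes.isEmpty()) {
            return false; // no-events token: nothing to gate on
        }
        return manualOnlyEvents.containsAll(tokenEventTypes);
    }
}
```

The synthetic-emit path would simply never call this check, which matches the statement above that EventEmitterService does not consult the set.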

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: Thomas Darimont <[email protected]>
Three small changes that reduce the boilerplate (and the foot-guns) for
extensions that ship their own SSF event types:

- SsfEventProviderFactory.create(KeycloakSession) gets a default that
  returns null. Contribution-only factories — the common case for
  custom event types — collapse to getId() + isSupported() + the
  getContributedEventFactories() map; they no longer have to implement
  a meaningless create().

- SsfTransmitter gains two helpers:
    * isReceiverClient(client) — null-safe predicate.
    * getReceiverClient(session, clientClientId) — one-liner lookup
      that throws SsfException with a clear message when the clientId
      is unknown OR resolves to a client without ssf.enabled=true.
  Centralises the "is this an SSF Receiver" check so REST callers,
  programmatic callers, and tests share one definition.

- EventEmitterService.emit now refuses non-SSF clients up front with
  SsfException (delegating the predicate to SsfTransmitter.isReceiverClient).
  Previously a wrong client surfaced as a confusing STREAM_NOT_FOUND
  once the stream lookup failed; the new error names the actual
  configuration mistake.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: Thomas Darimont <[email protected]>
New label-free gauge holding the epoch-second of the most recent
drainer tick attempt. Stamped from recordDrainerTick on both ok and
error outcomes — only a *stuck* tick that never returns lets the gauge
fall behind. Reports 0 until the first observed tick so a freshly-
started server doesn't read as instantly stalled.

Operators get an absolute "how long ago" stall query that complements
the existing counter-rate-based check:

    time() - keycloak_ssf_drainer_tick_last_at_seconds > 120

vs. the trend-based:

    rate(keycloak_ssf_drainer_tick_total[5m]) == 0

Both are valid; the gauge is cheaper to alert on at scale (single
series per process, no rate window).

Uses Time.currentTime() for consistency with the rest of the SSF
codebase. NOOP path untouched — the gauge isn't registered and the
field is never stamped.
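
A minimal sketch of the "last tick at" gauge semantics, kept free of any metrics library so it stays self-contained. `DrainerTickGaugeSketch` and the injected clock are hypothetical; the PR stamps via Keycloak's `Time.currentTime()` and registers the value through its Prometheus metrics binder, neither of which is reproduced here.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical illustration of a label-free "last tick at" gauge value.
final class DrainerTickGaugeSketch {

    // 0 until the first observed tick.
    private final AtomicLong lastTickAtSeconds = new AtomicLong(0);
    private final LongSupplier clockSeconds;

    DrainerTickGaugeSketch(LongSupplier clockSeconds) {
        this.clockSeconds = clockSeconds;
    }

    /** Stamped on both ok and error outcomes; only a tick that never returns leaves it stale. */
    void recordDrainerTick(boolean ok) {
        lastTickAtSeconds.set(clockSeconds.getAsLong());
    }

    /** Value a metrics registry would sample for the gauge. */
    long value() {
        return lastTickAtSeconds.get();
    }
}
```

In production code the clock would be something like `() -> System.currentTimeMillis() / 1000`; injecting it here only makes the behaviour easy to test.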

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: Thomas Darimont <[email protected]>
Two related changes that tighten the wire shape of synthetic-emitted
SETs and the native CAEP credential-change emission.

Per-event validate() hook
- New SsfEvent.validate() default no-op called by EventEmitterService
  after Jackson deserialisation. Subclasses with spec-required fields
  override it to throw SsfEventValidationException. Native event
  production never throws — Keycloak event details always supply the
  required pieces — so the hook only matters on the synthetic-emit
  path.
- SsfEventValidationException carries a stable MESSAGE_KEY
  ("invalid_event_data") plus structured eventAlias + field
  (the @JsonProperty wire-name, not the Java field name) so callers
  can compose a localised message from the pieces. Wire-side, the
  emit response uses the matching EmitEventStatus.INVALID_EVENT_DATA
  with the same key — one identifier names both the failure category
  and the offending alias.field.
- Coverage:
    * CaepCredentialChange — credential_type, change_type
    * CaepDeviceComplianceChange — current_status, previous_status
    * CaepAssuranceLevelChange — namespace, current_level
    * CaepRiskLevelChanged — principal, current_level
    * CaepTokenClaimsChanged — claims (non-empty)
    * RiscCredentialCompromise — credential_type
    * Other CAEP/RISC events keep the default no-op (their bodies are
      either signal-by-event-type-only or every field is spec-optional).
- Drive-by: added missing currentLevel / previousLevel accessors on
  CaepAssuranceLevelChange (Jackson reaches the protected field via
  @JsonProperty, but Java callers + tests need the setters).
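
The validate() hook pattern can be sketched as follows. All class names below are hypothetical stand-ins (the PR's `SsfEvent`, `SsfEventValidationException` and `CaepCredentialChange` are not reproduced), and treating a blank string as missing is an assumption. The sketch only shows the shape: a default no-op on the base type, an override that names the offending wire field, and an exception carrying the stable key plus alias and field.

```java
// Hypothetical stand-in for the validation exception described above.
class EventValidationException extends RuntimeException {
    static final String MESSAGE_KEY = "invalid_event_data";
    final String eventAlias;
    final String field; // wire name, e.g. "credential_type", not the Java field name

    EventValidationException(String eventAlias, String field) {
        super(MESSAGE_KEY + ": " + eventAlias + "." + field);
        this.eventAlias = eventAlias;
        this.field = field;
    }
}

// Base event type: validate() is a no-op unless a subclass has required fields.
class SketchEvent {
    void validate() {
        // default: nothing to check
    }
}

// Event with two required fields, mirroring the credential-change coverage entry.
class SketchCredentialChange extends SketchEvent {
    String credentialType; // wire name: credential_type
    String changeType;     // wire name: change_type

    @Override
    void validate() {
        if (credentialType == null || credentialType.isBlank()) {
            throw new EventValidationException("CaepCredentialChange", "credential_type");
        }
        if (changeType == null || changeType.isBlank()) {
            throw new EventValidationException("CaepCredentialChange", "change_type");
        }
    }
}
```

A caller on the synthetic-emit path would invoke `validate()` right after deserialisation and translate the exception into the emit response status.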

CAEP credential_type translation
- SecurityEventTokenMapper.narrowCaepCredentialType now maps
  Keycloak's internal credential type strings to the CAEP spec's
  enumerated values: password → "password", otp → "app",
  webauthn (2FA) → "fido2-roaming", webauthn-passwordless →
  "fido2-platform". Unknown types pass through verbatim per CAEP's
  "any other credential type supported mutually" escape hatch.
- Push-delivery integration test updated for the otp → "app" mapping.

21 unit-test cases cover the new validators + the exception structure
+ the default-no-op contract.
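
The credential_type translation table above can be written out as a small lookup. This is a reconstruction from the commit message, not the PR's code: the class name is hypothetical, the method name merely echoes `narrowCaepCredentialType`, and returning null for a null input is an assumption.

```java
// Hypothetical reconstruction of the mapping described in the commit message.
final class CredentialTypeSketch {

    /** Maps a Keycloak credential type string to the CAEP credential_type value. */
    static String narrowCaepCredentialType(String keycloakType) {
        if (keycloakType == null) {
            return null; // assumption: nothing to translate
        }
        switch (keycloakType) {
            case "password":
                return "password";
            case "otp":
                return "app";
            case "webauthn":
                return "fido2-roaming";
            case "webauthn-passwordless":
                return "fido2-platform";
            default:
                // Unknown types pass through verbatim.
                return keycloakType;
        }
    }
}
```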

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: Thomas Darimont <[email protected]>
Fan-out per-stream conversion was building a SET token for every SSF
receiver before the dispatcher's subject filter could reject it — so
multi-receiver realms logged "Generated SSF Security Event Token" once
per stream even when only one would deliver.

- SubjectSubscriptionFilter: extract evaluateSubjectSubscription and
  add shouldDispatchForUser(user, stream, …) entry point.
- SecurityEventTokenDispatcher: expose shouldDispatchForUser(user, stream)
  so the listener can short-circuit before toSecurityEvent.
- SsfTransmitterEventListener: resolve event user once and skip streams
  that fail the pre-gate. Null user defers to the dispatcher-side gate
  (admin events, complex subjects, impersonation).
- Enrich "Generated …" debug logs with realm, clientId, streamId,
  userId, eventType (and operationType/resourceType/resourcePath for
  admin events).

Signed-off-by: Thomas Darimont <[email protected]>
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…tgresql in the CI truncate to ms upfront

Signed-off-by: Thomas Darimont <[email protected]>
- Client deletion orphaned outbox rows. SsfPendingEventStore.deleteByClient
existed but was only wired to explicit stream-delete. A direct client
delete (admin UI, REST, realm JSON re-import) left rows keyed to a dead
internal client UUID that the drainer would retry to dead-letter. Added
a ClientRemovedEvent branch to the factory's provider-event listener
that calls deleteByClient, extracted as the protected hook
purgeOutboxOnClientRemoved so subclasses can layer additional cleanup.

- Importing a client JSON that already carries ssf.streamId could produce
two clients in the same realm sharing the same streamId. findClientByStreamId
picked one with .findFirst(), making dispatch silently nondeterministic.
Added a ClientUpdatedEvent branch that runs validateImportedStreamId —
throws ModelDuplicateException when another client in the realm already
holds the attribute, so the offending import is rolled back with a
clear error instead of mutating SSF state silently. Delete-then-
reimport still works because the validator only fires when a *live*
collision exists.

- Hardened ClientStreamStore.findClientByStreamId to pull 2 rows and
return Optional.empty() (plus a warn log naming both clientIds) when
a collision slips through — defence in depth for out-of-band attribute
edits that bypass the event listener.
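
The "pull 2 rows" hardening can be illustrated with a plain stream of candidate client IDs. The class below is a hypothetical sketch: the real `ClientStreamStore.findClientByStreamId` works on Keycloak client models, and the warning here goes to stderr rather than a logger.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of a collision-aware lookup.
final class StreamIdLookupSketch {

    /**
     * Takes the clients whose ssf.streamId attribute matches and returns the
     * single owner, or empty when none or more than one client matches.
     */
    static Optional<String> findClientByStreamId(Stream<String> matchingClientIds) {
        // Two rows are enough to tell "unique" from "collision".
        List<String> firstTwo = matchingClientIds.limit(2).collect(Collectors.toList());
        if (firstTwo.size() > 1) {
            System.err.println("WARN: streamId shared by clients "
                    + firstTwo.get(0) + " and " + firstTwo.get(1));
            return Optional.empty();
        }
        return firstTwo.stream().findFirst();
    }
}
```

Compared with a bare `.findFirst()`, a collision now yields no dispatch target at all instead of an arbitrary one.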

Renamed the listener field to ssfProviderEventListener to reflect the
broader scope (realm-removed / client-removed / client-updated).

Added ssf/tests/base integration tests covering the duplicate-rejection,
delete-then-reimport, and no-ssf-state happy paths end to end against a
real Keycloak session.

Signed-off-by: Thomas Darimont <[email protected]>
This avoids blocking the admin tx on client/realm removal.

Inline DELETE-by-clientId / DELETE-by-realmId in the ClientRemovedEvent /
RealmRemovedEvent listeners would serialize the entire outbox backlog into
the admin's removal transaction. A receiver with 100k+ queued rows could
push the transaction past its timeout and leave the admin staring at a
generic failure. Switched to fire-and-forget post-commit cleanup in
bounded batches.

- SsfPendingEventEntity: new findIdsByClient / findIdsByRealm /
deleteByIds named queries backing a portable SELECT-ids + DELETE-by-ids
batching pattern (JPQL has no DELETE ... LIMIT).
- SsfPendingEventStore.deleteBatchByClient / deleteBatchByRealm: return
the row count so the caller knows when to stop looping.
- SsfOutboxCleanupTask: Runnable that opens a fresh session per batch
via KeycloakModelUtils.runJobInTransaction, loops until drained or
maxBatches (default 10_000) × batchSize (default 1000) reached.
Mid-flight crashes leave orphan PUSH rows that the drainer's existing
missing-realm/client/stream fast-path dead-letters on the next tick;
dead-letter retention purges the rest.
- DefaultSsfTransmitterProviderFactory: listeners submit the task to
the ssf-outbox-cleanup ExecutorsProvider pool and return immediately.
ClientModel.ClientRemovedEvent is node-local, so only the originating
node schedules work — no cross-node coordination needed.
- Integration tests cover batch semantics (batch-size respect, status
coverage, realm isolation, input validation), the full-drain happy
path, the maxBatches safety cap, and the realm-scope variant.
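
The bounded batch loop can be sketched independently of JPA. `OutboxCleanupSketch` and the `deleteBatch` callback are hypothetical; in the PR each batch runs in its own transaction via `KeycloakModelUtils.runJobInTransaction`, which is not reproduced here. Stopping on the first short batch is an assumption about how the returned row count is used.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of "loop until drained or maxBatches reached".
final class OutboxCleanupSketch {

    /**
     * @param deleteBatch takes the batch size, deletes up to that many rows,
     *                    and returns how many rows were actually deleted
     * @return total number of rows deleted across all batches
     */
    static long drain(IntUnaryOperator deleteBatch, int batchSize, int maxBatches) {
        if (batchSize <= 0 || maxBatches <= 0) {
            throw new IllegalArgumentException("batchSize and maxBatches must be positive");
        }
        long total = 0;
        for (int i = 0; i < maxBatches; i++) {
            int deleted = deleteBatch.applyAsInt(batchSize);
            total += deleted;
            if (deleted < batchSize) {
                break; // short batch: backlog is drained
            }
        }
        return total;
    }
}
```

With the defaults quoted above (10_000 batches of 1000 rows), the cap bounds a single cleanup run at ten million rows; anything left over falls to the drainer's dead-letter path.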

Signed-off-by: Thomas Darimont <[email protected]>
Signed-off-by: Thomas Darimont <[email protected]>
- Recognize users/{id}/reset-password and users/{id}/credentials/{cid}
admin paths; emit CAEP credential-change (UPDATE/DELETE) with
initiating_entity=ADMIN.
- Thread AdminEvent through session-revoked and credential-change
generators so initiating entity reflects admin vs. user origin.
- Gate admin-event dispatch on ResourceType.USER and per-stream
shouldDispatchForUser to avoid building tokens that would be
filtered out.
- Add SsfUtil.userIdFromAdminEventPath helper for parsing
users/{id}/... resource paths.

Signed-off-by: Thomas Darimont <[email protected]>
Tighten the admin event resource-path patterns in SecurityEventTokenMapper
from (.*) to ([^/]+) so each capturing group matches a single path segment
instead of crossing '/' boundaries. The greedy .* form allowed O(N^2)
backtracking on adversarial input like "users//credentials/a/credentials/a/..."
(CodeQL js/polynomial-redos). Path segments are UUIDs that never contain
'/', so the tighter class is both safer and more correct.
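
A short sketch of the tightened pattern, using `java.util.regex`. The exact patterns in SecurityEventTokenMapper are not shown in this commit message, so the regex below (including its anchoring) and the helper name are assumptions; only the `([^/]+)` group form comes from the text above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a single-segment admin resource-path pattern.
final class AdminPathSketch {

    // Each group matches exactly one path segment and cannot cross a '/'.
    static final Pattern CREDENTIAL_PATH =
            Pattern.compile("^users/([^/]+)/credentials/([^/]+)$");

    /** Extracts the user id from users/{id}/credentials/{cid}, if the path has that shape. */
    static Optional<String> userIdFromCredentialPath(String resourcePath) {
        if (resourcePath == null) {
            return Optional.empty();
        }
        Matcher m = CREDENTIAL_PATH.matcher(resourcePath);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Because `[^/]+` can only stop at a slash, there is a single way to split any input into segments, so the matcher has nothing to backtrack over on inputs with repeated `/credentials/` fragments.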

Signed-off-by: Thomas Darimont <[email protected]>
Replaces the hard-coded 24h constant in SsfPushOutboxDrainerTask with
a new `outbox-delivered-retention` SPI property (default 24h, 0 to retain indefinitely),
mirroring the existing `outbox-dead-letter-retention` knob.

Signed-off-by: Thomas Darimont <[email protected]>
…utboxDrainerTaskConfig

Signed-off-by: Thomas Darimont <[email protected]>
- Split Receivers tab into multiple sections
- Revise labels

Signed-off-by: Thomas Darimont <[email protected]>
@thomasdarimont thomasdarimont force-pushed the issue/gh-48254-ssf-tx-support-v1 branch from df16cab to 06f6463 Compare April 24, 2026 18:24

Labels

flaky-test kind/feature Categorizes a PR related to a new feature team/core-iam


Development

Successfully merging this pull request may close these issues.

SSF Transmitter Support

2 participants