Skip to content

fix: use distance measure for unknown options#48217

Merged
vmuzikar merged 2 commits intokeycloak:mainfrom
shawkins:iss48115
Apr 22, 2026
Merged

fix: use distance measure for unknown options#48217
vmuzikar merged 2 commits intokeycloak:mainfrom
shawkins:iss48115

Conversation

@shawkins
Copy link
Copy Markdown
Contributor

@shawkins shawkins commented Apr 17, 2026

After thinking more about the part matching it seems better to use a general notion of similarity. While it doesn't completely understand the wildcard convention it seems to be close enough based upon the prefix.

The change here will return the closest 7 options, rather than all options that start with same prefix.

This reuses the picocli cosine distance as I don't see another text distance utility in our code base.

However this is via reflection. The next best options would be to copy that or some code that implements levenshtein - rather than introducing a dependency on commons text.

closes: #48115

@shawkins shawkins force-pushed the iss48115 branch 2 times, most recently from 8737c31 to c62d865 Compare April 17, 2026 21:45
@shawkins shawkins marked this pull request as ready for review April 18, 2026 12:42
@shawkins shawkins requested review from a team as code owners April 18, 2026 12:42
@shawkins
Copy link
Copy Markdown
Contributor Author

Another option is to always perform the sorting.

@Pepo48
Copy link
Copy Markdown
Contributor

Pepo48 commented Apr 20, 2026

However this is via reflection. The next best options would be to copy that or some code that implements levenshtein - rather than introducing a dependency on commons text.

+1 for using our own method. I think relying on a private inner method obtained via reflection is a ticking bomb.

I looked at their code just out of curiosity (I happened to play with Levenshtein in the past, when I needed to parse out a specific sport event from a huge IPTV list 😄 - many team name variants) - they seem to use Bigram cosine similarity, which behaves differently as I searched and read through materials.

I did some primitive testing with KC options and I think we can stick with Bigram cosine similarity. I tried 8 scenarios - BCS won in 5, for the rest it was a tie. Levenshtein favours short strings, it ranked --db-url above --db-tls-trust-store-password even when I provided --db-pasword (with one s) as the input.

On top of that, our own impl can give us some advantage - more modern syntax (Picocli uses Java 5 style still) and more importantly better results. Picocli's CosineSimilarity.mostSimilar() uses TreeMap<Double, String>- when two options score identically, one is silently dropped. This turned out to be a real issue as I was losing completely relevant suggestions.

Copy link
Copy Markdown
Contributor

@Pepo48 Pepo48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the reference what I was playing with - hugely inspired by Picocli, the math is identical:

@shawkins
Copy link
Copy Markdown
Contributor Author

+1 for using our own method. I think relying on a private inner method obtained via reflection is a ticking bomb.

I wouldn't necessarily say bomb :) - if there were a problem, then we'd just return the full list as we do currently. We'd also catch this in testing. However it could slow down an upgrade eventually.

On top of that, our own impl can give us some advantage - more modern syntax (Picocli uses Java 5 style still) and more importantly better results. Picocli's CosineSimilarity.mostSimilar() uses TreeMap<Double, String>- when two options score identically, one is silently dropped. This turned out to be a real issue as I was losing completely relevant suggestions.

Thank you for looking at their code (which is currently used for command suggestions). That sounds like multiple upstream issues are needed then - one for using similarity on options, and another on not dropping things with the same score.

@shawkins
Copy link
Copy Markdown
Contributor Author

@Pepo48 I adapted your proposal to include our own cosine similarity to be more general - mostly replacing any usage of PicoCli's function. I'll still create upstream issues as well.

.errorText("Unknown option: '--db-pasword'").toString()));
assertThat(nonRunningPicocli.getErrString(), containsString(
"Possible solutions: --db-url, --db-connect-timeout, --db-url-host, --db-url-database, --db-url-port, --db-url-properties, --db-username, --db-password, --db-schema, --db-pool-initial-size, --db-pool-min-size, --db-pool-max-size, --db-pool-max-lifetime, --db-debug-jpql, --db-log-slow-queries-threshold, --db-tls-mode, --db-tls-trust-store-file, --db-tls-trust-store-type, --db-tls-trust-store-password, --db-mtls-key-store-file, --db-mtls-key-store-type, --db-mtls-key-store-password, --db-driver, --db, --db-url-full-<datasource>, --db-connect-timeout-<datasource>, --db-url-host-<datasource>, --db-url-database-<datasource>, --db-url-port-<datasource>, --db-url-properties-<datasource>, --db-username-<datasource>, --db-password-<datasource>, --db-schema-<datasource>, --db-pool-initial-size-<datasource>, --db-pool-min-size-<datasource>, --db-pool-max-size-<datasource>, --db-debug-jpql-<datasource>, --db-log-slow-queries-threshold-<datasource>, --db-tls-mode-<datasource>, --db-tls-trust-store-file-<datasource>, --db-tls-trust-store-type-<datasource>, --db-tls-trust-store-password-<datasource>, --db-mtls-key-store-file-<datasource>, --db-mtls-key-store-type-<datasource>, --db-mtls-key-store-password-<datasource>, --db-enabled-<datasource>, --db-driver-<datasource>, --db-kind-<datasource>"));
"--db-password, --db-password-<datasource>, --db-mtls-key-store-password, --db-tls-trust-store-password, --db-mtls-key-store-password-<datasource>, --db-tls-trust-store-password-<datasource>, --db"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! 🚀

@shawkins shawkins changed the title fix: reuse the picocli distance measure for unknown options fix: use distance measure for unknown options Apr 21, 2026
@shawkins
Copy link
Copy Markdown
Contributor Author

Upstream issue: remkop/picocli#2510

@shawkins
Copy link
Copy Markdown
Contributor Author

Updated based upon the reviews.

@shawkins shawkins force-pushed the iss48115 branch 2 times, most recently from 6c3bab4 to d5e85de Compare April 21, 2026 16:49
Pepo48
Pepo48 previously approved these changes Apr 22, 2026
Copy link
Copy Markdown
Contributor

@Pepo48 Pepo48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it once again with my initial test:

Input: --db-usr
Candidates: 24

Top 7:
  1. --db-url
  2. --db-username
  3. --db
  4. --db-url-port
  5. --db-url-host
  6. --db-driver
  7. --db-url-database

(--db, --db-url-host, --db-driver were previously dropped)

Thus, LGTM now. Thank you very much, Steve!

@shawkins shawkins force-pushed the iss48115 branch 2 times, most recently from ad8a5ef to a40776e Compare April 22, 2026 10:02
@shawkins
Copy link
Copy Markdown
Contributor Author

One more refinement - since the command suggestions are prefixed, they should retain the default limit of 3.

Signed-off-by: Steve Hawkins <[email protected]>
Copy link
Copy Markdown
Contributor

@vmuzikar vmuzikar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

@vmuzikar vmuzikar merged commit 0efba8d into keycloak:main Apr 22, 2026
89 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refine KcUnmatchedArgumentException option filtering

4 participants