fix: use distance measure for unknown options#48217
Conversation
8737c31 to
c62d865
Compare
|
Another option is to always perform the sorting. |
+1 for using our own method. I think relying on a private inner method obtained via reflection is a ticking bomb. I looked at their code just out of curiosity (I happened to play with Levenshtein in the past, when I needed to parse out a specific sport event from a huge IPTV list 😄 - many team name variants) - they seem to use Bigram cosine similarity, which behaves differently as I searched and read through materials. I did some primitive testing with KC options and I think we can stick with Bigram cosine similarity. I tried 8 scenarios - BCS won in 5, for the rest it was a tie. Levenshtein favours short strings, it ranked On top of that, our own impl can give us some advantage - more modern syntax (Picocli uses Java 5 style still) and more importantly better results. Picocli's |
Pepo48
left a comment
There was a problem hiding this comment.
Just for the reference what I was playing with - hugely inspired by Picocli, the math is identical:
I wouldn't necessarily say bomb :) - if there were a problem, then we'd just return the full list as we do currently. We'd also catch this in testing. However it could slow down an upgrade eventually.
Thank you for looking at their code (which is currently used for command suggestions). That sounds like multiple upstream issues are needed then - one for using similarity on options, and another on not dropping things with the same score. |
closes: keycloak#48115 Signed-off-by: Steve Hawkins <[email protected]>
|
@Pepo48 I adapted your proposal to include our own cosine similarity to be more general - mostly replacing any usage of PicoCli's function. I'll still create upstream issues as well. |
| .errorText("Unknown option: '--db-pasword'").toString())); | ||
| assertThat(nonRunningPicocli.getErrString(), containsString( | ||
| "Possible solutions: --db-url, --db-connect-timeout, --db-url-host, --db-url-database, --db-url-port, --db-url-properties, --db-username, --db-password, --db-schema, --db-pool-initial-size, --db-pool-min-size, --db-pool-max-size, --db-pool-max-lifetime, --db-debug-jpql, --db-log-slow-queries-threshold, --db-tls-mode, --db-tls-trust-store-file, --db-tls-trust-store-type, --db-tls-trust-store-password, --db-mtls-key-store-file, --db-mtls-key-store-type, --db-mtls-key-store-password, --db-driver, --db, --db-url-full-<datasource>, --db-connect-timeout-<datasource>, --db-url-host-<datasource>, --db-url-database-<datasource>, --db-url-port-<datasource>, --db-url-properties-<datasource>, --db-username-<datasource>, --db-password-<datasource>, --db-schema-<datasource>, --db-pool-initial-size-<datasource>, --db-pool-min-size-<datasource>, --db-pool-max-size-<datasource>, --db-debug-jpql-<datasource>, --db-log-slow-queries-threshold-<datasource>, --db-tls-mode-<datasource>, --db-tls-trust-store-file-<datasource>, --db-tls-trust-store-type-<datasource>, --db-tls-trust-store-password-<datasource>, --db-mtls-key-store-file-<datasource>, --db-mtls-key-store-type-<datasource>, --db-mtls-key-store-password-<datasource>, --db-enabled-<datasource>, --db-driver-<datasource>, --db-kind-<datasource>")); | ||
| "--db-password, --db-password-<datasource>, --db-mtls-key-store-password, --db-tls-trust-store-password, --db-mtls-key-store-password-<datasource>, --db-tls-trust-store-password-<datasource>, --db")); |
|
Upstream issue: remkop/picocli#2510 |
|
Updated based upon the reviews. |
6c3bab4 to
d5e85de
Compare
Pepo48
left a comment
There was a problem hiding this comment.
I tested it once again with my initial test:
Input: --db-usr
Candidates: 24
Top 7:
1. --db-url
2. --db-username
3. --db
4. --db-url-port
5. --db-url-host
6. --db-driver
7. --db-url-database
(--db, --db-url-host, --db-driver were previously dropped)
Thus, LGTM now. Thank you very much, Steve!
ad8a5ef to
a40776e
Compare
|
One more refinement - since the command suggestions are prefixed, they should retain the default limit of 3. |
Signed-off-by: Steve Hawkins <[email protected]>
After thinking more about the part matching it seems better to use a general notion of similarity. While it doesn't completely understand the wildcard convention it seems to be close enough based upon the prefix.
The change here will return the closest 7 options, rather than all options that start with same prefix.
This reuses the picocli cosine distance as I don't see another text distance utility in our code base.
However this is via reflection. The next best options would be to copy that or some code that implements levenshtein - rather than introducing a dependency on commons text.
closes: #48115