Scope schema discovery to target host's cluster in multi-cluster CHI #1965
Open
lukas-pfannschmidt-tr wants to merge 1 commit into Altinity:0.27.0
HostCreateTables used api.ClickHouseInstallation{} as the scope for
Names(NameFQDNs, ...) when building the endpoint list passed to
QueryUnzip2Columns/QueryAny. In a CHI that defines multiple clusters,
this walked hosts from every cluster and allowed QueryAny to pick a
source node outside the target cluster. The SQL filters by cluster name
via clusterAllReplicas, but sqlCreateTableReplicated joins against the
executing node's local system.databases, so the returned set of CREATE
statements depends on which cluster's node answered first — leading to
missing or wrong schemas on newly added replicas.
Use api.Cluster{} scope in getReplicatedObjectsSQLs and
getDistributedObjectsSQLs so schema-discovery endpoints are restricted
to the target host's own cluster, matching the scoping already used by
shouldCreateReplicatedObjects / shouldCreateDistributedObjects.
Fixes Altinity#1964
Signed-off-by: Lukas Pfannschmidt <[email protected]>
Summary
Fixes #1964.
HostCreateTables used api.ClickHouseInstallation{} as the scope for Names(NameFQDNs, ...) when building the endpoint list passed to QueryUnzip2Columns / QueryAny. In a CHI that defines multiple clusters, this walked hosts from every cluster and allowed QueryAny to pick a source node outside the target cluster.
The SQL does filter by cluster name via clusterAllReplicas('<target>', system.tables), but sqlCreateTableReplicated joins against the executing node's local system.databases. So the returned set of CREATE statements depends on which cluster's node answered first, resulting in missing or incorrect schemas on newly added replicas in multi-cluster CHIs.
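To make the scoping difference concrete, here is a minimal sketch. The Host type and both helper functions are hypothetical stand-ins invented for illustration; they are not the operator's real api types or its Names(NameFQDNs, ...) implementation.

```go
package main

import "fmt"

// Host is a hypothetical stand-in for a CHI host; the operator's real types differ.
type Host struct {
	Cluster string
	FQDN    string
}

// chiScopedFQDNs models CHI-wide scoping: every host in the installation is a candidate.
func chiScopedFQDNs(hosts []Host) []string {
	out := make([]string, 0, len(hosts))
	for _, h := range hosts {
		out = append(out, h.FQDN)
	}
	return out
}

// clusterScopedFQDNs models cluster scoping: only hosts of the named cluster are candidates.
func clusterScopedFQDNs(hosts []Host, cluster string) []string {
	var out []string
	for _, h := range hosts {
		if h.Cluster == cluster {
			out = append(out, h.FQDN)
		}
	}
	return out
}

func main() {
	hosts := []Host{
		{Cluster: "analytics", FQDN: "chi-demo-analytics-0-0"},
		{Cluster: "ingest", FQDN: "chi-demo-ingest-0-0"},
		{Cluster: "ingest", FQDN: "chi-demo-ingest-0-1"},
	}
	// CHI-wide list: a node from "analytics" comes first and could answer for "ingest".
	fmt.Println(chiScopedFQDNs(hosts))
	// Cluster-scoped list: only "ingest" nodes can answer.
	fmt.Println(clusterScopedFQDNs(hosts, "ingest"))
}
```

With the CHI-wide list, a schema query for a new "ingest" replica may be executed on the "analytics" node, whose local system.databases is what the join sees.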
Timeline (for context)
- The CHI-wide endpoint scope dates back to CreatePodFQDNsOfCHI(host.GetCHI()) pre-2021 and was preserved through the 2021 unification (6b946799d) and the 2024 schemer refactor (b64a6241d).
- The LOCAL JOIN system.databases form landed in d49187d0b (first in release-0.23.6), which made the executing-node dependency stricter.
- In 0.26, 6d625de69 added a trailing dot to patternNamespaceDomain (%s.svc.cluster.local.). That fixed slow/failing DNS resolution under ndots:5, but it also removed an accidental failure mode that had been masking the scoping bug: before 0.26, cross-cluster endpoints in the CHI-wide list could fail DNS quickly and QueryAny would fall through to a same-cluster endpoint. After 0.26, every CHI endpoint resolves reliably, so QueryAny returns from whichever is first in the slice, which, with CHI-wide scoping, can be a node from a different cluster.

The 0.26 DNS change didn't cause this bug; it exposed it. The root cause is the schema-discovery endpoint scope.
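The masking effect described in the timeline can be modelled with a small simulation. This is a simplified sketch under the assumption that QueryAny tries endpoints in slice order and returns the first success; queryAny and runner below are hypothetical helpers written for this example, not the operator's code.

```go
package main

import (
	"errors"
	"fmt"
)

// queryAny is a simplified model: try endpoints in order, return the first success.
func queryAny(endpoints []string, run func(string) (string, error)) (string, error) {
	lastErr := errors.New("no endpoints")
	for _, ep := range endpoints {
		res, err := run(ep)
		if err == nil {
			return res, nil
		}
		lastErr = err
	}
	return "", lastErr
}

// runner builds a fake executor in which the listed endpoints fail DNS resolution.
func runner(unresolvable map[string]bool) func(string) (string, error) {
	return func(ep string) (string, error) {
		if unresolvable[ep] {
			return "", fmt.Errorf("dns: cannot resolve %s", ep)
		}
		return "answered by " + ep, nil
	}
}

func main() {
	// CHI-wide endpoint list with a foreign-cluster node first.
	endpoints := []string{"other-cluster-0-0", "target-cluster-0-0"}

	// Before 0.26 (modelled): the foreign endpoint fails fast, so the query falls through.
	before, _ := queryAny(endpoints, runner(map[string]bool{"other-cluster-0-0": true}))
	// After 0.26 (modelled): every endpoint resolves, so the first one in the slice wins.
	after, _ := queryAny(endpoints, runner(nil))

	fmt.Println(before)
	fmt.Println(after)
}
```

In this model the only thing that changes between the two calls is whether the foreign endpoint resolves, which is why fixing DNS surfaced the scoping problem rather than creating it.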
Change
Use api.Cluster{} scope in getReplicatedObjectsSQLs and getDistributedObjectsSQLs, so schema-discovery endpoints are restricted to the target host's own cluster. This matches the scoping already used by shouldCreateReplicatedObjects / shouldCreateDistributedObjects for the related gating logic.
Six call sites updated:
- pkg/model/chi/schemer/replicated.go (databases, tables, functions)
- pkg/model/chi/schemer/distributed.go (databases, tables, functions)
Test plan
- go build ./...
- Single-cluster CHIs are unaffected: cluster-scoped FQDNs equal CHI-scoped FQDNs in that case.
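The single-cluster claim can be checked with a small model. This is only a sketch: the clusters type and the chiScope / clusterScope helpers are hypothetical names introduced here, and the real operator derives its FQDN lists differently.

```go
package main

import "fmt"

// clusters maps a hypothetical cluster name to its host FQDNs.
type clusters map[string][]string

// chiScope flattens the hosts of every cluster, visiting clusters in the given order.
func chiScope(c clusters, order []string) []string {
	var out []string
	for _, name := range order {
		out = append(out, c[name]...)
	}
	return out
}

// clusterScope returns only the named cluster's hosts.
func clusterScope(c clusters, name string) []string {
	return c[name]
}

// equal reports whether two string slices have the same elements in the same order.
func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	single := clusters{"main": {"main-0-0", "main-0-1"}}
	multi := clusters{"main": {"main-0-0", "main-0-1"}, "aux": {"aux-0-0"}}

	// Single cluster: both scopes yield the same endpoint list.
	fmt.Println(equal(chiScope(single, []string{"main"}), clusterScope(single, "main")))
	// Multiple clusters: the CHI-wide list contains foreign hosts, so the lists differ.
	fmt.Println(equal(chiScope(multi, []string{"aux", "main"}), clusterScope(multi, "main")))
}
```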