ClickHouse materialized view aggregate ghost rows
I am using ClickHouse, and this is my current table schema.
I have a main table that contains my data:
CREATE TABLE default.Liquidity
(
`Date` Date,
`LiquidityId` UInt64,
`TreeId_LQ` UInt64,
`AggregateId` UInt64,
`ClientId` UInt64,
`InstrumentId` UInt64,
`IsIn` String,
`Currency` String,
`Scenario` String,
`Price` String,
`Leg` Int8,
`commit` Int64,
`factor` Int8,
`nb_aggregated` UInt64,
`stream_id` Int64
)
ENGINE = Distributed('{cluster}', '', 'shard_Liquidity', TreeId_LQ)
I also have a materialized view that stores aggregated data in another table:
CREATE MATERIALIZED VIEW default.mv_Liquidity_facet TO default.shard_state_Liquidity_facet
(
`Date` Date,
`TreeId_LQ` UInt64,
`AggregateId` UInt64,
`ClientId` UInt64,
`InstrumentId` UInt64,
`Currency` String,
`Scenario` String,
`commit` Int64,
`factor` Int8,
`nb_aggregated` AggregateFunction(sum, UInt64)
) AS
SELECT
Date,
TreeId_LQ,
AggregateId,
ClientId,
InstrumentId,
Currency,
Scenario,
commit,
factor,
sumState(nb_aggregated) AS nb_aggregated
FROM default.shard_Liquidity
GROUP BY
Date,
TreeId_LQ,
AggregateId,
ClientId,
InstrumentId,
Currency,
Scenario,
commit,
factor
----------------
CREATE TABLE default.shard_state_Liquidity_facet
(
`Date` Date,
`TreeId_LQ` UInt64,
`AggregateId` UInt64,
`ClientId` UInt64,
`InstrumentId` UInt64,
`Currency` String,
`Scenario` String,
`commit` Int64,
`factor` Int8,
`nb_aggregated` AggregateFunction(sum, UInt64)
)
ENGINE = ReplicatedAggregatingMergeTree('{zoo_prefix}/tables/{shard}/shard_state_Liquidity_facet', '{host}')
PARTITION BY Date
ORDER BY (commit, TreeId_LQ, ClientId, AggregateId, InstrumentId, Scenario)
SETTINGS index_granularity = 8192
As you may have guessed, the nb_aggregated column represents the number of rows that were aggregated to produce the result.
If I query my distributed table with enough filters to isolate a single row:
select
sum(nb_aggregated) AS nb_aggregated
from Liquidity
where Date = '2022-10-17'
and TreeId_LQ = 1129
and AggregateId = 999999999999
and ClientId = 1
and InstrumentId = 593
and Currency = 'AUD'
and Scenario = 'BAU'
and commit = -2695401333399944382
and factor = 1;
--- Result
1
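I end up with exactly one row. So if I run the same query with the same filters against the aggregated table populated by the materialized view, I should also get a single row with nb_aggregated = 1.
But I end up with nb_aggregated = 2,
as if it had aggregated my row together with another one, and most of the other values are wrong as well.
I know my example is hard to follow, but if you have any leads, that would be great.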
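The post does not show the query against the aggregated table, so here is a minimal sketch of what it would presumably look like. Because nb_aggregated is stored as an AggregateFunction(sum, UInt64) state, it has to be read with sumMerge rather than sum. The sketch queries the local table shard_state_Liquidity_facet directly; if a Distributed table sits in front of it (none is shown in the post), that table's name would go in the FROM clause instead.

```sql
-- Same filters as the query on Liquidity, but against the aggregated table.
-- sumMerge finalizes the stored sum states into a plain number.
SELECT
    sumMerge(nb_aggregated) AS nb_aggregated
FROM default.shard_state_Liquidity_facet
WHERE Date = '2022-10-17'
  AND TreeId_LQ = 1129
  AND AggregateId = 999999999999
  AND ClientId = 1
  AND InstrumentId = 593
  AND Currency = 'AUD'
  AND Scenario = 'BAU'
  AND commit = -2695401333399944382
  AND factor = 1;
```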
OK, I asked the same question in the ClickHouse repository on GitHub, and Denny Crane gave me this answer, which worked for me: https://github.com/ClickHouse/ClickHouse/issues/43988#issuecomment-1339731917
In most cases, the MatView GROUP BY should match the storage table's ORDER BY:
CREATE MATERIALIZED VIEW default.mv_Liquidity_facet:
GROUP BY Date, TreeId_LQ, AggregateId, ClientId, InstrumentId, Currency, Scenario, commit, factor
CREATE TABLE default.shard_state_Liquidity_facet
PARTITION BY Date
ORDER BY (commit, TreeId_LQ, ClientId, AggregateId, InstrumentId, Scenario)
Your ReplicatedAggregatingMergeTree "CORRUPTS" Currency / factor columns using ANY function
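To make that behavior concrete, here is a small reproduction sketch. It assumes a throwaway, non-replicated AggregatingMergeTree table; the names demo_facet and k are hypothetical and exist only for illustration. Currency is deliberately left out of ORDER BY, which mirrors the situation above.

```sql
-- Hypothetical demo table: Currency is NOT part of the sorting key.
CREATE TABLE demo_facet
(
    k UInt64,
    Currency String,
    nb_aggregated AggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY k;

-- Two separate inserts create two parts with the same key but different currencies.
INSERT INTO demo_facet SELECT 1, 'AUD', sumState(toUInt64(1));
INSERT INTO demo_facet SELECT 1, 'USD', sumState(toUInt64(1));

-- Force a merge. Rows sharing the same sorting key (k = 1) are collapsed into one row:
-- the sum states are combined, and Currency keeps an arbitrary one of the merged values.
OPTIMIZE TABLE demo_facet FINAL;

SELECT k, Currency, sumMerge(nb_aggregated) AS nb_aggregated
FROM demo_facet
GROUP BY k, Currency;
```

Before the merge the final SELECT should return two rows with nb_aggregated = 1 each; after it, a single row with nb_aggregated = 2 under whichever Currency survived. That is the same kind of "ghost" row as in the question.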
The solution is:
ORDER BY (commit, TreeId_LQ, ClientId, AggregateId, InstrumentId, Scenario, Currency , factor)
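As far as I know, ALTER TABLE ... MODIFY ORDER BY can only append columns that are added in the same ALTER, so existing columns such as Currency and factor cannot simply be appended to the key of the current table. One possible way to apply the fix is to create a new storage table and backfill it from the raw data. The sketch below is only that: the table name shard_state_Liquidity_facet_v2 and its ZooKeeper path are assumptions, and the statements would need to run on every shard.

```sql
-- Hypothetical replacement table whose ORDER BY covers every GROUP BY column
-- of the view (Date is already covered by PARTITION BY).
CREATE TABLE default.shard_state_Liquidity_facet_v2
(
    `Date` Date,
    `TreeId_LQ` UInt64,
    `AggregateId` UInt64,
    `ClientId` UInt64,
    `InstrumentId` UInt64,
    `Currency` String,
    `Scenario` String,
    `commit` Int64,
    `factor` Int8,
    `nb_aggregated` AggregateFunction(sum, UInt64)
)
ENGINE = ReplicatedAggregatingMergeTree('{zoo_prefix}/tables/{shard}/shard_state_Liquidity_facet_v2', '{host}')
PARTITION BY Date
ORDER BY (commit, TreeId_LQ, ClientId, AggregateId, InstrumentId, Scenario, Currency, factor)
SETTINGS index_granularity = 8192;

-- Backfill from the raw local table using the same aggregation as the view.
INSERT INTO default.shard_state_Liquidity_facet_v2
SELECT
    Date, TreeId_LQ, AggregateId, ClientId, InstrumentId,
    Currency, Scenario, commit, factor,
    sumState(nb_aggregated) AS nb_aggregated
FROM default.shard_Liquidity
GROUP BY
    Date, TreeId_LQ, AggregateId, ClientId, InstrumentId,
    Currency, Scenario, commit, factor;
```

The materialized view would then have to be dropped and recreated with TO pointing at the new table. Rows written to shard_Liquidity while the backfill runs can be missed or counted twice, so it is safer to pause ingestion or backfill one Date partition at a time.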
https://den-crane.github.io/Everything_you_should_know_about_materialized_views_commented.pdf