繁体   English   中英

Clickhouse 实体化视图聚合 ghost 行

[英]Clickhouse materialized view aggregate ghost rows

所以我正在使用 clickhouse,这是我当前的表架构。

我有一个包含我的数据的主表:

CREATE TABLE default.Liquidity
(
    `Date` Date,
    `LiquidityId` UInt64,
    `TreeId_LQ` UInt64,
    `AggregateId` UInt64,
    `ClientId` UInt64,
    `InstrumentId` UInt64,
    `IsIn` String,
    `Currency` String,
    `Scenario` String,
    `Price` String,
    `Leg` Int8,
    `commit` Int64,
    `factor` Int8,
    `nb_aggregated` UInt64,
    `stream_id` Int64
)
ENGINE = Distributed('{cluster}', '', 'shard_Liquidity', TreeId_LQ)

而且我还有一个物化视图,将聚合数据存储在其他表中

CREATE MATERIALIZED VIEW default.mv_Liquidity_facet TO default.shard_state_Liquidity_facet
(
    `Date` Date,
    `TreeId_LQ` UInt64,
    `AggregateId` UInt64,
    `ClientId` UInt64,
    `InstrumentId` UInt64,
    `Currency` String,
    `Scenario` String,
    `commit` Int64,
    `factor` Int8,
    `nb_aggregated` AggregateFunction(sum, UInt64)
) AS
SELECT
    Date,
    TreeId_LQ,
    AggregateId,
    ClientId,
    InstrumentId,
    Currency,
    Scenario,
    commit,
    factor,
    sumState(nb_aggregated) AS nb_aggregated
FROM default.shard_Liquidity
GROUP BY
    Date,
    TreeId_LQ,
    AggregateId,
    ClientId,
    InstrumentId,
    Currency,
    Scenario,
    commit,
    factor


----------------

CREATE TABLE default.shard_state_Liquidity_facet
(
    `Date` Date,
    `TreeId_LQ` UInt64,
    `AggregateId` UInt64,
    `ClientId` UInt64,
    `InstrumentId` UInt64,
    `Currency` String,
    `Scenario` String,
    `commit` Int64,
    `factor` Int8,
    `nb_aggregated` AggregateFunction(sum, UInt64)
)
ENGINE = ReplicatedAggregatingMergeTree('{zoo_prefix}/tables/{shard}/shard_state_Liquidity_facet', '{host}')
PARTITION BY Date
ORDER BY (commit, TreeId_LQ, ClientId, AggregateId, InstrumentId, Scenario)
SETTINGS index_granularity = 8192

正如您可能已经猜到的那样, nb_aggregated列表示为实现此结果而聚合的行数。

如果我使用大量过滤器对我的分布式查询进行查询以查找一行

select
       sum(nb_aggregated)               AS nb_aggregated
from Liquidity
where Date = '2022-10-17'
  and TreeId_LQ = 1129
  and AggregateId = 999999999999
  and ClientId = 1
  and InstrumentId = 593
  and Currency = 'AUD'
  and Scenario = 'BAU'
  and commit = -2695401333399944382
  and factor = 1;

--- Result
1

我最终只有一行,因此,如果我使用相同的过滤器进行相同的查询,但其中一个是使用物化视图创建的表的聚合版本,我也应该只得到一行并且nb_aggregated = 1但是我最终得到nb_aggregated = 2就好像他已经将我的行与另一个聚合在一起并且大多数其他值也是错误的。

我知道我的例子很难理解,但如果你有任何领先优势,那就太好了。

好吧,我在 github 上的 clickhouse 存储库中问了同样的问题,Denny Crane 给了我这个对我有用的答案: https://github.com/ClickHouse/ClickHouse/issues/43988#issuecomment-1339731917

在大多数情况下,MatView group by 应该匹配存储表ORDER BY

CREATE MATERIALIZED VIEW default.mv_Liquidity_facet:
GROUP BY Date, TreeId_LQ, AggregateId, ClientId, InstrumentId, Currency, Scenario, commit, factor

CREATE TABLE default.shard_state_Liquidity_facet
PARTITION BY Date
ORDER BY (commit, TreeId_LQ, ClientId, AggregateId, InstrumentId, Scenario)
Your ReplicatedAggregatingMergeTree "CORRUPTS" Currency / factor columns using ANY function

解决方案是

ORDER BY (commit, TreeId_LQ, ClientId, AggregateId, InstrumentId, Scenario, Currency  , factor)

https://den-crane.github.io/Everything_you_should_know_about_materialized_views_commented.pdf

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM