How to improve the Cube.js pre-aggregation creation process? (It's taking too long to build pre-aggregations even with partitionGranularity)
We are having trouble with pre-aggregation build performance. We have specific data filters for each of our clients, and we generate a different cube for each of them by extending a base cube (called `Metrics`) and defining a segment that represents those filters.
To summarize, we have a `Metrics` base cube, and we generate dynamic cubes `MetricsA`, `MetricsB`, `MetricsC` for clients A, B, C.
Each of these cubes has a segment that we call `z`, which contains client-specific SQL. The data used to build the segment is retrieved from our API using `asyncModule`, and then we extend the `Metrics` cube to generate all the client-specific cubes, overriding the `z` segment with the client's `filter`. This way, when a client queries the Cube service, the data returned comes from that client's cube and is already filtered (by the enforced `z` segment).
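A minimal sketch of the per-client setup described above, assuming the API returns a list of `{ id, filter }` objects (hypothetical field names). The helper `clientCubeDefinitions` is hypothetical and kept free of Cube globals so it can run on its own; the comment only shows roughly where its output would be handed to `cube()` inside `asyncModule`.

```javascript
// Hypothetical helper: turns the client list fetched from the API into the
// per-client cube definitions described above (one cube per client, each
// overriding the `z` segment with that client's filter).
function clientCubeDefinitions(clients) {
  return clients.map(({ id, filter }) => ({
    name: `Metrics${id}`,
    segments: {
      // The enforced segment; its SQL is the client's own filter.
      z: { sql: filter },
    },
  }));
}

// In a Cube schema file this would run inside asyncModule, roughly:
//   asyncModule(async () => {
//     const clients = await fetchClients(); // hypothetical API call
//     clientCubeDefinitions(clients).forEach(({ name, segments }) =>
//       cube(name, { extends: Metrics, segments }));
//   });

const defs = clientCubeDefinitions([
  { id: 'A', filter: `account_id = 1` },
  { id: 'B', filter: `account_id = 2` },
]);
console.log(defs.map((d) => d.name)); // [ 'MetricsA', 'MetricsB' ]
```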
This `Metrics` cube is built by joining large tables, so we also added a monthly `partitionGranularity` to reduce the size of the pre-aggregations, but they still take too long to build (more than 10 minutes).
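With a monthly `partitionGranularity`, Cube builds each pre-aggregation as one table per month of its build range, so every month gets its own build query over the joined tables. The hypothetical helper below only enumerates those monthly ranges to make the partition count concrete; it is an illustration, not Cube's internal code.

```javascript
// Hypothetical illustration: list the [start, end) month ranges that a monthly
// partitionGranularity would split a build range into (dates in UTC).
function monthlyPartitions(from, to) {
  const ranges = [];
  // Start at the first day of the month containing `from`.
  let cursor = new Date(Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), 1));
  while (cursor <= to) {
    // Date.UTC rolls month 12 over into January of the next year.
    const next = new Date(Date.UTC(cursor.getUTCFullYear(), cursor.getUTCMonth() + 1, 1));
    ranges.push([cursor.toISOString().slice(0, 10), next.toISOString().slice(0, 10)]);
    cursor = next;
  }
  return ranges;
}

const parts = monthlyPartitions(new Date('2021-11-15'), new Date('2022-02-02'));
console.log(parts.length); // 4 partitions: Nov, Dec, Jan, Feb
```

Partitioning bounds how much data each build query covers, but the joins still run once per partition.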
We need to edit the query that the Cube service submits to create the pre-aggregation tables so that it only keeps the rows where the `z` segment = 1 (because that is the relevant data), or at least we want to be able to rearrange/modify the query to improve performance. Where is the best place to make such changes? Or what is the recommended practice for intervening in this process?
There are two approaches you can use to leverage pre-aggregations in multi-tenant environments:
1. Override `sql` for each customer cube such as `OrdersC1`, `OrdersC2`, etc. In this case all pre-aggregations defined in the base `Orders` cube will be inherited, so each customer cube will have its own set of pre-aggregations. If there are `N` customers and `M` pre-aggregations, then `N * M` pre-aggregation tables need to be built, which can be costly in some scenarios.

```javascript
cube(`Orders`, {
  sql: `SELECT * FROM orders`,

  preAggregations: {
    date: {
      type: `rollup`,
      measureReferences: [someMeasure], // placeholder: your measures
      dimensionReferences: [someDimension], // placeholder: your dimensions
      timeDimensionReference: date,
      granularity: `month`
    },
    // ...
  }
});

// Each customer cube inherits the pre-aggregations above, but they are built
// from this customer-specific `sql`, so they only aggregate C1's rows.
cube(`OrdersC1`, {
  extends: Orders,
  sql: `SELECT * FROM orders WHERE customer_id = 'C1'`,
});
```
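The `N * M` cost can be made concrete with a little arithmetic. This sketch assumes every pre-aggregation is also split into the same `P` monthly partitions (as with the `partitionGranularity` mentioned in the question), so the per-customer approach builds `N * M * P` partition tables, while one set of pre-aggregations shared by all customers builds `M * P`. The function names are hypothetical.

```javascript
// Partition tables to build when every customer cube inherits the base
// cube's pre-aggregations: N customers * M pre-aggregations * P partitions.
function tablesPerCustomerCubes(customers, preAggregations, partitions = 1) {
  return customers * preAggregations * partitions;
}

// Partition tables to build when all customers share one set of
// pre-aggregations: M pre-aggregations * P partitions.
function tablesSharedPreAggregations(preAggregations, partitions = 1) {
  return preAggregations * partitions;
}

// Example: clients A, B, C; 2 pre-aggregations; 24 monthly partitions.
console.log(tablesPerCustomerCubes(3, 2, 24)); // 144
console.log(tablesSharedPreAggregations(2, 24)); // 48
```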
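2. Use one set of pre-aggregations for all customers: add a customer ID dimension to the base cube and include it in every pre-aggregation's `dimensionReferences`. Queries that filter on `customerId` can then be served from the same rollup tables, so only `M` pre-aggregation tables need to be built.

```javascript
cube(`Orders`, {
  sql: `SELECT * FROM orders`,
  // ...

  dimensions: {
    // ...
    customerId: {
      sql: `customer_id`,
      type: `string`
    }
  },

  preAggregations: {
    date: {
      type: `rollup`,
      measureReferences: [someMeasure],
      // customerId is part of the rollup, so one table serves all customers
      dimensionReferences: [customerId, someDimension],
      timeDimensionReference: date,
      granularity: `month`
    },
    // ...
  }
});
```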
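With shared pre-aggregations, each customer's queries still have to be restricted to their own rows. One way to do that is Cube's `queryRewrite` configuration option (older Cube.js releases used `queryTransformer`, with `authInfo` instead of `securityContext`), which can append a `customerId` filter to every query. The sketch below assumes the tenant id is available as `securityContext.customerId`; that claim name is hypothetical, and `Orders.customerId` refers to the dimension defined in the base cube.

```javascript
// cube.js configuration sketch. Assumes the tenant id is carried in the
// security context as `customerId` (a hypothetical claim name).
function queryRewrite(query, { securityContext }) {
  const customerId = securityContext && securityContext.customerId;
  if (!customerId) {
    // Refuse tenant-less queries instead of returning every customer's data.
    throw new Error('No customerId in security context');
  }
  // Return a copy of the query with the tenant filter appended, so the
  // shared rollup (which includes Orders.customerId) can still serve it.
  return {
    ...query,
    filters: [
      ...(query.filters || []),
      { member: 'Orders.customerId', operator: 'equals', values: [customerId] },
    ],
  };
}

module.exports = { queryRewrite };
```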