
How to improve the Cube.js pre-aggregation creation process? (Pre-aggregations take too long to build even with partitionGranularity)

We are having trouble with pre-aggregation build performance. We currently have specific filters for the data of each of our clients, and we generate a different cube for each client by extending a base cube (called Metrics) and defining a segment that represents those filters.

To summarize, we have a Metrics base cube, and we generate dynamic cubes MetricsA, MetricsB, MetricsC for clients A, B, C. Each of these cubes has a segment that we call z, which contains a specific SQL query for each of our clients. The data to build the segment is retrieved from our API using asyncModule, and then we extend the Metrics cube to generate all the client-specific cubes by overriding the z segment with the client's filter. By doing this, when a client queries the cube service, the data retrieved comes from their specific cube, already filtered (by the enforced z segment).

This Metrics cube is built by joining large tables, so we also added a partitionGranularity (monthly) to reduce the size of the pre-aggregations, but they still take too long to build (> 10 minutes).
We need to edit the specific query that the cube service submits to create the pre-aggregation tables, so that we keep only the rows where the z segment = 1 (because that is the relevant data), or at least we want to be able to rearrange/modify the query to improve performance. Where is the best place to make such changes? Or what is the recommended practice for intervening in this process?

There are two approaches you can use to leverage pre-aggregations in multi-tenant environments.

  1. Override sql for each customer cube such as OrdersC1, OrdersC2, etc. In this case all pre-aggregations defined in the base Orders cube will be inherited. Each customer cube will have its own set of pre-aggregations. This means that if there are N customers and M pre-aggregations, then N * M pre-aggregation tables have to be built, which can be costly in some scenarios.
cube(`Orders`, {
  sql: `SELECT * FROM orders`,

  preAggregations: {
    date: {
      type: `rollup`,
      measureReferences: [someMeasure],
      dimensionReferences: [someDimension],
      timeDimensionReference: date,
      granularity: `month`
    },
    // ...
  }
});

cube(`OrdersC1`, {
  extends: Orders,
  sql: `SELECT * FROM orders WHERE customer_id = 'C1'`,
});
  2. Use the tenant field as a dimension of the rollup. Every segment can be converted to a dimension, which provides an opportunity to use a single rollup table for all customers. To route requests to the right tenant's data, queryTransformer can be used.
cube(`Orders`, {
  sql: `SELECT * FROM orders`,

  // ...

  dimensions: {
    // ...

    customerId: {
      sql: `customer_id`,
      type: `string`
    }
  },

  preAggregations: {
    date: {
      type: `rollup`,
      measureReferences: [someMeasure],
      dimensionReferences: [customerId, someDimension],
      timeDimensionReference: date,
      granularity: `month`
    },

    // ...
  }
});
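
For the second approach, the routing step could look roughly like the sketch below. It is not taken from the answer itself: it assumes the tenant id is available in the auth context under a hypothetical `customerId` field, and it uses the older `queryTransformer` hook with `authInfo` and the `dimension` filter key (newer Cube versions expose the same hook as `queryRewrite` with `securityContext`, and name the filter key `member`). Because `customerId` is listed in the rollup's `dimensionReferences`, a query carrying this filter should still be served from the shared rollup table.

```javascript
// cube.js (server configuration file): a minimal sketch under the assumptions above.
// `customerId` is a hypothetical field name on the auth context.
const queryTransformer = (query, { authInfo }) => {
  const customerId = authInfo && authInfo.customerId;
  if (!customerId) {
    // Refuse to run a query that cannot be scoped to a tenant.
    throw new Error('No customerId found in auth context');
  }

  // Return a copy of the query with the tenant filter appended,
  // leaving any filters the client sent in place.
  return {
    ...query,
    filters: [
      ...(query.filters || []),
      {
        dimension: 'Orders.customerId',
        operator: 'equals',
        values: [String(customerId)],
      },
    ],
  };
};

module.exports = { queryTransformer };
```

With this in place the client does not send the tenant filter itself; the server appends it to every query before SQL generation, so one tenant cannot read another tenant's rows from the shared rollup.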

