
We know there is a data cache for row data and an index cache for group values. Does the Aggregator process all the data into the cache before it operates?

Original post: 2020-04-19 14:03:22 | Tags: data-warehouse, informatica, informatica-powercenter, informatica-powerexchange

Can you please help me understand this using the example below?

Group by cust_id, item_id.

Which records are placed in the caches (index/data) in each scenario, with sorted input and with unsorted input? What happens if cache memory runs out? Which algorithm does it use internally to perform the aggregate calculations?

1 Answer

I don't know the internal algorithm, but in unsorted mode it is normal for the Aggregator to store all rows in cache and wait for the last row, because that last row could belong to the first group that must be returned according to the Aggregator's rules. The Aggregator will never complain about the order of incoming rows. When using cache, it stores rows in memory first; once the allocated memory is full, it pushes the cache to disk. If it runs out of disk space, the session will fail (and possibly other sessions too, because of the full disk). You will then have to clean up those files manually.
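To make the unsorted case concrete, here is a minimal Python sketch of why nothing can be output until the input ends. It is not Informatica's internal algorithm (which I don't know); the function name, the `amount` column, and the use of a plain dictionary in place of the index/data caches are all assumptions for illustration, and spilling to disk is not modelled.

```python
def aggregate_unsorted(rows):
    """Sum a hypothetical `amount` per (cust_id, item_id) for rows in any order.

    The dict keys stand in for the group values (index cache) and the dict
    values for the per-group data (data cache). This mapping is an assumption.
    """
    totals = {}
    for cust_id, item_id, amount in rows:
        key = (cust_id, item_id)
        totals[key] = totals.get(key, 0) + amount
    # Only now, after the last row, is any group known to be complete.
    return totals


rows = [(1, "A", 10), (2, "B", 5), (1, "A", 7), (2, "A", 1)]
result = aggregate_unsorted(rows)
```

Note that the third row belongs to the same group as the first one, so the group (1, "A") could not have been closed any earlier.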

In sorted mode there is no such problem: rows arrive already grouped, and the aggregated row is output as soon as all rows of a group have been received, which is detected when the value of one of the keys changes. The Aggregator will complain and stop if rows are not in the expected order. However, this pushes the problem upstream to the sorting step. That could be a Sorter, which can itself use a lot of cache, or an ORDER BY clause in the SQL query, which consumes resources on the database side.
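The sorted behaviour described above can be sketched the same way. Again this is only an illustration under assumptions, not the Aggregator's actual implementation: the function name and the `amount` column are hypothetical, and the order check simply compares Python tuples.

```python
def aggregate_sorted(rows):
    """Yield (key, total) as soon as the group key changes.

    Assumes rows are sorted by (cust_id, item_id); only one group is held
    in memory at a time. Raises ValueError if a key goes backwards.
    """
    current_key = None
    total = 0
    have_group = False
    for cust_id, item_id, amount in rows:
        key = (cust_id, item_id)
        if have_group and key != current_key:
            if key < current_key:
                raise ValueError(f"rows out of order: {key} after {current_key}")
            yield current_key, total  # previous group is complete
            total = 0
        current_key = key
        have_group = True
        total += amount
    if have_group:
        yield current_key, total  # flush the last group


sorted_rows = [(1, "A", 10), (1, "A", 7), (2, "A", 1), (2, "B", 5)]
streamed = list(aggregate_sorted(sorted_rows))
```

Because it is a generator, each aggregated row is available to the next step before the rest of the input has been read, which is the memory advantage of sorted input.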

Also be careful: SQL ORDER BY may use a different locale than Informatica.
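As a rough illustration of the locale point, the sketch below sorts the same keys under two different comparison rules in plain Python. It does not reproduce any real database collation or Informatica sort-order setting; the case-insensitive rule and the helper name `is_sorted_binary` are assumptions chosen only to show that data sorted under one rule can look unsorted under another.

```python
def is_sorted_binary(values):
    """Return True if values are in ascending code-point order."""
    return all(a <= b for a, b in zip(values, values[1:]))


keys = ["b", "A", "a", "B"]

# Code-point order: uppercase ASCII letters sort before lowercase ones.
binary_order = sorted(keys)

# Case-insensitive order, standing in for a different collation upstream.
folded_order = sorted(keys, key=str.casefold)

# Keys sorted upstream with one rule may fail the order check of another.
looks_sorted = is_sorted_binary(folded_order)
```

Here `folded_order` is valid for a case-insensitive sort, yet a consumer that compares by code point sees "B" arriving after "b" and treats the input as out of order.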