How to deal with ports that are neither grouped by nor aggregated in Informatica PowerCenter
We are working on converting Informatica mappings to Google BigQuery SQL. In one of the mappings there are a couple of ports/columns, say A and B, that are neither grouped by in the Aggregator transformation nor passed through an aggregate function such as SUM or AVG. According to senior devs in my org, Informatica returns the last values of these ports/columns after the Aggregator.
My question is: how do we reproduce this behaviour in BigQuery SQL? We cannot select columns that are not present in the GROUP BY clause, and we don't want to group by these columns. For getting the last value of a column, BigQuery has the LAST_VALUE() analytic function, but even then we cannot use GROUP BY and an analytic function in the same SELECT statement.
I would really appreciate some help!
To convert an Informatica mapping with an Aggregator to BigQuery SQL, I would use row_number() over (partition by id order by id) as rn and then put a filter rn = 1 in the outer query.
Informatica Aggregator: id is the group-by column.
The equivalent SQL should look like this:
select a, b, id
from
  -- row_number() mimics the Informatica Aggregator; id is the group-by port.
  -- If there is a Sorter before the Aggregator, list all of its ports in the order by,
  -- in the same sequence but with the direction reversed (asc/desc).
  -- Ordering by id alone leaves every row in a group tied, so the row picked is arbitrary.
  (select a, b, id, row_number() over (partition by id order by id desc) as rn
   from mytable) rs
where rs.rn = 1 -- keeps only the latest row of each group
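If the Aggregator also computes real aggregates, the row_number approach above can be combined with window aggregates so that no GROUP BY is needed. The following is only a sketch under assumptions: the columns `amount` and `load_seq` and the inline sample rows are hypothetical, and `load_seq` stands in for whatever actually defines row order in your source (for example the Sorter keys).

```sql
-- Hypothetical sample data; replace this CTE with the real source table.
WITH mytable AS (
  SELECT 1 AS id, 'x' AS a, 'p' AS b, 10 AS amount, 1 AS load_seq UNION ALL
  SELECT 1, 'y', 'q', 20, 2 UNION ALL
  SELECT 2, 'z', 'r', 5, 1
)
SELECT id, a, b, total_amount
FROM (
  SELECT
    id,
    a,
    b,
    -- Window aggregate: the same total is attached to every row of the group.
    SUM(amount) OVER (PARTITION BY id) AS total_amount,
    -- rn = 1 marks the last row of each group according to load_seq (assumed ordering column).
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY load_seq DESC) AS rn
  FROM mytable
) rs
WHERE rs.rn = 1;
```

Because the aggregate is computed as a window function, it sits in the same SELECT as ROW_NUMBER(), and the outer filter then keeps one row per id.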
Use some aggregation function.
In Informatica you will get the LAST value. This is not deterministic. It basically means that either the values are the same in every row of the group, or you don't care which value you get, or you really do need the last value in some defined order.
The first two cases mean you can use MIN / MAX / whatever. The result will be the same, or you don't care.
If the last one is your case, ARRAY_AGG should help you, as per this answer.
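As a rough illustration of the ARRAY_AGG suggestion, the query below keeps a plain GROUP BY and picks the last value of each non-grouped column inside the aggregate itself. It is a sketch, not a definitive translation: `amount`, `load_seq`, and the sample rows are hypothetical, and `load_seq` is an assumed column that defines what "last" means.

```sql
-- Hypothetical sample data; replace this CTE with the real source table.
WITH mytable AS (
  SELECT 1 AS id, 'x' AS a, 'p' AS b, 10 AS amount, 1 AS load_seq UNION ALL
  SELECT 1, 'y', 'q', 20, 2 UNION ALL
  SELECT 2, 'z', 'r', 5, 1
)
SELECT
  id,
  SUM(amount) AS total_amount,
  -- Order each group's values by load_seq descending, keep one, and unwrap the array.
  ARRAY_AGG(a ORDER BY load_seq DESC LIMIT 1)[OFFSET(0)] AS last_a,
  ARRAY_AGG(b ORDER BY load_seq DESC LIMIT 1)[OFFSET(0)] AS last_b,
  -- For the "same value" or "don't care" cases, ANY_VALUE (or MIN / MAX) is enough.
  ANY_VALUE(a) AS any_a
FROM mytable
GROUP BY id;
```

If no column defines an order in the source, there is no reliable "last" row to reproduce, and ANY_VALUE or MIN / MAX is the more honest choice.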