
Aggregators in Apache Beam with the Dataflow runner

I am trying to create aggregators that count the values satisfying a condition across all input data. I looked into the documentation and found the following page about creating them:

https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Aggregator

I am using google-cloud-dataflow-java-sdk-all 2.4.0 (based on Apache Beam).

However, I cannot find the corresponding class in the new Beam API. I looked in the org.apache.beam.sdk.transforms package.

Could you please tell me how to use aggregators with the Dataflow runner in the new API?

The link you have is for the old SDK (1.x).

In SDK 2.x, you should refer to the Apache Beam SDK. If I understand correctly, the Aggregators you mention are for adding counters during processing. I guess the corresponding package would be org.apache.beam.sdk.metrics:

Package org.apache.beam.sdk.metrics: Metrics allow exporting information about the execution of a pipeline.

and the org.apache.beam.sdk.metrics.Counter interface:

A metric that reports a single long value and can be incremented or decremented.
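To illustrate the Metrics approach, here is a minimal sketch assuming the Beam Java SDK 2.x with the DirectRunner (beam-runners-direct-java) on the classpath. The condition `value > 10`, the class names `ConditionCounter` and `CountMatchingFn`, and the counter name `matchingValues` are hypothetical. A `Counter` obtained from `Metrics.counter(...)` is incremented inside a `DoFn`, and once the pipeline finishes its value is read back through `PipelineResult.metrics()`.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricNameFilter;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.metrics.MetricsFilter;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

import java.util.Arrays;
import java.util.List;

public class ConditionCounter {

  // Passes every element through and increments a user counter whenever
  // the element satisfies the hypothetical condition "value > 10".
  static class CountMatchingFn extends DoFn<Integer, Integer> {
    private final Counter matching = Metrics.counter(CountMatchingFn.class, "matchingValues");

    @ProcessElement
    public void processElement(ProcessContext c) {
      if (c.element() > 10) {
        matching.inc();
      }
      c.output(c.element());
    }
  }

  // Runs the pipeline with the default runner (DirectRunner) and sums the
  // counter values reported for "matchingValues".
  public static long countMatching(List<Integer> input) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    p.apply(Create.of(input)).apply(ParDo.of(new CountMatchingFn()));
    PipelineResult result = p.run();
    result.waitUntilFinish();

    MetricQueryResults metrics = result.metrics().queryMetrics(
        MetricsFilter.builder()
            .addNameFilter(MetricNameFilter.named(CountMatchingFn.class, "matchingValues"))
            .build());

    long total = 0;
    for (MetricResult<Long> counter : metrics.getCounters()) {
      // Attempted values are supported by most runners; committed values are not always available.
      total += counter.getAttempted();
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(countMatching(Arrays.asList(3, 15, 42, 7)));
  }
}
```

On Dataflow the same counter should also show up as a custom counter for the step in the monitoring UI. The query method names changed during the 2.x line (recent releases use `getCounters()`, `getName()` and `getAttempted()`, while some older 2.x releases used `counters()`, `name()` and `attempted()`), so check the Javadoc for your SDK version.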

As of now, there seems to be no direct replacement for the Aggregator class in Apache Beam SDK 2.x. An alternative way to count values that satisfy a condition is to use transforms: use a GroupByKey transform to collect the data that meets the condition, then a Combine transform to get a count of the input elements that satisfy it.
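The grouping-and-combining idea can be sketched as follows, assuming the same Beam Java SDK 2.x setup with the DirectRunner. Rather than writing a separate GroupByKey followed by a Combine, this sketch keys each element by whether it meets the hypothetical condition `value > 10` and applies `Count.perKey()`, which is itself a per-key Combine, so it yields one count for matching and one for non-matching elements. The class name `ConditionCount`, the helper `matches` and the static `RESULTS` map are illustrative only; the map merely lets a local DirectRunner run inspect the output and would not work on a distributed runner such as Dataflow.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConditionCount {

  // Hypothetical condition used for illustration only.
  static boolean matches(int value) {
    return value > 10;
  }

  // Keys every element by the result of the condition, then counts per key.
  // Count.perKey() is a Combine.perKey, i.e. grouping by key plus combining.
  static PCollection<KV<Boolean, Long>> countByCondition(PCollection<Integer> values) {
    return values
        .apply("KeyByCondition",
            MapElements.into(
                    TypeDescriptors.kvs(TypeDescriptors.booleans(), TypeDescriptors.integers()))
                .via((Integer v) -> KV.of(matches(v), v)))
        .apply("CountPerKey", Count.<Boolean, Integer>perKey());
  }

  // Demo-only sink: stores results in a static map, which the caller can read
  // only when the pipeline runs in the same JVM (DirectRunner).
  static final Map<Boolean, Long> RESULTS = new ConcurrentHashMap<>();

  static class CollectFn extends DoFn<KV<Boolean, Long>, Void> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      RESULTS.put(c.element().getKey(), c.element().getValue());
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    PCollection<Integer> values = p.apply(Create.of(Arrays.asList(3, 15, 42, 7)));
    countByCondition(values).apply("Collect", ParDo.of(new CollectFn()));
    p.run().waitUntilFinish();
    System.out.println("matching=" + RESULTS.get(true) + ", non-matching=" + RESULTS.get(false));
  }
}
```

If only the matching count is needed, `Filter.by(...)` followed by `Count.globally()` (a global Combine) is a simpler variant of the same idea.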

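Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.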

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM