
Apache Beam Global Counting

I am trying to understand the best way to solve the following problem:

As a simple example scenario, I have a file that lists each test's name and whether its execution passed (true/false).

test-scenario,passed
--------------------
testA,true
testB,false

Using Apache Beam, I can read the file, parse it into a PCollection<TestDetails>, and then use subsequent transforms to write the details of all tests that passed to one set of files and the details of all tests that failed to another.

After writing those files, I would like to generate some counts (the total number of file records processed, the number of tests that passed, and the number of tests that failed) and write these details to a single file.

Should I use a global combine for this?

For this purpose, you can use Beam Metrics (please see the documentation). It provides counters that cover the needs you describe above, and the metric values can be fetched once your pipeline has finished. Please take a look at this example. Beam can also export metrics to an external sink, if that is more convenient.
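To make the suggestion concrete, here is a minimal sketch of how counters might be wired in with the Java SDK. It is not the example linked above. The class name, the `test-results` namespace, the `parsePassed` helper, and the `tests.csv` and `counts.txt` paths are all hypothetical, and the input is assumed to look like the sample file in the question. The final write uses plain `java.nio` on the machine that launches the pipeline, which is an assumption that suits a local run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricNameFilter;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.metrics.MetricsFilter;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

class TestCountsWithMetrics {
  // Hypothetical namespace used to group the three counters.
  static final String NAMESPACE = "test-results";

  // Hypothetical helper: returns null for the header, the separator row, and
  // malformed lines; otherwise returns the value of the "passed" column.
  static Boolean parsePassed(String line) {
    String[] fields = line.split(",");
    if (fields.length != 2 || fields[0].trim().equals("test-scenario")) {
      return null;
    }
    return Boolean.parseBoolean(fields[1].trim());
  }

  // Passes data lines through unchanged and increments counters as a side effect.
  static class CountResultsFn extends DoFn<String, String> {
    private final Counter total = Metrics.counter(NAMESPACE, "total");
    private final Counter passed = Metrics.counter(NAMESPACE, "passed");
    private final Counter failed = Metrics.counter(NAMESPACE, "failed");

    @ProcessElement
    public void processElement(@Element String line, OutputReceiver<String> out) {
      Boolean result = parsePassed(line);
      if (result == null) {
        return; // not a data record, so it is neither counted nor emitted
      }
      total.inc();
      if (result) {
        passed.inc();
      } else {
        failed.inc();
      }
      out.output(line);
    }
  }

  public static void main(String[] args) throws IOException {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    pipeline
        .apply(TextIO.read().from("tests.csv")) // assumed input path
        .apply(ParDo.of(new CountResultsFn()));

    PipelineResult result = pipeline.run();
    result.waitUntilFinish();

    // Query every counter registered under the namespace once the run is done.
    MetricQueryResults metrics = result.metrics().queryMetrics(
        MetricsFilter.builder()
            .addNameFilter(MetricNameFilter.inNamespace(NAMESPACE))
            .build());

    List<String> lines = new ArrayList<>();
    for (MetricResult<Long> counter : metrics.getCounters()) {
      lines.add(counter.getName().getName() + "," + counter.getAttempted());
    }
    Files.write(Paths.get("counts.txt"), lines); // assumed single output file
  }
}
```

In a real pipeline the counters would more likely live in the existing parsing DoFn rather than a separate one. Note that attempted values can include work from retried bundles on distributed runners, and not every runner supports committed values, so it is worth checking what your runner reports before relying on exact counts.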
