Apache Flink: How to count the total number of events in a DataStream

I have two raw streams that I join, and I want to count how many events have been joined and how many have not. I am doing this with a map on joinedEventDataStream, as shown below:

    joinedEventDataStream.map(new RichMapFunction<JoinedEvent, Object>() {
        @Override
        public Object map(JoinedEvent joinedEvent) throws Exception {
            number_of_joined_events += 1;
            return null;
        }
    });

Question 1: Is this an appropriate way to count the number of events in a stream?

Question 2: I have noticed some weird behavior that some of you might not believe. When I run my Flink program in the IntelliJ IDE, it shows the correct value for number_of_joined_events, but when I submit the program as a JAR, the value is 0. In other words, I get the initial value of number_of_joined_events instead of the actual count. Why does this happen only when I submit the JAR file and not in the IDE?

Your approach does not work. The behavior you noticed when executing the program via a JAR file is expected.

I don't know how number_of_joined_events is defined, but I assume it's a static variable in your program. When you run the program in your IDE, it runs in a single JVM, so all operators have access to the static variable. When you submit a JAR file to a remote process, the program is executed in a different JVM (possibly multiple JVMs), and the static variable in your client process is never updated.

You can use Flink's metrics or a ReduceFunction that sums 1s to count the number of processed records.
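Both options could look roughly like the sketch below. It is a minimal illustration rather than a definitive implementation: the class names EventCounting, CountingMapper and SumOnes and the metric name "numJoinedEvents" are made up for this example, and the open(Configuration) signature assumes a Flink 1.x release (newer releases change the signature of open).

```java
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Hypothetical holder class for the two counting options.
public class EventCounting {

    /** Option 1: pass-through mapper that counts records in a Flink metric. */
    public static class CountingMapper<T> extends RichMapFunction<T, T> {
        // The counter is created on the task at runtime, so it is not
        // serialized together with the function.
        private transient Counter joinedEvents;

        @Override
        public void open(Configuration parameters) {
            // "numJoinedEvents" is an assumed metric name.
            joinedEvents = getRuntimeContext()
                    .getMetricGroup()
                    .counter("numJoinedEvents");
        }

        @Override
        public T map(T event) {
            joinedEvents.inc();
            return event; // forward the record unchanged instead of returning null
        }
    }

    /** Option 2: reducer that adds up the 1s each event was mapped to. */
    public static class SumOnes implements ReduceFunction<Long> {
        @Override
        public Long reduce(Long a, Long b) {
            return a + b;
        }
    }
}
```

The mapper would be attached with joinedEventDataStream.map(new EventCounting.CountingMapper<JoinedEvent>()). Each parallel subtask registers its own counter, so the total is the sum over all subtasks as reported by the metrics system. For the second option, every event is first mapped to 1L and the resulting stream is reduced with SumOnes; on an unbounded stream the reduce has to run on a keyed or windowed stream.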
