简体   繁体   中英

Apache Flink: How to count the total number of events in a DataStream

I have two raw streams and I am joining those streams and then I want to count what is the total number of events that have been joined and how much events have not. I am doing this by using map on joinedEventDataStream as shown below

joinedEventDataStream.map(new RichMapFunction<JoinedEvent, Object>() {

            @Override
            public Object map(JoinedEvent joinedEvent) throws Exception {

                number_of_joined_events += 1;

                return null;
            }
        });

Question # 1: Is this the appropriate way to count the number of events in the stream?

Question # 2: I have noticed a wired behavior, which some of you might not believe. The issue is that when I run my Flink program in IntelliJ IDE, it shows me correct value for number_of_joined_events but 0 in the case when I submit this program as jar . So I am getting the initial value of number_of_joined_events when I run the program as a jar file instead of the actual count. Why is this happening only in case of jar file submission and not in IDE?

Your approach is not working. The behavior you noticed when executing the program via a JAR file is expected.

I don't know how number_of_joined_events is defined, but I assume its a static variable in your program. When you run the program in your IDE, it runs in a single JVM. Hence, all operators have access to the static variable. When you submit a JAR file to a remote process, the program is executed in a different JVM (possibly multiple JVMs) and the static variable in your client process is never updated.

You can use Flink's metrics or a ReduceFunction that sums 1 s to count the number of processed records.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM