
Flink streaming, what exactly does 'sum' do?

I have trouble understanding streaming. Take word count as an example: for an infinite source like Kafka, what exactly does 'sum' do?

DataStream<Tuple2<String, Long>> counts = input
                ......
                .returns(Types.TUPLE(Types.STRING, Types.LONG))
                .keyBy(0)
                .sum(1);

I kind of understand it when there's a time window: it's like a 'batch' to me, since it has a start and end time. But when there's no time window at all:

  1. What are the start time and end time?
  2. When Flink receives the word 'foo' for the 3rd time, does 'sum' go through all the old 'foo' records, compute 1+1+1, and give the result '3'? Or does Flink somehow save the intermediate result '2' from the previous step, so that 'sum' only computes 2+1?
  3. Is there an alternative way to do the sum, for example with keyBy(0).process(...) or something similar?

The specified program will translate to a StreamGroupedReduce with a SumAggregator. The StreamGroupedReduce continuously reduces the incoming data stream and outputs the new reduced value after every incoming record.

Internally, the StreamGroupedReduce uses a ValueState which keeps the current reduced value for each key. Whenever a new record arrives, the current reduced value is combined with the incoming record by calling the ReduceFunction (in your case the SumAggregator). The result of this operation is then stored in the operator's ValueState and output to downstream consumers.

For example: the input stream 1, 2, 3, 4, 5 will generate the following output when summed: 1, 3, 6, 10, 15.
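To make the mechanism concrete, here is a minimal plain-Java sketch of a keyed running reduce. It is not Flink code and not how StreamGroupedReduce is actually written: the class name KeyedRunningReduce, the process method and the runningSums helper are made up for illustration, a HashMap stands in for the per-key ValueState, and the returned value stands in for the record emitted downstream. It only shows that each new record is combined with the single stored value for its key, so old records are never revisited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java model of a keyed running reduce (hypothetical names, not Flink API).
class KeyedRunningReduce<K, V> {
    // Stand-in for keyed ValueState: exactly one stored value per key.
    private final Map<K, V> state = new HashMap<>();
    // Plays the role of the ReduceFunction (a sum in the word count case).
    private final BinaryOperator<V> reducer;

    KeyedRunningReduce(BinaryOperator<V> reducer) {
        this.reducer = reducer;
    }

    // Called once per incoming record; returns the value that would be emitted.
    V process(K key, V value) {
        V current = state.get(key);
        // First record for this key: nothing to combine with, keep it as-is.
        V updated = (current == null) ? value : reducer.apply(current, value);
        state.put(key, updated); // overwrite the stored value, no history is kept
        return updated;
    }

    // Helper: feed values for a single key and collect every emitted result.
    static List<Long> runningSums(String key, List<Long> values) {
        KeyedRunningReduce<String, Long> sum = new KeyedRunningReduce<>(Long::sum);
        List<Long> emitted = new ArrayList<>();
        for (Long v : values) {
            emitted.add(sum.process(key, v));
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(runningSums("foo", Arrays.asList(1L, 2L, 3L, 4L, 5L)));
        // prints [1, 3, 6, 10, 15]
    }
}
```

In this model, the third 'foo' with count 1 finds 2 in the map and produces 3, which corresponds to the "2+1" option in the question. Each key has its own entry, so 'foo' and 'bar' are summed independently.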

If you want, you can implement the same behaviour with keyBy(0).process(...).


 