Flink streaming, what exactly does 'sum' do?
I have trouble understanding streaming. Take word count as an example: for an infinite source like Kafka, what exactly does 'sum' do?
DataStream<Tuple2<String, Long>> counts = input
......
.returns(Types.TUPLE(Types.STRING, Types.LONG))
.keyBy(0)
.sum(1);
I kind of understand it when there's a time window; it's like a 'batch' to me since it has a start and end time. But what about when there's no time window at all?
The specified program will translate to a StreamGroupedReduce with a SumAggregator. What the StreamGroupedReduce does is continuously reduce the incoming data stream, outputting the new reduced value after every incoming record.
Internally, the StreamGroupedReduce uses a ValueState which keeps the current reduce value. Whenever a new record arrives, the current reduce value is combined with the incoming record by calling the ReduceFunction (in your case SumAggregator). The result of this operation is then stored in the operator's ValueState and emitted to downstream consumers.
For example: the input stream 1, 2, 3, 4, 5 will generate the following output when being summed: 1, 3, 5, 9, 14.
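To make the mechanism concrete, here is a minimal plain-Java sketch of the rolling reduce (no Flink involved; the HashMap stands in for the operator's keyed ValueState, and the method name rollingReduce is just illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class RollingReduceSketch {
    // Emulates StreamGroupedReduce: one state slot per key, combined
    // with each incoming record via the ReduceFunction, and a new
    // reduced value emitted after every record.
    static <K, V> List<V> rollingReduce(List<Map.Entry<K, V>> stream,
                                        BinaryOperator<V> reducer) {
        Map<K, V> state = new HashMap<>();   // stand-in for keyed ValueState
        List<V> output = new ArrayList<>();
        for (Map.Entry<K, V> record : stream) {
            V current = state.get(record.getKey());
            V reduced = (current == null)
                    ? record.getValue()                          // first record for this key
                    : reducer.apply(current, record.getValue()); // the ReduceFunction
            state.put(record.getKey(), reduced);  // update the state
            output.add(reduced);                  // emit to downstream consumers
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> input = List.of(
                Map.entry("word", 1L), Map.entry("word", 2L),
                Map.entry("word", 3L), Map.entry("word", 4L),
                Map.entry("word", 5L));
        // Summing 1, 2, 3, 4, 5 emits a new running total after every record.
        System.out.println(rollingReduce(input, Long::sum)); // [1, 3, 5, 9, 14]
    }
}
```

Note that because there is no window, the stream never produces a "final" sum; it just keeps emitting updated totals forever.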
If you want, you can implement the same behaviour with keyBy(0).process(...).
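A process()-based equivalent would keep one ValueState entry per key and emit the updated total on each element. The plain-Java sketch below is a simplified stand-in for such a KeyedProcessFunction (it is not the real Flink API; processElement here mimics its shape), showing that each key accumulates independently after keyBy(0):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyedProcessSketch {
    // Simplified stand-in for what keyBy(0).process(...) with a
    // per-key ValueState<Long> would do: add the incoming count to
    // the stored sum for that key and emit the updated total.
    private final Map<String, Long> sumState = new HashMap<>();

    List<Map.Entry<String, Long>> processElement(String key, long count) {
        long updated = sumState.getOrDefault(key, 0L) + count; // reduce step
        sumState.put(key, updated);                            // like ValueState.update(...)
        return List.of(Map.entry(key, updated));               // like Collector.collect(...)
    }

    public static void main(String[] args) {
        KeyedProcessSketch fn = new KeyedProcessSketch();
        // Two keys are reduced independently, just as after keyBy(0).
        System.out.println(fn.processElement("flink", 1L)); // [flink=1]
        System.out.println(fn.processElement("kafka", 1L)); // [kafka=1]
        System.out.println(fn.processElement("flink", 1L)); // [flink=2]
    }
}
```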