Flink DataSet API: Is GroupBy not working correctly?
In my Flink Java program I use a groupBy operator as follows:
dataSet.groupBy(new KeySelector<myObject, Tuple2<Tuple2<Integer, Integer>, Integer>>() {
    private static final long serialVersionUID = 5L;
    Tuple2<Tuple2<Integer, Integer>, Integer> groupingKey = new Tuple2<Tuple2<Integer, Integer>, Integer>();

    public Tuple2<Tuple2<Integer, Integer>, Integer> getKey(myObject s) {
        groupingKey.setField(s.getPosition(), 0);
        groupingKey.setField(s.getBand(), 1);
        return groupingKey;
    }
})
.reduceGroup(reduceFunction);
getPosition() returns a Tuple2<Integer, Integer> and getBand() returns an int.
I want to group my dataset on both values. If I have 6 positions and 4 bands, I would like to get 24 distinct groups and apply the groupReduce function to every group independently. But currently my resulting groups seem to contain various values for the band and the position. I checked this in the groupReduce function as follows:
if (this.band == null) {
    this.band = myObject.getBand();
}
if (this.band != myObject.getBand()) {
    System.out.println("The band should be " + this.band + " but is: " + myObject.getBand());
}
Additionally, there are values in my resulting file which indicate a problem with the grouping. Is it possible that the grouping does not work in my case? Or could this just be a consequence of another potential bug in my code?
I think your check in the GroupReduceFunction is not working correctly. GroupReduceFunction.reduce() can be called several times for different groups. this.band is a member variable of your GroupReduceFunction, and I assume that you do not reset this.band at the end of the reduce() method.
Hence, this.band will be null only in the first call of reduce(). At the beginning of the second call, this.band will already be initialized and won't be set to the band of the current group. Therefore, the check that follows will fail.