
Flink DataSet API: Is GroupBy not working correctly?

In my Flink Java program I use a GroupBy operator as follows:

dataSet.groupBy(new KeySelector<myObject, Tuple2<Tuple2<Integer, Integer>, Integer>>() {
    private static final long serialVersionUID = 5L;
    Tuple2<Tuple2<Integer, Integer>, Integer> groupingKey = new Tuple2<Tuple2<Integer, Integer>, Integer>();

    public Tuple2<Tuple2<Integer, Integer>, Integer> getKey(myObject s) {
        groupingKey.setField(s.getPosition(), 0);
        groupingKey.setField(s.getBand(), 1);
        return groupingKey;
    }
})
    .reduceGroup(reduceFunction);

getPosition() returns a Tuple2<Integer, Integer> and getBand() returns an int.

I want to group my dataset on both values. If I have 6 positions and 4 bands, I would like to get 24 distinct groups and apply the groupReduce function to every group independently. But currently my resulting groups seem to contain various values for the band and the position. I checked this in the groupReduce function as follows:

if (this.band == null) {
    this.band = myObject.getBand();
}
if (this.band != myObject.getBand()) {
    System.out.println("The band should be " + this.band + " but is: " + myObject.getBand());
}

Additionally, there are values in my result file that indicate a problem with the grouping. Is it possible that the grouping does not work in my case? Or could this just be a consequence of another bug in my code?

I think your check in the GroupReduceFunction is not working correctly. GroupReduceFunction.reduce() can be called several times, once for each group. this.band is a member variable of your GroupReduceFunction, and I assume that you do not reset this.band at the end of the reduce() method.

Hence, this.band will be null only in the first call of reduce(). At the beginning of the second call, this.band is already initialized with the band of the first group and is not set to the band of the current group. Therefore, the check that follows will fail.
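To illustrate the effect, here is a minimal plain-Java sketch that simulates it without any Flink dependency. The class and method names (BandCheckSketch, StatefulCheck, LocalCheck, countMismatches...) are hypothetical and not part of the original program. Each inner list stands in for the bands of one correctly formed group, and reduce() is called once per group, as it would be for a GroupReduceFunction.

```java
import java.util.Arrays;
import java.util.List;

public class BandCheckSketch {

    /** Mirrors the check from the question: 'band' is a member and is never reset. */
    static class StatefulCheck {
        private Integer band;   // survives across reduce() calls
        int mismatches = 0;

        void reduce(Iterable<Integer> bandsOfOneGroup) {
            for (int current : bandsOfOneGroup) {
                if (band == null) {
                    band = current;          // only happens in the very first call
                }
                if (band.intValue() != current) {
                    mismatches++;            // false alarm for every later group
                }
            }
        }
    }

    /** Same check, but 'band' is local, so every call starts fresh. */
    static class LocalCheck {
        int mismatches = 0;

        void reduce(Iterable<Integer> bandsOfOneGroup) {
            Integer band = null;             // reset for each group
            for (int current : bandsOfOneGroup) {
                if (band == null) {
                    band = current;
                }
                if (band.intValue() != current) {
                    mismatches++;
                }
            }
        }
    }

    static int countMismatchesStateful(List<List<Integer>> groups) {
        StatefulCheck check = new StatefulCheck();
        for (List<Integer> group : groups) {
            check.reduce(group);             // one call per group
        }
        return check.mismatches;
    }

    static int countMismatchesLocal(List<List<Integer>> groups) {
        LocalCheck check = new LocalCheck();
        for (List<Integer> group : groups) {
            check.reduce(group);
        }
        return check.mismatches;
    }

    public static void main(String[] args) {
        // Two correctly grouped inputs: band 1 (three records) and band 2 (two records).
        List<List<Integer>> groups = Arrays.asList(
                Arrays.asList(1, 1, 1),
                Arrays.asList(2, 2));

        System.out.println("stateful check mismatches: " + countMismatchesStateful(groups)); // 2
        System.out.println("local check mismatches: " + countMismatchesLocal(groups));       // 0
    }
}
```

Even though both groups are homogeneous, the stateful variant reports a mismatch for every record of the second group. In the actual GroupReduceFunction, the equivalent change would presumably be to declare band as a local variable inside reduce(), or to reset this.band at the start of each call.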
