
GCP Dataflow throws exception "Shuffle key too large"

I have a piece of code that groups my data, but it throws an exception when I call output.

This class is used as the key in KV:

class CKey {
    private Long id;
    private Long subId;
}

This is part of my Dataflow job:

TupleTag<CItem> itemsTuple = //...
TupleTag<CMeta> metaTuple = //...

//...

PCollection<KV<CKey, CItem>> items = //...
PCollection<KV<CKey, CMeta>> meta = //...

// Join both keyed collections, then hand each grouped result to the DoFn.
KeyedPCollectionTuple.of(itemsTuple, items).and(metaTuple, meta)
        .apply(CoGroupByKey.create())
        .apply(ParDo.of(new CustomGroupPairsFn()));

Custom function to join the data:

class CustomGroupPairsFn extends DoFn<KV<CKey, CoGbkResult>, MyCustomObject> {

    @ProcessElement
    public void processElement(@Element KV<CKey, CoGbkResult> element, OutputReceiver<MyCustomObject> out) {
        CoGbkResult pair = element.getValue();
        Iterator<CItem> citem = pair.getAll(ITEMS).iterator();
        Iterator<CMeta> cmeta = pair.getAll(METADATA).iterator();
        try {
            out.output(new MyCustomObject(citem.next(), cmeta));
        } catch (Exception e) {
            log.error("Error occurred", e);
        }
    }
}

There is only one line of code in the try block, and the exception is thrown inside it. The exception:

[Image: screenshot of the exception]

How can I fix the issue?

This error happens because you're shuffling a key that is too large.

What does this mean? In Dataflow, the largest shuffle key allowed is 1.5 MB for streaming pipelines. You seem to have an element key that is larger than that.

Perhaps your pipeline has a GroupByKey/shuffle operation somewhere unexpected, which is why it would help to have more details about the pipeline.
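
As a rough way to check how large a key actually is once encoded, the sketch below serializes a key with plain Java serialization and compares the byte count with the 1.5 MB figure above. It assumes the key is encoded the way Beam's SerializableCoder would encode it (Java serialization); the coder in your pipeline may differ, so treat the numbers as indicative only. KeySizeCheck, BloatedKey, encodedSize and fitsShuffleLimit are hypothetical names used for illustration, and CKey here is a Serializable stand-in for the class in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class KeySizeCheck {

    // 1.5 MB limit quoted above, interpreted here as 1.5 * 1024 * 1024 bytes (assumption).
    static final long MAX_SHUFFLE_KEY_BYTES = 1_572_864L;

    // Stand-in for the question's CKey; making it Serializable is an assumption.
    static class CKey implements Serializable {
        private final Long id;
        private final Long subId;

        CKey(Long id, Long subId) {
            this.id = id;
            this.subId = subId;
        }
    }

    // Hypothetical key that carries a large payload along with its id.
    static class BloatedKey implements Serializable {
        private final Long id;
        private final byte[] payload;

        BloatedKey(Long id, byte[] payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Number of bytes the key occupies after Java serialization.
    static long encodedSize(Serializable key) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // True when the serialized key is within the assumed shuffle key limit.
    static boolean fitsShuffleLimit(Serializable key) {
        return encodedSize(key) <= MAX_SHUFFLE_KEY_BYTES;
    }

    public static void main(String[] args) {
        CKey small = new CKey(1L, 2L);
        BloatedKey big = new BloatedKey(1L, new byte[2 * 1024 * 1024]);
        System.out.println("CKey bytes: " + encodedSize(small) + ", fits: " + fitsShuffleLimit(small));
        System.out.println("BloatedKey bytes: " + encodedSize(big) + ", fits: " + fitsShuffleLimit(big));
    }
}
```

A key made of two Long fields comes out at well under a kilobyte, so if the real key is over the limit, something much bigger is probably ending up in the key position of a shuffle somewhere in the pipeline.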
