

Google Dataflow / Dataprep Shuffle key too large (INVALID_ARGUMENT)

I have tried running this job several times. Each time it hits many quota-related warnings (and I have requested an increase each time), but in the end it always fails with the error message below. I believe this is caused by my dataset being too large, but I'm not sure. Dataprep is supposed to be able to handle ETL jobs of any scale, and this isn't even that large a job. Anyway, here is the error message; any help would be appreciated:

java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: java.io.IOException: INVALID_ARGUMENT: Shuffle key too large:2001941 > 1572864
at com.google.cloud.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:182)
at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:104)
at com.google.cloud.dataflow.worker.util.BatchGroupAlsoByWindowViaIteratorsFn.processElement(BatchGroupAlsoByWindowViaIteratorsFn.java:121)
at com.google.cloud.dataflow.worker.util.BatchGroupAlsoByWindowViaIteratorsFn.processElement(BatchGroupAlsoByWindowViaIteratorsFn.java:53)
at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:117)
...

The full error message can be found here: https://pastebin.com/raw/QTtmm5D2

I have gotten several quota increases, and while they let the job get further than before, it still ends with the same error (although the reported shuffle key size is now larger). It no longer appears to be hitting a quota-related wall.

Any ideas short of ditching Dataprep and going back to MapReduce?

This looks to me more like an error where a single value in a single column is too large, rather than the dataset as a whole being too large. Do you have columns with values this long? (Apparently about 2 MB here: the key is 2001941 bytes, against a limit of 1572864 bytes, i.e. 1.5 MiB.)
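One way to test this theory before filing a bug is to scan the input for oversized cells. The following is a minimal plain-Java sketch, not part of Dataprep or Dataflow: the class `OversizedValueFinder` and its `find` method are hypothetical, the input is assumed to be already parsed into rows of strings, and the 1572864-byte limit is taken from the error message above. The UTF-8 size of a raw value only approximates the encoded shuffle key (the encoded key may carry extra bytes), so values somewhat below the limit are suspects too.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports cells whose UTF-8 size exceeds a byte limit. */
final class OversizedValueFinder {
    // Limit copied from the error message: 1572864 bytes = 1.5 MiB (assumed, not queried from Dataflow).
    static final int SHUFFLE_KEY_LIMIT_BYTES = 1_572_864;

    /** Returns "row:column:bytes" for every non-null cell larger than limitBytes. */
    static List<String> find(List<String[]> rows, int limitBytes) {
        List<String> hits = new ArrayList<>();
        for (int r = 0; r < rows.size(); r++) {
            String[] row = rows.get(r);
            for (int c = 0; c < row.length; c++) {
                if (row[c] == null) {
                    continue; // empty cells cannot produce an oversized key
                }
                int size = row[c].getBytes(StandardCharsets.UTF_8).length;
                if (size > limitBytes) {
                    hits.add(r + ":" + c + ":" + size);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"id-1", "short value"});
        rows.add(new String[] {"id-2", "x".repeat(2_001_941)}); // same size as the failing key
        System.out.println(find(rows, SHUFFLE_KEY_LIMIT_BYTES)); // prints [1:1:2001941]
    }
}
```

On real data you would feed `find` the rows parsed from the job's input; any hit points at the column whose values Dataprep is likely grouping on.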

That said, I think this should be reported to Dataprep as a bug. It seems that they perform a group-by on column values, and they should probably trim those values to a smaller size before grouping. I don't know whether they follow StackOverflow.
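A hypothetical illustration of that mitigation, written in plain Java rather than as a Beam transform, and not how Dataprep actually builds its keys. It swaps "trim" for hashing: values that fit under an assumed byte limit are used as-is, and larger ones are replaced by a SHA-256 digest. Plain truncation would silently merge distinct values that share a long prefix into one group; a digest keeps them apart (barring a hash collision) while staying at a fixed 71 characters. The names `GroupingKeys` and `safeKey` are made up for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: keeps grouping keys under a byte limit by hashing oversized values. */
final class GroupingKeys {
    // Assumed limit, taken from the error message (1.5 MiB).
    static final int MAX_KEY_BYTES = 1_572_864;

    /** Returns value unchanged if it fits in maxBytes, else "sha256:" plus its hex digest. */
    static String safeKey(String value, int maxBytes) {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        if (raw.length <= maxBytes) {
            return value;
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(raw);
            StringBuilder sb = new StringBuilder("sha256:");
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // unsigned two-digit hex per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256, so this should not happen.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

Because the group key no longer contains the full value, any step after the grouping that needs the original value would have to carry it in the element's value rather than in the key.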
