
How to resolve an EMR Spark Out Of Memory error

I have a Spark job that I am trying to run on EMR. It fails with the following error:

java.lang.OutOfMemoryError: Java heap space
-XX:OnOutOfMemoryError="kill -9 %p"
Executing /bin/sh -c "kill -9 22611"...

I have tried it even with 10 core instances of type m5.12xlarge, but I still get the same error. The code itself works: I tested it on AWS Glue, where it succeeded with the G.1X worker type and 20 DPUs (the job takes around 3 hours to complete). Any recommendation on how to choose the EMR instance type?
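As a rough sanity check on the hardware question, the following sketch compares the nominal capacity of the two setups. It assumes the published specs of 48 vCPUs / 192 GiB per m5.12xlarge and 4 vCPUs / 16 GB per Glue G.1X worker, and it assumes that "20 DPUs" of G.1X means 20 workers. NodeSpec and cluster_totals are hypothetical helper names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Nominal per-node resources (assumed from published instance/worker specs)."""
    name: str
    vcpus: int
    memory_gb: int  # nominal memory; GiB vs GB is ignored for this rough comparison


def cluster_totals(node: NodeSpec, count: int) -> tuple[int, int]:
    """Return (total vCPUs, total nominal memory) for `count` identical nodes."""
    return node.vcpus * count, node.memory_gb * count


M5_12XLARGE = NodeSpec("m5.12xlarge", vcpus=48, memory_gb=192)
GLUE_G1X = NodeSpec("G.1X", vcpus=4, memory_gb=16)

emr = cluster_totals(M5_12XLARGE, 10)  # 10 EMR core instances
glue = cluster_totals(GLUE_G1X, 20)    # assumed: 20 DPUs of G.1X = 20 workers

print(f"EMR : {emr[0]} vCPUs, {emr[1]} GB")
print(f"Glue: {glue[0]} vCPUs, {glue[1]} GB")
```

Under these assumptions the EMR cluster has about six times the cores and memory of the Glue run, which suggests the failure has more to do with how Spark is configured to use that memory than with the amount of hardware.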

Changing the instance type alone won't always help; you also need to tune the Spark configuration. I followed an approach like the one mentioned here, and the job now succeeds on EMR.
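One widely used rule of thumb for this kind of tuning is to size executors explicitly from each node's vCPUs and YARN memory instead of relying on defaults. The following is a minimal sketch of that heuristic, not necessarily the exact approach referred to above. It assumes 5 cores per executor, one vCPU per node reserved for the OS and Hadoop daemons, 10% of each executor container given to spark.executor.memoryOverhead, and one executor slot left for the driver. size_executors and emr_configurations are hypothetical helpers; the output is wrapped in the spark-defaults classification that an EMR cluster's Configurations list accepts.

```python
import json


def size_executors(vcpus_per_node: int, yarn_mem_mib_per_node: int,
                   core_nodes: int, cores_per_executor: int = 5,
                   overhead_pct: int = 10) -> dict:
    """Rule-of-thumb executor sizing; the constants are assumptions, not EMR defaults."""
    # Keep one vCPU per node free for the OS and Hadoop daemons.
    executors_per_node = (vcpus_per_node - 1) // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("node has too few vCPUs for the requested executor cores")
    # Split the node's YARN memory evenly between its executors (all values in MiB).
    container_mib = yarn_mem_mib_per_node // executors_per_node
    overhead_mib = container_mib * overhead_pct // 100
    heap_mib = container_mib - overhead_mib
    # Leave one executor slot across the cluster for the driver.
    instances = executors_per_node * core_nodes - 1
    # Rule of thumb: about 2 tasks per executor core.
    parallelism = instances * cores_per_executor * 2
    return {
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": f"{heap_mib}m",
        "spark.executor.memoryOverhead": f"{overhead_mib}m",
        "spark.driver.memory": f"{heap_mib}m",
        "spark.driver.cores": str(cores_per_executor),
        "spark.executor.instances": str(instances),
        "spark.default.parallelism": str(parallelism),
        "spark.sql.shuffle.partitions": str(parallelism),
        # The executor count is fixed above, so dynamic allocation is turned off.
        "spark.dynamicAllocation.enabled": "false",
    }


def emr_configurations(props: dict) -> list:
    """Wrap Spark properties in EMR's spark-defaults configuration classification."""
    return [{"Classification": "spark-defaults", "Properties": props}]


# Example: 10 x m5.12xlarge core nodes (48 vCPUs each), assuming 180 GiB of YARN memory per node.
props = size_executors(vcpus_per_node=48, yarn_mem_mib_per_node=180 * 1024, core_nodes=10)
print(json.dumps(emr_configurations(props), indent=2))
```

The 180 GiB of YARN memory per node is an assumed figure; check yarn.nodemanager.resource.memory-mb on your own cluster before relying on the resulting numbers.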
