
Apache Spark: java.lang.OutOfMemoryError: Java Heap Space issue

I am hitting a java.lang.OutOfMemoryError: Java Heap Space error every second time I run the same Spark program.

Here is the scenario:

When I do the spark-submit and run the Spark program for the first time, it gives me the correct output and everything is fine. When I execute the same spark-submit one more time, it throws a java.lang.OutOfMemoryError: Java Heap Space exception.

When does it work again?

If I run the same spark-submit after clearing the Linux page cache by writing to /proc/sys/vm/drop_caches, it runs successfully again, but only for one single run.

I tried setting all possible Spark configs such as memoryOverhead, driver-memory, executor-memory, etc.
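For reference, a minimal sketch of how such settings can be supplied programmatically through SparkSession (the memory values are placeholders, not tuned recommendations; driver memory normally has to be passed to spark-submit before the driver JVM starts, so setting it in code only takes effect in some deploy modes):

    import org.apache.spark.sql.SparkSession

    // Placeholder values, not tuned recommendations.
    val spark = SparkSession.builder()
      .appName("oom-investigation")
      .config("spark.executor.memory", "4g")          // same as --executor-memory 4g
      .config("spark.executor.memoryOverhead", "1g")  // off-heap overhead per executor
      .config("spark.driver.memory", "4g")            // usually must be given via spark-submit instead
      .getOrCreate()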

Any idea what's happening here? Is this really a problem with the Spark code, or is it happening because of some Linux machine setting or the way the cluster is configured?

Thanks.

If you use df.persist() or df.cache(), you should also call df.unpersist() once the DataFrame is no longer needed; there is also sqlContext.clearCache(), which clears all cached data.
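A minimal sketch of that pattern, assuming a DataFrame named df and a SparkSession named spark (spark.sqlContext exposes the clearCache() mentioned above):

    // Cache the DataFrame while it is reused by several actions.
    df.cache()                    // same as df.persist() with the default storage level

    val total  = df.count()       // first action materializes the cache
    val sample = df.take(10)      // later actions read from the cached blocks

    // Release the cached blocks as soon as the DataFrame is no longer needed.
    df.unpersist()

    // Or drop everything that is cached before the application exits.
    spark.sqlContext.clearCache()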

