Java heap size error when running Spark from Python

I'm trying to run a Python script with the pyspark library. I create a SparkConf() object using the following command:

conf = SparkConf().setAppName('test').setMaster(<spark-URL>)

When I run the script, that line runs into an error:

Picked up _JAVA_OPTIONS: -Xmx128m

Picked up _JAVA_OPTIONS: -Xmx128m

Error occurred during initialization of VM
Initial heap size set to a larger value than the maximum heap size.

I tried to fix the problem by setting the configuration property spark.driver.memory to various values, but nothing changed.
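
For reference, a minimal sketch of how spark.driver.memory is typically set from a PySpark script; the '2g' value is a placeholder, and '<spark-URL>' stands in for the real master URL as in the question:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName('test')
        .setMaster('<spark-URL>')            # placeholder, as in the question
        .set('spark.driver.memory', '2g'))   # example value, not from the question
sc = SparkContext(conf=conf)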

What is the problem and how can I fix it?

Thanks

This is because the maximum available heap size you're setting (128M) is smaller than the initial heap size the JVM is asked to use. Check the _JAVA_OPTIONS environment variable that you're passing or setting in the configuration file. Also, note that changes to spark.driver.memory won't have any effect, because the worker actually lies within the driver JVM process that is started when spark-shell starts, and the default memory used for that is 512M.
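
One quick way to see what the driver JVM is inheriting is to inspect the environment from the same Python session that launches PySpark; a minimal sketch (only the standard library is assumed):

import os

# The JVMs started by PySpark inherit this variable; a value such as
# "-Xmx128m" here is what produces the "Picked up _JAVA_OPTIONS" messages above.
print(os.environ.get('_JAVA_OPTIONS'))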

This creates a conflict: Spark tries to initialize a heap of 512M, but the maximum limit you've set is only 128M.

You can set the minimum heap size through the --driver-java-options command-line option or in your default properties file.
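
For example, a sketch of the command-line route (the -Xms value and the script name are placeholders; the idea is to keep the initial heap at or below the 128M cap reported above):

spark-submit --driver-java-options "-Xms128m" your_script.py

The equivalent property for the default properties file (conf/spark-defaults.conf) is spark.driver.extraJavaOptions. Alternatively, unsetting _JAVA_OPTIONS (or raising its -Xmx value) before launching removes the 128M cap altogether.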
