Java heap size error when running Spark from Python

I'm trying to run a Python script with the pyspark library. I create a SparkConf() object using the following command:

conf = SparkConf().setAppName('test').setMaster(<spark-URL>)

When I run the script, that line runs into an error:

Picked up _JAVA_OPTIONS: -Xmx128m

Picked up _JAVA_OPTIONS: -Xmx128m

Error occurred during initialization of VM
Initial heap size set to a larger value than the maximum heap size.

I tried to fix the problem by setting the configuration property spark.driver.memory to various values, but nothing changed.
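
For reference, a minimal sketch of how spark.driver.memory is typically set from a PySpark script; the '2g' value is a placeholder, and '<spark-URL>' stands in for the real master URL as in the question:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName('test')
        .setMaster('<spark-URL>')            # placeholder, as in the question
        .set('spark.driver.memory', '2g'))   # example value, not from the question
sc = SparkContext(conf=conf)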

What is the problem and how can I fix it?

Thanks

This is because the maximum available heap size you're setting (128M) is smaller than the initial heap size the JVM is asked to use. Check the _JAVA_OPTIONS environment variable that you're passing or setting in the configuration file. Also, note that changes to spark.driver.memory won't have any effect, because the worker actually lies within the driver JVM process that is started when spark-shell starts, and the default memory used for that is 512M.
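
One quick way to see what the driver JVM is inheriting is to inspect the environment from the same Python session that launches PySpark; a minimal sketch (only the standard library is assumed):

import os

# The JVMs started by PySpark inherit this variable; a value such as
# "-Xmx128m" here is what produces the "Picked up _JAVA_OPTIONS" messages above.
print(os.environ.get('_JAVA_OPTIONS'))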

This creates a conflict: Spark tries to initialize a heap of 512M, but the maximum limit you've set is only 128M.

You can set the minimum heap size through the --driver-java-options command-line option or in your default properties file.
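
For example, a sketch of the command-line route (the -Xms value and the script name are placeholders; the idea is to keep the initial heap at or below the 128M cap reported above):

spark-submit --driver-java-options "-Xms128m" your_script.py

The equivalent property for the default properties file (conf/spark-defaults.conf) is spark.driver.extraJavaOptions. Alternatively, unsetting _JAVA_OPTIONS (or raising its -Xmx value) before launching removes the 128M cap altogether.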
