How to change Spark setting to allow spark.dynamicAllocation.enabled?

I'm running a python script in pyspark and got the following error: NameError: name 'spark' is not defined

I looked it up and found that the reason is that spark.dynamicAllocation.enabled is not enabled yet.

According to Spark's documentation (https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-dynamic-allocation.html#spark_dynamicAllocation_enabled): spark.dynamicAllocation.enabled (default: false) controls whether dynamic allocation is enabled or not. It is assumed that spark.executor.instances is not set or is 0 (which is the default value).

Since the default setting is false, I need to change the Spark setting to enable spark.dynamicAllocation.enabled.

I installed Spark with brew, and didn't change its configuration/setting.

How can I change the setting and enable spark.dynamicAllocation.enabled ?

Thanks a lot.

Question: How can I change the setting and enable spark.dynamicAllocation.enabled?

There are 3 options through which you can achieve this:
1) modify the parameters mentioned below in spark-defaults.conf
2) pass the parameters below with --conf in your spark-submit command (see the example after this list)
3) programmatically specify the dynamic allocation config, as demonstrated below.
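
For option 2, a minimal sketch of what the spark-submit invocation could look like (the script name and the executor limits here are placeholders, not values from the question):

spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=2 \
  --conf spark.shuffle.service.enabled=true \
  your_script.py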

Programmatically, you can do it like this:

val conf = new SparkConf()
      .setMaster("ClusterManager") // replace with your actual master URL
      .setAppName("test-executor-allocation-manager")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "1") // SparkConf.set expects String values
      .set("spark.dynamicAllocation.maxExecutors", "2")
      .set("spark.shuffle.service.enabled", "true") // required for standalone mode

There are several places you can set it. If you would like to enable it on a per-job basis, set the following in each application:

conf.set("spark.dynamicAllocation.enabled", "true")

If you want to set it for all jobs, edit the spark-defaults.conf file. In the Hortonworks distro the configuration directory should be

/usr/hdp/current/spark-client/conf/

Add the setting to your spark-defaults.conf and you should be good to go.

This is an issue that affects Spark installations made using other resources as well, such as the spark-ec2 script for installing on Amazon Web Services. From the Spark documentation, two values in SPARK_HOME/conf/spark-defaults.conf need to be set:

spark.shuffle.service.enabled   true
spark.dynamicAllocation.enabled true

see this: https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation

If your installation has a spark-env.sh script in SPARK_HOME/conf, make sure that it does not have lines such as the following, or that they are commented out:

export SPARK_WORKER_INSTANCES=1 # or some other integer, or
export SPARK_EXECUTOR_INSTANCES=1 # or some other integer

You can set configuration parameters in pyspark from a notebook with a command similar to the following:

spark.conf.set("spark.sql.crossJoin.enabled", "true")
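
To confirm which value is actually in effect, you can read the setting back (a minimal sketch, assuming spark is already defined in the notebook session; the second argument is the fallback returned when the key was never set):

# Read the current value, falling back to "false" (the documented default)
# if the key was never set explicitly.
print(spark.conf.get("spark.dynamicAllocation.enabled", "false"))

Note that spark.dynamicAllocation.enabled is generally read when the application starts, so it is usually set through spark-defaults.conf, --conf, or the session builder rather than with spark.conf.set after the session exists.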
