How to execute Spark programs with Dynamic Resource Allocation?
I am using the spark-submit command to execute Spark jobs with parameters such as:
spark-submit --master yarn-cluster --driver-cores 2 \
--driver-memory 2G --num-executors 10 \
--executor-cores 5 --executor-memory 2G \
--class com.spark.sql.jdbc.SparkDFtoOracle2 \
Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar
Now I want to execute the same program using Spark's dynamic resource allocation. Could you please help with the usage of dynamic resource allocation in executing Spark programs?
In Spark dynamic allocation, spark.dynamicAllocation.enabled needs to be set to true because it is false by default.
This requires spark.shuffle.service.enabled to be set to true as well, since the Spark application is running on YARN. Check this link to start the shuffle service on each NodeManager in YARN.
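For reference, starting the shuffle service on YARN typically means registering it as an auxiliary service in each NodeManager's yarn-site.xml, roughly as sketched below (the exact steps, including placing the spark-&lt;version&gt;-yarn-shuffle.jar on the NodeManager classpath and restarting the NodeManagers, are covered in the linked guide):

```xml
<!-- yarn-site.xml on every NodeManager -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <!-- append spark_shuffle to any services already listed -->
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```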
The following configurations are also relevant:
spark.dynamicAllocation.minExecutors,
spark.dynamicAllocation.maxExecutors, and
spark.dynamicAllocation.initialExecutors
These options can be configured for a Spark application in 3 ways:

1. From spark-submit with --conf <prop_name>=<prop_value>
spark-submit --master yarn-cluster \
--driver-cores 2 \
--driver-memory 2G \
--num-executors 10 \
--executor-cores 5 \
--executor-memory 2G \
--conf spark.dynamicAllocation.minExecutors=5 \
--conf spark.dynamicAllocation.maxExecutors=30 \
--conf spark.dynamicAllocation.initialExecutors=10 \
--class com.spark.sql.jdbc.SparkDFtoOracle2 \
Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar
Here spark.dynamicAllocation.initialExecutors=10 has the same effect as --num-executors 10. (Note that a comment cannot follow the line-continuation backslash inside the command.)
2. Inside the Spark program with SparkConf

Set the properties in SparkConf, then create the SparkSession or SparkContext with it:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf: SparkConf = new SparkConf()
conf.set("spark.dynamicAllocation.enabled", "true")
conf.set("spark.dynamicAllocation.minExecutors", "5")
conf.set("spark.dynamicAllocation.maxExecutors", "30")
conf.set("spark.dynamicAllocation.initialExecutors", "10")
val spark = SparkSession.builder().config(conf).getOrCreate()
3. In spark-defaults.conf, usually located in $SPARK_HOME/conf/
Place the same configurations in spark-defaults.conf to apply them to all Spark applications when no configuration is passed from the command line or from code.
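For example, a spark-defaults.conf entry covering the settings discussed above might look like this (the values are the same illustrative ones used earlier):

```
spark.dynamicAllocation.enabled           true
spark.shuffle.service.enabled             true
spark.dynamicAllocation.minExecutors      5
spark.dynamicAllocation.maxExecutors      30
spark.dynamicAllocation.initialExecutors  10
```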
I just did a small demo with Spark's dynamic resource allocation. The code is on my Github. Specifically, the demo is in this release.