How to execute Spark programs with Dynamic Resource Allocation?

I am using the spark-submit command to execute Spark jobs with parameters such as:

spark-submit --master yarn-cluster --driver-cores 2 \
 --driver-memory 2G --num-executors 10 \
 --executor-cores 5 --executor-memory 2G \
 --class com.spark.sql.jdbc.SparkDFtoOracle2 \
 Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Now I want to execute the same program using Spark's Dynamic Resource Allocation. Could you please help with the usage of Dynamic Resource Allocation in executing Spark programs?

For Spark dynamic allocation, spark.dynamicAllocation.enabled needs to be set to true, because it is false by default.

This also requires spark.shuffle.service.enabled to be set to true, since the Spark application is running on YARN. Check this link to start the shuffle service on each NodeManager in YARN.
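
As a rough sketch, the usual change is to register spark_shuffle as an auxiliary service in yarn-site.xml on each NodeManager, roughly as below (based on the standard Spark-on-YARN shuffle-service setup; the aux-services already configured on your cluster may differ, and the spark-&lt;version&gt;-yarn-shuffle jar must also be on the NodeManager classpath):

    <!-- append spark_shuffle to the services already listed, then restart the NodeManagers -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>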

The following configurations are also relevant:

spark.dynamicAllocation.minExecutors, 
spark.dynamicAllocation.maxExecutors, and 
spark.dynamicAllocation.initialExecutors
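
Note that initialExecutors is expected to fall between minExecutors and maxExecutors; per the Spark configuration docs it defaults to minExecutors, and if --num-executors (spark.executor.instances) is also set and is larger, that larger value is used as the starting count instead.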

These options can be configured for a Spark application in 3 ways:

1. From spark-submit with --conf <prop_name>=<prop_value>

# initialExecutors=10 has the same effect as --num-executors 10
spark-submit --master yarn-cluster \
    --driver-cores 2 \
    --driver-memory 2G \
    --num-executors 10 \
    --executor-cores 5 \
    --executor-memory 2G \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=5 \
    --conf spark.dynamicAllocation.maxExecutors=30 \
    --conf spark.dynamicAllocation.initialExecutors=10 \
    --class com.spark.sql.jdbc.SparkDFtoOracle2 \
    Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

2. Inside the Spark program with SparkConf

Set the properties in SparkConf, then create the SparkSession or SparkContext with it:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf: SparkConf = new SparkConf()
conf.set("spark.dynamicAllocation.enabled", "true")
conf.set("spark.dynamicAllocation.minExecutors", "5")
conf.set("spark.dynamicAllocation.maxExecutors", "30")
conf.set("spark.dynamicAllocation.initialExecutors", "10")

val spark = SparkSession.builder().config(conf).getOrCreate()
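
These properties are read when the SparkContext starts, so set them before calling getOrCreate(); dynamic allocation cannot be switched on for an already-running application.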

3. spark-defaults.conf, usually located in $SPARK_HOME/conf/

Place the same configurations in spark-defaults.conf to apply them to all Spark applications when no configuration is passed on the command line or in code.
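
As a sketch, the equivalent spark-defaults.conf entries (reusing the same illustrative values as the examples above) would be:

    spark.dynamicAllocation.enabled           true
    spark.shuffle.service.enabled             true
    spark.dynamicAllocation.minExecutors      5
    spark.dynamicAllocation.maxExecutors      30
    spark.dynamicAllocation.initialExecutors  10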

Spark - Dynamic Allocation Confs

I just did a small demo with Spark's dynamic resource allocation. The code is on my Github. Specifically, the demo is in this release.
