
Spark on YARN cluster creates a Spark job with far fewer workers than specified in the Spark context

Spark on a YARN cluster creates a Spark job with far fewer workers (only 4) than what is specified in the Spark context (100). Here is how I create the Spark context and session:

import pyspark
from pyspark.sql import SparkSession

config_list = [
    ('spark.yarn.dist.archives','xxxxxxxxxxx'),
    ('spark.yarn.appMasterEnv.PYSPARK_PYTHON','xxxxxxxxx'),
    ('spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON','xxxxxxxxxxx'),
    ('spark.local.dir','xxxxxxxxxxxxxxxxxx'),
    ('spark.submit.deployMode','client'),
    ('spark.yarn.queue','somequeue'),
    ('spark.dynamicAllocation.minExecutors','100'),
    ('spark.dynamicAllocation.maxExecutors','100'),
    ('spark.executor.instances','100'),
    ('spark.executor.memory','40g'),
    ('spark.driver.memory','40g'),
    ('spark.yarn.executor.memoryOverhead','10g')
]

conf = pyspark.SparkConf().setAll(config_list)

spark = SparkSession.builder.master('yarn')\
    .config(conf=conf)\
    .appName('myapp')\
    .getOrCreate()

sc = spark.sparkContext

I would appreciate any ideas.

If you request more executors than your cluster (or YARN queue) actually has available, the Spark session will only allocate as many worker nodes as are free at the time your job runs. In other words, asking for 100 executors on a cluster or queue that can only provide 4 at that moment will give you 4.

You can also check the number of executor instances configured for the session with:

sc._conf.get('spark.executor.instances')
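
Note that this returns the configured value, not what YARN actually granted. If you want to see how many executors have actually registered with the driver, a minimal sketch, assuming the underlying JVM SparkContext is reachable through the py4j gateway (sc._jsc), is:

# The block-manager status map includes the driver itself, hence the -1
sc._jsc.sc().getExecutorMemoryStatus().size() - 1

The Executors tab of the Spark UI shows the same information.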

I hope this clears it up.
