
Spark on YARN cluster creates a Spark job with far fewer workers than specified in the Spark context

Spark on a YARN cluster creates a Spark job with far fewer workers (only 4) than what is specified in the Spark context (100). Here is how I create the Spark context and session:

import pyspark
from pyspark.sql import SparkSession

config_list = [
    ('spark.yarn.dist.archives','xxxxxxxxxxx'),
    ('spark.yarn.appMasterEnv.PYSPARK_PYTHON','xxxxxxxxx'),
    ('spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON','xxxxxxxxxxx'),
    ('spark.local.dir','xxxxxxxxxxxxxxxxxx'),
    ('spark.submit.deployMode','client'),
    ('spark.yarn.queue','somequeue'),
    ('spark.dynamicAllocation.minExecutors','100'),
    ('spark.dynamicAllocation.maxExecutors','100'),
    ('spark.executor.instances','100'),
    ('spark.executor.memory','40g'),
    ('spark.driver.memory','40g'),
    ('spark.yarn.executor.memoryOverhead','10g')
]

conf = pyspark.SparkConf().setAll(config_list)

spark = SparkSession.builder.master('yarn')\
    .config(conf=conf)\
    .appName('myapp')\
    .getOrCreate()

sc = spark.sparkContext

I would appreciate any ideas.

If you request more executors than your cluster (or YARN queue) actually has available, the Spark session will only allocate as many worker nodes as are free at the time your job runs. In other words, asking for 100 executors on a cluster or queue that can only provide 4 at that moment will give you 4.

You can also check the number of executor instances configured for the session with:

sc._conf.get('spark.executor.instances')
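
Note that this returns the configured value, not what YARN actually granted. If you want to see how many executors have actually registered with the driver, a minimal sketch, assuming the underlying JVM SparkContext is reachable through the py4j gateway (sc._jsc), is:

# The block-manager status map includes the driver itself, hence the -1
sc._jsc.sc().getExecutorMemoryStatus().size() - 1

The Executors tab of the Spark UI shows the same information.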

I hope this clears it up.
