I'm working on a Spark project that uses the MapR distribution, with dynamic allocation enabled. Please refer to the parameters below:
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.dynamicAllocation.minExecutors 0
spark.dynamicAllocation.maxExecutors 20
spark.executor.instances 2
As per my understanding, spark.executor.instances is the same property we set with --num-executors when submitting our PySpark job.
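If it helps to see that equivalence concretely, here is a minimal sketch; the script name and app name are made up for illustration:

# Equivalent ways of asking for 5 executors (file and app names here
# are illustrative, not from the original question):
#
#   spark-submit --num-executors 5 my_job.py
#   spark-submit --conf spark.executor.instances=5 my_job.py
#
# or from inside the application itself:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("my_job")                        # hypothetical app name
    .config("spark.executor.instances", "5")  # same knob as --num-executors
    .getOrCreate()
)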
I have the following two questions:

1. If I use --num-executors 5 during my job submission, will it override the spark.executor.instances 2 config setting?
2. What is the purpose of defining spark.executor.instances when the dynamic-allocation min and max executors are already defined?
There is one more parameter, spark.dynamicAllocation.initialExecutors, which by default takes the value of spark.dynamicAllocation.minExecutors. If spark.executor.instances is defined and is larger than minExecutors, then the initial number of executors takes that value instead.

spark.executor.instances is basically the property for static allocation. However, if dynamic allocation is enabled, the initial set of executors will be at least as large as spark.executor.instances.
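A plain-Python sketch of that precedence rule as I understand it (a simplification for illustration, not Spark's actual source code):

def initial_executor_count(conf: dict) -> int:
    # initialExecutors defaults to minExecutors when not set explicitly.
    minimum = int(conf.get("spark.dynamicAllocation.minExecutors", 0))
    initial = int(conf.get("spark.dynamicAllocation.initialExecutors", minimum))
    static = int(conf.get("spark.executor.instances", 0))
    # Spark starts with the largest of the three values.
    return max(initial, minimum, static)

# With the settings from the question:
conf = {
    "spark.dynamicAllocation.minExecutors": "0",
    "spark.dynamicAllocation.maxExecutors": "20",
    "spark.executor.instances": "2",
}
print(initial_executor_count(conf))  # -> 2, so the job starts with 2 executors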
As for your first question: the value in the config file won't get overwritten when you set --num-executors; the command-line flag simply takes precedence over spark.executor.instances for that particular submission.
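One way to see which value actually won for a given run is to read the effective configuration from inside the submitted application, for example:

# Run this inside the submitted PySpark application:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print(spark.conf.get("spark.executor.instances"))
# Prints "5" if the job was submitted with --num-executors 5, even though
# spark-defaults.conf still contains spark.executor.instances 2.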
Extra read: the official Spark documentation on dynamic allocation (https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation).