
If I keep the total number of cores consistent, how should I choose the number of executors and number of cores per executor?

Suppose I'm working with a cluster of 2 i3.metal instances, each of which has 512 GiB of memory and 72 vCPU cores (source). If I want to use all of the cores, I need some configuration of executors and cores per executor that gives me 144 cores. There seem to be many options; for example, I could have 72 executors with 2 cores each, or 36 executors with 4 cores each. Either way, I end up with the same number of cores and the same amount of memory per core.
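For concreteness, here is a minimal sketch of how those two layouts could be expressed when building a SparkSession (the helper function and the memory figures are illustrative assumptions, not part of my actual job):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: build a session for a given executor layout.
def buildSession(instances: Int, coresPerExecutor: Int, memoryPerExecutor: String): SparkSession =
  SparkSession.builder()
    .appName(s"layout-${instances}x${coresPerExecutor}")
    .config("spark.executor.instances", instances.toString)
    .config("spark.executor.cores", coresPerExecutor.toString)
    .config("spark.executor.memory", memoryPerExecutor)
    .getOrCreate()

// Option A: 72 executors x 2 cores = 144 cores (~12g each is illustrative, leaving headroom)
// val spark = buildSession(72, 2, "12g")

// Option B: 36 executors x 4 cores = 144 cores (~24g each is illustrative)
// val spark = buildSession(36, 4, "24g")
```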

How do I choose between these two configurations, or the many more that are available? Is there any functional difference between the two?

I have read Cloudera's blog post about parameter tuning for Spark jobs, but it didn't answer this question. I have also searched SO for related posts, but again, didn't find an answer to this question.

The comments on the top answer in this post indicate that there isn't a single answer and it should be tuned for each job. If this is the case, I would appreciate any "general wisdom" that's out there!

Indeed, there is no absolute answer for all use cases. Each job is different.

When I want to execute a new job, the general wisdom I am using is to start with a default configuration somewhere in the middle between thin and fat executors: several cores per executor, and several executors per machine.

I typically take the square root of the number of cores per machine for the cores per executor. And then, I fine-tune these parameters to the job, comparing performance, also looking at hardware bottlenecks (memory? cores? disk? network?). If the job fails, starting with subsets of the dataset and then scaling up helps, too.

So with this configuration, I would intuitively start with 18 executors (9 per machine) with 8 cores each, but 36 executors with 4 cores would also sound reasonable to me as an initial configuration.
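A sketch of what that starting point might look like on the 2 × 72-core machines above, assuming the square-root heuristic and an illustrative memory value (these are assumptions, not exact recommendations):

```scala
import org.apache.spark.sql.SparkSession

val coresPerMachine = 72
val machines        = 2

// Square-root heuristic: sqrt(72) ~= 8.5, rounded down to 8 cores per executor.
val coresPerExecutor    = math.sqrt(coresPerMachine).toInt      // 8
val executorsPerMachine = coresPerMachine / coresPerExecutor    // 9
val numExecutors        = executorsPerMachine * machines        // 18

// 512 GiB / 9 executors ~= 56 GiB per executor; leave headroom for the OS and overhead.
val spark = SparkSession.builder()
  .appName("initial-tuning-run")
  .config("spark.executor.instances", numExecutors.toString)
  .config("spark.executor.cores", coresPerExecutor.toString)
  .config("spark.executor.memory", "48g")   // illustrative starting value
  .getOrCreate()
```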

Going for one core per executor (thin executors), or a single executor per node that takes all of the machine's cores (fat executors), tends to be inefficient for various reasons, both in terms of resource usage and bottlenecks.

Also, Spark's default memory allocation per executor is low. With only a few executors that each have many cores, you will under-utilize the machine's memory unless you explicitly allocate more per executor.
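For example, spark.executor.memory defaults to just 1g, so a fat-executor layout only makes use of the machine's RAM if you raise it explicitly (the numbers below are purely illustrative):

```scala
import org.apache.spark.sql.SparkSession

// With a fat layout (2 executors per machine, 36 cores each), the 1g default
// executor memory would leave almost all of the 512 GiB per machine unused.
val spark = SparkSession.builder()
  .appName("fat-executors-need-explicit-memory")
  .config("spark.executor.instances", "4")    // illustrative: 2 fat executors per machine
  .config("spark.executor.cores", "36")
  .config("spark.executor.memory", "200g")    // must be raised explicitly from the 1g default
  .getOrCreate()
```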

I hope this helps!

I would say 5 cores per executor is a sweet spot that avoids putting too much IO burden on your input data sources. Having said that, also make sure you are not left with too little memory per core; ideally, don't go below 8g per executor.
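As an illustration only, applying that rule of thumb to the 144-core cluster from the question might look like this (the executor count and memory value are my own back-of-the-envelope numbers):

```scala
import org.apache.spark.sql.SparkSession

// 144 cores / 5 cores per executor ~= 28 executors (4 cores left over for the OS / overhead).
val spark = SparkSession.builder()
  .appName("five-core-executors")
  .config("spark.executor.instances", "28")
  .config("spark.executor.cores", "5")
  .config("spark.executor.memory", "32g")   // illustrative; comfortably above the 8g floor suggested above
  .getOrCreate()
```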

Again, as Ghislain mentioned, it depends on your operations, but that's where I would start.
