
Why does Cloudera recommend the number of executors, cores, and RAM that they do in Spark?

In the blog post:

http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

I was going about it in the naive way:

given 16 cores and 64 GB RAM per node, use 15 cores, 63 GB RAM, and 6 executors.

Instead, they recommend 17 executors with 5 cores and 19 GB RAM each. I see that they have an equation for the RAM, but I have no idea what's going on.

Also, does this still hold true if you are running it on only one machine (not through HDFS)?

Thanks for the help

I think they did a great job of explaining why here (look at the slides starting at slide 5). For example, they don't recommend more than 5 cores per executor because too many cores per executor can lead to poor HDFS I/O throughput.

They recommend deciding on RAM as follows: once you have the number of executors per node (3 in the article), take the total usable memory per node and divide it by the executors per node. So 63 GB RAM / 3 executors per node = 21 GB per executor (they then take a bit away and end up with 19 GB - it's not entirely clear to me why they do this).
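To make the arithmetic concrete, here is a small Python sketch that reproduces the article's example numbers. Reserving 1 core and 1 GB per node for the OS/Hadoop daemons, and treating the "take a bit away" step as YARN's executor memory overhead (max(384 MB, ~7% of executor memory)), are my reading of the article rather than something stated above:

    # Rough sketch of the sizing arithmetic for the article's example cluster:
    # 6 nodes, each with 16 cores and 64 GB RAM.
    NODES = 6
    CORES_PER_NODE = 16
    RAM_PER_NODE_GB = 64

    # Leave 1 core and 1 GB per node for the OS / Hadoop daemons (assumption).
    usable_cores = CORES_PER_NODE - 1        # 15
    usable_ram_gb = RAM_PER_NODE_GB - 1      # 63

    # Cap executors at 5 cores each to keep HDFS I/O throughput healthy.
    cores_per_executor = 5
    executors_per_node = usable_cores // cores_per_executor  # 3

    # Reserve one executor slot cluster-wide for the YARN application master.
    num_executors = NODES * executors_per_node - 1           # 17

    # Split the usable RAM evenly, then subtract the (assumed) memory overhead,
    # max(384 MB, 7% of executor memory).
    raw_executor_mem = usable_ram_gb / executors_per_node    # 21.0
    overhead_gb = max(0.384, 0.07 * raw_executor_mem)        # ~1.47
    executor_mem_gb = int(raw_executor_mem - overhead_gb)    # 19

    print(num_executors, cores_per_executor, executor_mem_gb)  # 17 5 19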

You certainly have the right idea when it comes to leaving some resources for the application master / overhead!

These numbers are tuned for cluster computing, but that makes sense since Spark is a cluster computing engine.
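For reference, a minimal sketch (not from the answer above) of how values like these would typically be passed to Spark on YARN from PySpark; the app name is just a placeholder:

    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .setAppName("tuned-job")                # hypothetical app name
        .set("spark.executor.instances", "17")  # total executors
        .set("spark.executor.cores", "5")       # cores per executor
        .set("spark.executor.memory", "19g")    # heap per executor
    )
    sc = SparkContext(conf=conf)

Equivalently, the same settings can be given to spark-submit as --num-executors 17 --executor-cores 5 --executor-memory 19G.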
