When I start an Apache Spark 1.2.1 application on CentOS 6.5, executors show more than 100% CPU load according to 'top', and the load average is significantly higher than the number of cores.
As a result, the garbage collector is under heavy load.
I tried setting spark.executor.cores=1 and spark.cores.max, but neither had any effect. A similar Ubuntu 14.04 setup with 4 physical cores (Intel i5) has no such issue: each executor uses 1 core.
Any idea how to fix this?
Application submission is performed from code: all needed properties are set through System.setProperty,
and then the Spark configuration and context are created. It is done the same way on both clusters; the only possible difference is the per-cluster set of Spark configuration properties, but there is nothing special in it. Under Ubuntu with a 4-core i5 this leads to the proper load, with no more than 1 core used by each executor. Under CentOS 6.5 with 2 x 6-core E5s I see more than one core used per executor. Moreover, I tried applying the 4-core i5 configuration to the E5 cluster, with no success.
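For reference, the submission pattern described above looks roughly like this (a minimal Scala sketch; the object name and property values are illustrative, not the exact ones from my setup):

```scala
// Sketch of the submission pattern (Spark 1.2.x API).
import org.apache.spark.{SparkConf, SparkContext}

object Submit {
  def main(args: Array[String]): Unit = {
    // spark.* properties set via System.setProperty before SparkConf is
    // created are picked up by SparkConf's default constructor.
    System.setProperty("spark.executor.cores", "1")
    System.setProperty("spark.executor.memory", "1650M")

    val conf = new SparkConf().setAppName("my-app") // loads spark.* system properties
    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}
```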
spark-defaults.conf
file content (before the spark version substitution; the version is currently 1.2.1):
spark.master=yarn-client
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs:///user/spark/applicationHistory
spark.yarn.historyServer.address=X.X.X.X:18088
spark.executor.memory=1650M
spark.executor.cores=1
spark.cores.max=4
spark.executor.instances=15
spark.shuffle.memoryFraction=0.2
spark.storage.memoryFraction=0.02
spark.yarn.jar=hdfs:///user/spark/share/lib/${spark.version}/spark-assembly.jar
The main problem here is that the 2 x 6-core E5 cluster shows lower performance than a single 4-core i5. Yes, the E5 is somewhat older, but it should still be noticeably more powerful. And when I analysed the Spark History Server UI under similar load on both clusters, I saw noticeably more time spent in GC on the E5 cluster. A crazy state.
OK, in the end I found it. The resolution was:
When you see more than 100% load on a CPU core under a Spark executor, you should first check your job's GC logs for inefficient memory-intensive operations, or at least reduce the lifetime of some objects by releasing resources more aggressively.
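To obtain those GC logs, one option (a sketch; this flag set is for the Java 7/8 JVMs contemporary with Spark 1.2) is to pass HotSpot GC logging flags to the executor JVMs in spark-defaults.conf:

```
spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
```

The GC output then appears in each executor's stdout log, viewable through the YARN or Spark UI.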
You're requesting more executors than you have CPU cores, which, if you think about it, should be impossible. However, the default YARN model considers only RAM as a limiting factor (using the DefaultResourceCalculator), and it will happily share CPU cores between multiple "1 core" executors, effectively leading to a load higher than 1 on individual cores. You can use the DominantResourceCalculator to avoid this :)
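With the CapacityScheduler, switching the resource calculator is done in capacity-scheduler.xml (the property name is standard Hadoop; the file location depends on your distribution):

```xml
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```

The ResourceManager must be restarted for the change to take effect; after that, containers are limited by vcores as well as memory, so executors no longer oversubscribe physical cores.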