
Spark 2 on YARN is automatically utilizing more cluster resources than allocated

I am on CDH 5.7.0 and I am seeing a strange issue with Spark 2 running on a YARN cluster. Below is my job submit command:

spark2-submit --master yarn --deploy-mode cluster --conf "spark.executor.instances=8" --conf "spark.executor.cores=4" --conf "spark.executor.memory=8g" --conf "spark.driver.cores=4" --conf "spark.driver.memory=8g" --class com.learning.Trigger learning-1.0.jar

Even though I have limited the amount of cluster resources my job can use, I can see that the resource utilization is higher than the allocated amount.

The job starts with a basic memory footprint of about 8 GB and then eats up the whole cluster.

I do not have dynamic allocation set to true. I am just running an INSERT OVERWRITE query through a SparkSession.
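For context, the CDH Spark 2 parcel typically ships with spark.dynamicAllocation.enabled=true in spark-defaults.conf; if that is the case here, spark.executor.instances is only taken as the initial executor count and YARN keeps adding executors up to spark.dynamicAllocation.maxExecutors, which is unbounded by default. Below is a minimal sketch of explicitly pinning the allocation (same job and jar as above; whether dynamic allocation is actually enabled on this cluster is an assumption):

spark2-submit --master yarn --deploy-mode cluster --conf "spark.dynamicAllocation.enabled=false" --conf "spark.executor.instances=8" --conf "spark.executor.cores=4" --conf "spark.executor.memory=8g" --conf "spark.driver.cores=4" --conf "spark.driver.memory=8g" --class com.learning.Trigger learning-1.0.jar

Alternatively, leaving dynamic allocation on but capping it with --conf "spark.dynamicAllocation.maxExecutors=8" would bound the growth in the same way.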

Any pointers would be very helpful.

I created a Resource Pool in the cluster and assigned it some resources:

Min Resources : 4 Virtual Cores and 8 GB memory

I used this pool when submitting the Spark job to limit its resource usage (vcores and memory).

e.g. spark2-submit --class org.apache.spark.SparkProgram.rt_app --master yarn --deploy-mode cluster --queue rt_pool_r1 /usr/local/abc/rt_app_2.11-1.0.jar
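One caveat, offered as a sketch rather than a verified fix: in the YARN Fair Scheduler, Min Resources is a guarantee rather than a cap, so to actually bound the job the pool would also need Max Resources set, or the job itself can cap its own dynamic allocation. The maxExecutors, cores and memory values below are placeholders sized roughly to the 4-vcore / 8 GB pool, not settings taken from the original post:

spark2-submit --class org.apache.spark.SparkProgram.rt_app --master yarn --deploy-mode cluster --queue rt_pool_r1 --conf "spark.dynamicAllocation.maxExecutors=2" --conf "spark.executor.cores=2" --conf "spark.executor.memory=3g" /usr/local/abc/rt_app_2.11-1.0.jar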

If anyone has better options to achieve the same, please let us know.
