
Spark Shuffle Memory Overhead Issues

I keep running into the same problems when designing Spark jobs (using Spark 2.3.x).

In a nutshell:

  • the job is essentially one expensive shuffle operation (.groupBy or .join on large DataFrames with fine granularity); afterwards the results are written to disk (Parquet)
  • most tasks succeed very quickly
  • there are a few stubborn tasks that take very long and sometimes fail
  • even when the job succeeds, the few long tasks account for the majority of the runtime
  • Yarn occasionally kills some executors because they exceed their memory limits: yarn.YarnAllocator: Container killed by YARN for exceeding memory limits. 18.0 GB of 18 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
  • the job occasionally gets aborted after those tasks have failed multiple times: org.apache.spark.SparkException: Job aborted due to stage failure: Task 175 in stage 92.0 failed 4 times

I wonder how single tasks can have such a high memory consumption. In my understanding of how Spark works, it should be possible to make the tasks small enough that they fit into memory. Also, the fact that a few tasks account for most of the runtime is a sign of sub-optimal parallelization. The data within a grouping unit (group -> everything that matches the key for the groupBy or join) is not very large, so aggregating a single group key alone should not be able to cause the memory issues.
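
For clarity, by "not very large" I mean the per-key row counts, which can be checked along the lines of the sketch below (df and key are placeholders for the actual DataFrame and grouping/join column; the dummy data is only there to make the snippet self-contained):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(0, 1000).withColumn("key", F.col("id") % 10)  # dummy stand-in for the real DataFrame

# rows per key, largest groups first
df.groupBy("key").count().orderBy(F.desc("count")).show(20, truncate=False)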

Things I already tried:

  • increased executor memory and memory overhead -> reduced the fail rate and increased the runtime, but did not resolve the issue; I just run into the next limits, and there are hardware restrictions anyway
  • changed the partitioning of my DataFrames -> no visible effect
  • increased the number of shuffle partitions (spark.sql.shuffle.partitions) -> reduced the fail rate, but also increased the runtime (a sketch of this and the repartitioning change follows below)
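
To be concrete, the partitioning and shuffle-partition changes were of this form (sketch only; df, key and the number 2000 are placeholders, not the exact values used):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.shuffle.partitions", "2000")     # more, smaller shuffle partitions

df = spark.range(0, 1000).withColumnRenamed("id", "key")   # dummy stand-in for the real input
df = df.repartition(2000, "key")                           # repartition by the grouping/join key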

Any ideas to improve performance & stability?

edit:

Some further investigation revealed that we indeed have very skewed datasets. It seems the map operations on a few very large rows are simply too much for the Spark executors to handle.
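
A rough sketch of the kind of check that surfaces such rows is below; the column payload is purely hypothetical and stands in for whatever actually carries the bulk of a row in the real schema:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(0, 1000).withColumn("payload", F.lit("x" * 100))  # dummy stand-in for the real data

# approximate per-row size of the heavy column, biggest rows first
df.select("id", F.length("payload").alias("payload_chars")).orderBy(F.desc("payload_chars")).show(20)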

I increased the shuffle partition count, massively increased the executor memory, and changed the configuration settings recommended in the Spark error logs. For now the job runs without warnings/errors, but the runtime has increased severely.

--executor-memory 32g
--driver-memory 16g
--conf spark.executor.memoryOverhead=8g
--conf spark.driver.maxResultSize=4g
--conf spark.sql.shuffle.partitions=3000
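
For context on these numbers: on YARN the container size is roughly the executor heap plus the memory overhead, and in Spark 2.3 the overhead defaults to max(384 MB, 0.10 × executor memory) when not set explicitly. The quick arithmetic below assumes the failing executors originally ran with about 16g of heap, which is only a guess that would match the "18.0 GB of 18 GB" limit in the log above:

# rough YARN container sizing (assumption: container ~= executor heap + memoryOverhead)
old_heap_gb = 16                                  # guessed original heap size
old_overhead_gb = max(0.384, 0.10 * old_heap_gb)  # Spark 2.3 default overhead
print(old_heap_gb + old_overhead_gb)              # ~17.6 GB -> the "18 GB" container limit

new_heap_gb, new_overhead_gb = 32, 8              # values from the flags above
print(new_heap_gb + new_overhead_gb)              # 40 GB now requested per executor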
