
Why does increasing spark.sql.shuffle.partitions cause a FetchFailedException?

I get a FetchFailedException when joining tables with spark.sql.shuffle.partitions = 2700.

But the same job runs successfully with spark.sql.shuffle.partitions = 500.

As far as I know, increasing shuffle.partitions should reduce the amount of data each task handles during the shuffle read.

Am I missing something?
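
For context, a minimal sketch of the kind of job involved, assuming a plain SparkSession and an ordinary shuffle join; the table and column names (table_a, table_b, key, joined_output) are placeholders, not from the original post:

import org.apache.spark.sql.SparkSession

// Minimal sketch of the failing scenario; table/column names are hypothetical.
val spark = SparkSession.builder()
  .appName("shuffle-partitions-fetch-failure")
  .getOrCreate()

// Fails with FetchFailedException at 2700 but succeeds at 500.
spark.conf.set("spark.sql.shuffle.partitions", "2700")

val a = spark.table("table_a")
val b = spark.table("table_b")

// The shuffle on the join key is where the shuffle-read (fetch) side fails.
a.join(b, Seq("key"))
  .write
  .mode("overwrite")
  .saveAsTable("joined_output")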

Exception

FetchFailed(BlockManagerId(699, nfjd-hadoop02-node120.jpushoa.com, 7337, None), shuffleId=4, mapId=59, reduceId=1140, message=
org.apache.spark.shuffle.FetchFailedException: failed to allocate 16777216 byte(s) of direct memory (used: 2147483648, max: 2147483648)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:554)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:485)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:64)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCode

Configuration

spark.executor.cores = 1
spark.dynamicAllocation.maxExecutors = 800

After reading the shuffle fetch code, I found the problem: the actual blocks produced by the ShuffleMapTasks are too large to be fetched into memory at once, while the block sizes the driver reports to the reducers are only averages. Once the number of shuffle partitions exceeds 2000 (the spark.shuffle.minNumPartitionsToHighlyCompress threshold), the reported size of a skewed block is much smaller than its real size.
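
To illustrate the point above, a small sketch of the size accounting, assuming (as described in this answer) that the driver only reports an average block size once the partition count crosses spark.shuffle.minNumPartitionsToHighlyCompress; all of the concrete sizes below are made up for illustration:

// Illustration only: why a skewed block can slip past the in-memory fetch decision
// when the map output status is compressed down to an average (partitions > 2000).
object AvgSizeUnderestimate {
  def main(args: Array[String]): Unit = {
    val numReducePartitions = 2700

    // Hypothetical skewed map output: one hot reducer block of ~1 GiB,
    // every other block around 1 MiB.
    val blockSizes: Array[Long] =
      Array.fill(numReducePartitions - 1)(1L << 20) :+ (1L << 30)

    // With more than 2000 partitions the driver keeps roughly an average size
    // per block instead of exact sizes, so this is what the reducers see.
    val reportedSize = blockSizes.sum / numReducePartitions

    // The reducer sizes its fetch requests (and the choice to pull a block into
    // Netty direct memory) from the reported size, so the real ~1 GiB block looks
    // tiny, gets fetched into memory, and eats into the 2 GiB direct-memory cap
    // seen in the exception above.
    println(f"actual largest block: ${blockSizes.max}%,d bytes")
    println(f"reported block size:  $reportedSize%,d bytes")
  }
}

In other words, because only the average is reported, the fetch path has no way to know that the skewed block is actually far larger than what it can safely buffer in direct memory.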

