
What does spark.sql.shuffle.partitions exactly refer to?

What exactly does spark.sql.shuffle.partitions refer to? Are we talking about the number of partitions that results from a wide transformation, or about something that happens in the middle, i.e. some sort of intermediate partitioning that occurs before the result partitions of the wide transformation?

Because in my understanding, a wide transformation works as follows:

Parents RDDs -> shuffle files -> Child RDDs

What does the spark.sql.shuffle.partitions parameter refer to here? The shuffle files, the child RDDs, or something else that I have overlooked?

This is already explained in the official docs:

spark.sql.shuffle.partitions (default: 200) — Configures the number of partitions to use when shuffling data for joins or aggregations.

In other words, it is the number of partitions of the child Dataset.


