What does spark.sql.shuffle.partitions exactly refer to?
Are we talking about the number of partitions that results from a wide transformation, or about something that happens in the middle, such as some intermediate partitioning before the result partitions of the wide transformation?
Because in my understanding, a wide transformation looks like this:
Parent RDDs -> shuffle files -> Child RDDs
What does the spark.sql.shuffle.partitions parameter refer to here? The shuffle files, the child RDDs, or something else that I have overlooked?
This is already explained in the official docs:
spark.sql.shuffle.partitions (default: 200): Configures the number of partitions to use when shuffling data for joins or aggregations.
In other words, it is the number of partitions of the child Dataset.
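To make the "Parent RDDs -> shuffle files -> Child RDDs" picture concrete, here is a toy model in plain Python. It is only a sketch and not Spark's actual implementation: the function name `shuffle`, the parameter `num_shuffle_partitions`, and the use of `hash(key) % n` as the partitioner are all assumptions made for illustration. What it shows is that the same number drives both sides: each parent partition splits its output into that many buckets, and the child ends up with that many partitions.

```python
from collections import defaultdict


def shuffle(parent_partitions, num_shuffle_partitions):
    """Toy model of a wide transformation (group values by key).

    num_shuffle_partitions plays the role of spark.sql.shuffle.partitions.
    This is an illustration only, not how Spark is implemented internally.
    """
    # Map side: every parent partition splits its records into
    # num_shuffle_partitions buckets (the "shuffle files" in the question).
    shuffle_files = []
    for records in parent_partitions:
        buckets = [[] for _ in range(num_shuffle_partitions)]
        for key, value in records:
            buckets[hash(key) % num_shuffle_partitions].append((key, value))
        shuffle_files.append(buckets)

    # Reduce side: child partition i reads bucket i from every map output,
    # so the child has exactly num_shuffle_partitions partitions.
    child_partitions = []
    for i in range(num_shuffle_partitions):
        grouped = defaultdict(list)
        for buckets in shuffle_files:
            for key, value in buckets[i]:
                grouped[key].append(value)
        child_partitions.append(dict(grouped))
    return child_partitions


# Three parent partitions with integer keys (hypothetical sample data).
parents = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")], [(2, "e"), (4, "f")]]
children = shuffle(parents, num_shuffle_partitions=4)
print(len(parents), "parent partitions ->", len(children), "child partitions")
```

In real Spark you can check this on the result of a join or aggregation with `df.rdd.getNumPartitions()`. Keep in mind that when adaptive query execution is enabled, Spark may coalesce the post-shuffle partitions, so the number you observe can be lower than the configured value.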