
What does spark.sql.shuffle.partitions exactly refer to?

What exactly does spark.sql.shuffle.partitions refer to? Are we talking about the number of partitions that results from a wide transformation, or about something that happens in the middle, i.e. some sort of intermediate partitioning that occurs before the result partitions of the wide transformation?

Because in my understanding, a wide transformation works as follows:

Parents RDDs -> shuffle files -> Child RDDs

What does the spark.sql.shuffle.partitions parameter refer to here? The shuffle files, the child RDDs, or something else that I have overlooked?

This is already explained in the official docs:

spark.sql.shuffle.partitions (default: 200) — Configures the number of partitions to use when shuffling data for joins or aggregations.

In other words, it is the number of partitions of the child Dataset.


