Does the "spark.sql.shuffle.partitions" configuration affect non-SQL shuffling?
We don't have a lot of SQL in our Spark jobs (that is a problem, I know, but for now it's a fact). I want to tune the size and number of our shuffle partitions to optimize our Spark usage. I have seen many sources suggest that setting spark.sql.shuffle.partitions is a good option. But will it have any effect if we almost never use Spark SQL?
Indeed, spark.sql.shuffle.partitions has no effect on jobs defined through the RDD API. The configuration you are looking for is spark.default.parallelism. According to the documentation:

Default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when not set by user.
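As a minimal sketch, both settings can be passed at submit time; which one matters depends on the API your job uses (the job script name here is hypothetical):

```shell
# spark.default.parallelism  -> partition count for RDD shuffles
#                               (reduceByKey, join, parallelize, ...)
# spark.sql.shuffle.partitions -> partition count for DataFrame/Dataset
#                                 (Spark SQL) shuffles only
spark-submit \
  --conf spark.default.parallelism=200 \
  --conf spark.sql.shuffle.partitions=200 \
  my_job.py   # hypothetical job script
```

If your jobs are almost entirely RDD-based, only spark.default.parallelism will change the number of shuffle partitions; the SQL setting will simply be ignored.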