
Does the "spark.sql.shuffle.partitions" configuration affect non-SQL shuffling?

We don't have a lot of SQL in our Spark jobs (that is a problem, I know, but for now it's a fact). I want to optimize the size and number of our Spark shuffle partitions to optimize our Spark usage. I saw in a lot of sources that setting spark.sql.shuffle.partitions is a good option. But will it have any effect if we almost never use Spark SQL?

Indeed, spark.sql.shuffle.partitions has no effect on jobs defined through the RDD API.

The configuration you are looking for is spark.default.parallelism. According to the documentation:

Default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when not set by user.
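
A minimal sketch of the difference, assuming a local SparkSession with illustrative config values and toy data (AQE is disabled here only so the configured DataFrame value is directly observable):

```scala
import org.apache.spark.sql.SparkSession

object ShufflePartitionsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("shuffle-partitions-demo")
      .config("spark.default.parallelism", "8")      // picked up by RDD shuffles
      .config("spark.sql.shuffle.partitions", "50")  // picked up by DataFrame/SQL shuffles
      .config("spark.sql.adaptive.enabled", "false") // disable AQE so 50 isn't coalesced
      .getOrCreate()

    val sc = spark.sparkContext

    // RDD API: reduceByKey with no explicit numPartitions
    // shuffles into spark.default.parallelism partitions
    val rdd = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
      .reduceByKey(_ + _)
    println(s"RDD shuffle partitions: ${rdd.getNumPartitions}") // 8

    // DataFrame API: groupBy shuffles into spark.sql.shuffle.partitions partitions
    import spark.implicits._
    val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("k", "v")
      .groupBy("k").sum("v")
    println(s"DataFrame shuffle partitions: ${df.rdd.getNumPartitions}") // 50

    spark.stop()
  }
}
```

Note that spark.default.parallelism is only the default: RDD transformations like reduceByKey also accept an explicit numPartitions argument, which takes precedence when you need per-stage control.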
