Spark-SQL Query Hints for Join Performance Improvement

I was recently introduced to SparkSQL. We use Spark 2.4. I recently found out that SparkSQL queries support the following hints for their join strategies:

  • BROADCAST hint
  • MERGE hint
  • SHUFFLE_HASH hint

Unfortunately, I have not found any online material that discusses these hints and their application scenarios in detail. I would like to learn when to use these hints in a query join to improve query performance.

Can anyone explain with some examples? Any help is appreciated. Thanks.

  1. A broadcast join is a very fast join: it sends the data of the small table to every executor, so the join runs map-side without shuffling the large table. The relevant configuration is spark.sql.autoBroadcastJoinThreshold, the maximum table size in bytes (10 MB by default) below which Spark broadcasts a table automatically.
  2. Sort-merge join has been the default join choice since Spark 2.3.
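
To make the hints from the question concrete, here is a minimal PySpark sketch of how each one is written. The table and column names (orders, customers, customer_id) and the function run_examples are made up for illustration. One caveat, to the best of my knowledge: Spark 2.4 only recognizes the broadcast hint, while MERGE and SHUFFLE_HASH are documented for Spark 3.0 and later, so on 2.4 the last two queries would most likely be planned as if no hint were given.

```python
# Hypothetical tables "orders" (large) and "customers" (small); the names
# are assumptions for illustration, not part of the original question.

# BROADCAST: ask Spark to ship "customers" to every executor (map-side join).
BROADCAST_QUERY = """
SELECT /*+ BROADCAST(customers) */ orders.order_id, customers.name
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
"""

# MERGE: ask for a sort-merge join (hint documented for Spark 3.0+).
MERGE_QUERY = """
SELECT /*+ MERGE(orders) */ orders.order_id, customers.name
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
"""

# SHUFFLE_HASH: ask for a shuffle hash join (hint documented for Spark 3.0+).
SHUFFLE_HASH_QUERY = """
SELECT /*+ SHUFFLE_HASH(customers) */ orders.order_id, customers.name
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
"""


def run_examples():
    """Print the physical plan Spark picks for each hinted query."""
    # Imported here so the query strings above can be read without pyspark.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("join-hints-sketch")
             .getOrCreate())

    orders = spark.createDataFrame(
        [(1, 10), (2, 20), (3, 10)], ["order_id", "customer_id"])
    customers = spark.createDataFrame(
        [(10, "Ann"), (20, "Bob")], ["customer_id", "name"])
    orders.createOrReplaceTempView("orders")
    customers.createOrReplaceTempView("customers")

    # Turn off automatic broadcasting so that only the hints decide.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    for query in (BROADCAST_QUERY, MERGE_QUERY, SHUFFLE_HASH_QUERY):
        spark.sql(query).explain()

    # The same broadcast request through the DataFrame API.
    orders.join(customers.hint("broadcast"), "customer_id").explain()

    spark.stop()
```

Calling run_examples() on a machine with pyspark installed prints one physical plan per query; looking for BroadcastHashJoin, SortMergeJoin or ShuffledHashJoin in that output shows whether a hint was honored. As a rule of thumb, BROADCAST suits a join where one side comfortably fits in executor memory, and the other two are worth trying only when both sides are large.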

Here are some posts that may help: Spark SQL Joins, Sort-Merge Join.
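
For intuition about why the two strategies above behave differently, here is a small pure-Python sketch of the mechanics. It is a simplified model, not Spark's implementation: rows are (key, value) tuples, partitions are plain lists, and the function names are my own.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows):
    """Map-side join: each partition of the large side probes a full
    copy of the small side held in a hash table, so nothing is shuffled."""
    lookup = defaultdict(list)
    for key, value in small_rows:
        lookup[key].append(value)

    joined = []
    for partition in large_partitions:  # each partition works on its own
        for key, value in partition:
            for small_value in lookup.get(key, []):
                joined.append((key, value, small_value))
    return joined


def sort_merge_join(left_rows, right_rows):
    """Sort both sides by key, then walk them in step. In Spark the sort
    follows a shuffle that co-locates equal keys; here we only sort."""
    left = sorted(left_rows, key=lambda row: row[0])
    right = sorted(right_rows, key=lambda row: row[0])

    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][0] == left_key:
                for _, right_value in right[j:j_end]:
                    joined.append((left_key, left[i][1], right_value))
                i += 1
            j = j_end
    return joined


orders = [[(10, "o1"), (20, "o2")], [(10, "o3"), (30, "o4")]]  # 2 partitions
customers = [(10, "Ann"), (20, "Bob")]

by_broadcast = broadcast_hash_join(orders, customers)
by_sort_merge = sort_merge_join([row for part in orders for row in part],
                                customers)
```

Both functions return the same set of joined rows. The difference is cost: the broadcast variant needs the whole small side in memory on every worker, while the sort-merge variant pays for sorting (and, in Spark, shuffling) both sides but does not need either side to fit in memory.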
