
Spark-SQL Query Hints for Join Performance Improvement

I was recently introduced to Spark SQL. We use Spark 2.4. I recently found out that Spark SQL queries support the following hints for join strategies:

  • BROADCAST hint
  • MERGE hint
  • SHUFFLE_HASH hint

Unfortunately, I have not found any online material that discusses these hints and their application scenarios in detail. I would like to learn when to use these hints in a join to improve query performance.

Can anyone explain with some examples? Any help is appreciated. Thanks.

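  1. A broadcast join is a very fast join: Spark sends the data of the small table to every executor so that the join runs map-side, without shuffling the large table. The relevant configuration is spark.sql.autoBroadcastJoinThreshold, the maximum table size (in bytes) below which Spark broadcasts a table automatically.
  2. Sort-merge join is the default join choice since Spark 2.3.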
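
To make the difference between the two strategies concrete, here is a minimal plain-Python sketch of what each one does. It is only an illustration of the idea, not Spark's actual implementation; the function names and the (key, value) row layout are assumptions made up for this example.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows):
    """Map-side join: each partition of the large side probes a full copy of the small side."""
    # Build a hash table from the small table once. In Spark, this copy is
    # what gets shipped ("broadcast") to every executor.
    lookup = defaultdict(list)
    for key, value in small_rows:
        lookup[key].append(value)

    joined = []
    for partition in large_partitions:   # partitions are processed independently
        for key, value in partition:     # the large side is never shuffled
            for small_value in lookup.get(key, []):
                joined.append((key, value, small_value))
    return joined


def sort_merge_join(left_rows, right_rows):
    """Sort both sides by key, then walk them in step and emit matching pairs."""
    left = sorted(left_rows, key=lambda row: row[0])
    right = sorted(right_rows, key=lambda row: row[0])
    joined, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            # Find the run of rows with this key on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Pair every left row with this key against the whole right run.
            while i < len(left) and left[i][0] == key:
                for k in range(j, j_end):
                    joined.append((key, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return joined
```

The first function only works well when the small side fits in memory on every executor, which is why it suits a large fact table joined to a small lookup table. The second needs both sides sorted by key (in Spark, after a shuffle), but it copes with two large tables. In a query, the strategy is requested with a hint comment such as SELECT /*+ BROADCAST(c) */ * FROM orders o JOIN countries c ON o.country_id = c.id (the table and column names here are hypothetical). MERGE and SHUFFLE_HASH are written the same way, but those two hints were only added in Spark 3.0, so on Spark 2.4 only BROADCAST is available.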

Here are some posts that may help you: Spark SQL Joins and Sort-Merge Join.

