
Spark-SQL Query Hints for Join Performance Improvement

Original 2020-06-08 23:02:05 4 1 apache-spark-sql/ query-hints

I have recently been introduced to Spark SQL. We use Spark 2.4. I recently found out that Spark SQL queries support the following hints for their join strategies:

BROADCAST hint
MERGE hint
SHUFFLE_HASH hint

Unfortunately, I have not found any online material that discusses these hints and their application scenarios in detail. I would like to learn when to use these hints in a join to improve query performance.

Can anyone explain with some examples? Any help is appreciated. Thanks.

1 answer

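A broadcast join is a high-performance join: Spark sends the data of the small table to every executor, which then performs a map-side join without shuffling the large table. The relevant configuration is spark.sql.autoBroadcastJoinThreshold (10 MB by default); tables below this size are broadcast automatically.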
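To make the map-side idea concrete, here is a minimal plain-Python sketch of what a broadcast hash join does conceptually. It is not Spark code: the function name, the tables `orders` and `customers`, and their columns are hypothetical and chosen only for illustration. In Spark SQL itself the hint is written right after SELECT, for example `SELECT /*+ BROADCAST(customers) */ ... FROM orders JOIN customers ...`. Note that in Spark 2.4 BROADCAST is the only join hint available; MERGE and SHUFFLE_HASH were added in Spark 3.0.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join every partition of a large table against a small table
    that is assumed to have been copied ("broadcast") to each worker."""
    # Build the hash table once from the small side.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined = []
    # Each partition is probed on its own; the large table is never shuffled.
    for partition in large_partitions:
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
    return joined


# Hypothetical large table, already split into two partitions.
orders = [
    [{"cust_id": 1, "amount": 30}, {"cust_id": 2, "amount": 15}],
    [{"cust_id": 1, "amount": 7}, {"cust_id": 3, "amount": 99}],
]
# Hypothetical small table that fits comfortably in memory.
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]

result = broadcast_hash_join(orders, customers, "cust_id")
```

The sketch also shows why the hint only pays off when one side is small: the whole small table has to fit in memory on every executor.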
Sort-merge join is the default join strategy since Spark 2.3.
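For comparison, the following is a rough plain-Python sketch of the sort-and-merge step of a sort-merge join. It is an illustration under simplifying assumptions, not Spark's implementation: in Spark both sides are first shuffled by the join key so that matching keys land in the same partition, and only then sorted and merged. The function name and sample rows are hypothetical. From Spark 3.0 onward this strategy can be requested with the MERGE hint (aliases SHUFFLE_MERGE and MERGEJOIN).

```python
def sort_merge_join(left_rows, right_rows, key):
    """Inner-join two lists of dict rows by sorting both on `key`
    and walking them with two cursors."""
    left = sorted(left_rows, key=lambda r: r[key])
    right = sorted(right_rows, key=lambda r: r[key])

    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left key too small, advance left cursor
        elif lk > rk:
            j += 1          # right key too small, advance right cursor
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row carrying this key with that run.
            while i < len(left) and left[i][key] == lk:
                for match in right[j:j_end]:
                    joined.append({**left[i], **match})
                i += 1
            j = j_end
    return joined


# Hypothetical sample data with duplicate keys on both sides.
left_table = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}]
right_table = [{"k": 2, "b": 10}, {"k": 3, "b": 20}, {"k": 2, "b": 11}]

merged = sort_merge_join(left_table, right_table, "k")
```

Because neither side has to fit in memory as a hash table, this approach tends to suit joins between two large tables, at the cost of shuffling and sorting both.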

Here are some posts that I hope will help: Spark SQL Joins Sort-Merge Join



solution1 0 2020-06-09 01:14:19
