简体   繁体   中英

Comparing efficiency of Hive queries with different join orders

Consider the following two queries in Hive:

SELECT
    *
FROM
    A
INNER JOIN 
    B
INNER JOIN
    C
ON 
    A.COL = B.COL
AND A.COL = C.COL

and

SELECT
    *
FROM
    A
INNER JOIN
    B
ON
    A.COL = B.COL
INNER JOIN
    C
ON
    A.COL = C.COL

Question : Are the two queries computationally same or different? In other words, to get the fastest results should I prefer to write one versus the other, or it doesn't matter? Thanks.

On Hive 1.2, also tested on Hive 2.3, both on Tez, the optimizer is intelligent enough to derive ON condition for join with table B and performs two INNER JOINs each with correct it's own ON condition.

Checked on simple query

with A as (
select stack(3,1,2,3) as id
),
B as (
select stack(3,1,2,3) as id
),
C as (
select stack(3,1,2,3) as id
)

select * from A 
inner join B
inner join C
ON A.id = B.id AND A.id = C.id

Explain command shows that both joins are executed as map-join on single mapper and each join has it's own join condition. This is explain output:

Map 1 File Output Operator [FS_17] Map Join Operator [MAPJOIN_27] (rows=1 width=12) Conds: FIL_24.col0=RS_12.col0(Inner) , FIL_24.col0=RS_14.col0(Inner) ,HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]

First I thought that it will be CROSS join with table B in first query, then join with C will reduce the dataset, but both queries work the same(the same plan, the same execution), thanks to the optimizer.

Also I tested the same with map-join switched off ( set hive.auto.convert.join=false; ) and also got exactly the same plan for both queries. I did not test it for really big tables, you better double-check.

So, computationally both are the same on Hive 1.2 and Hive 2.3 for map-join and merge join on reducer

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM