简体   繁体   中英

Filter condition in WHERE clause in HiveQL query does not work properly

When I run the following query through beeline, one of the filter conditions in WHERE clause ( array_item in (20, 30) ) does not seem to work correctly.

  • Hive version: 2.1.1
with sample as (
              select 1 as col, array(10)         as array_col
    union all select 2 as col, array(10, 20, 30) as array_col
    union all select 3 as col, array(20, 30)     as array_col
    union all select 4 as col, array(30)         as array_col
)

, targets as (
    select 2 as col_target
)

select
    col
    , array_col
    , array_item
from
    sample
    lateral view explode(array_col) a as array_item
where
    true
    and array_item in (20, 30)
    and sample.col in (select col_target from targets)

Although I was expecting the following result to be displayed,

col array_col   array_item
2   [10,20,30]  20
2   [10,20,30]  30

the actual resuls returned by the query was as follows, showing that the query failed to apply the condition array_item in (20, 30)

col array_col   array_item
2   [10,20,30]  10
2   [10,20,30]  20
2   [10,20,30]  30

For verification, I also ran the following query.
This seemed to be executed correctly and returned the result same as what I expected.

with sample as (
              select 1 as col, array(10)         as array_col
    union all select 2 as col, array(10, 20, 30) as array_col
    union all select 3 as col, array(20, 30)     as array_col
    union all select 4 as col, array(30)         as array_col
)

, targets as (
    select 2 as col_target
)

select
    col
    , array_col
    , array_item
from
    sample
    lateral view explode(array_col) a as array_item
where
    true
    and array_item in (20, 30)
    and sample.col in (2)  -- changed here: "2" is same as the result of subquery

The difference between the first and second query is whether I write the subquery or the result returned by the subquery in the WHERE clause.

So I guess both queries are logically same, but they return different results.

I don't know why the first query does not consider array_item in (20, 30) .

I did some testing in Hive4 Alpha2 using hadoop 3.3.1 and tez 0.10.2

Looks like some bug when hive.optimize.ppd is true for first query.

By setting hive.optimize.ppd=false first query also work same as 2nd version.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM