
Adding a subquery to a CASE statement in Hive

I hope you can help. I have the below query, which has a case statement.

I want to say:

If the domain is in the other table, return the domain name; otherwise, mark it as 'other'.

I am using Hive and get this error:

Unsupported SubQuery Expression 'cleandomain': Currently SubQuery expressions are only allowed as Where Clause predicates

Is there some other way I can achieve the same?

SELECT *,
       CASE
         WHEN cleandomain IN (SELECT cleandomain
                              FROM   keenek1.daily_top_doms) THEN cleandomain
         ELSE 'other'
       END AS status
FROM   (SELECT hour,.....

One possible solution is the in_file(string str, string filename) function, which returns true if str appears as an entire line in the file.

Put the list of domains in a text file, one domain per line, and call in_file in the CASE statement:

  CASE
    WHEN in_file(cleandomain, 'file/path/daily_top_doms.txt') THEN cleandomain
    ELSE 'other'
  END AS status

Another solution is to aggregate the list of domains into an array in a subquery, cross join that single row to your query, and test membership with array_contains(). This may work much faster if the list is not too big:

with dom as (
  select collect_set(cleandomain) as dom  -- one row holding an array of distinct domains
  from   keenek1.daily_top_doms
)

select case
         when array_contains(d.dom, s.cleandomain) then s.cleandomain
         else 'other'
       end as status
from (your query) s
     cross join dom d  -- dom has a single row, so the row count of s is unchanged
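
If the domain list is too large to collect into a single array, a LEFT JOIN against the de-duplicated list is another way to get the same result. This is not from the original answer, just a minimal sketch. The two CTEs below hold made-up sample rows that stand in for keenek1.daily_top_doms and for "your query", and it assumes a Hive version that allows SELECT without FROM and UNION ALL inside a CTE:

```sql
-- Hypothetical sample data; replace both CTEs with the real table and query.
WITH daily_top_doms AS (
  SELECT 'example.com' AS cleandomain
  UNION ALL
  SELECT 'example.org' AS cleandomain
),
s AS (  -- stands in for "your query"
  SELECT 'example.com' AS cleandomain
  UNION ALL
  SELECT 'unknown.net' AS cleandomain
)
SELECT s.cleandomain,
       CASE
         -- A match on the right side means the domain is in the list.
         WHEN d.cleandomain IS NOT NULL THEN s.cleandomain
         ELSE 'other'
       END AS status
FROM   s
       -- DISTINCT keeps the join from multiplying rows of s
       -- when a domain appears more than once in the list.
       LEFT JOIN (SELECT DISTINCT cleandomain
                  FROM   daily_top_doms) d
              ON s.cleandomain = d.cleandomain;
```

With this sample data, the example.com row keeps its domain as status and the unknown.net row gets 'other'. Rows where cleandomain is NULL never match the join condition, so they also get 'other'.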


 