Hive: find unique ids that have certain events

I have a Hive table with IDs and associated events, as shown below. The table can have multiple event_number values for the same ID:

ID    event_number    Date
ABC      1           2022-08-01
ABC      2           2022-08-01
ABC      3           2022-08-01
DEF      1           2022-08-01
GHI      2           2022-08-02
DEF      3           2022-08-01

I want to find the unique IDs that have both event 1 and event 2 on the same day.

  • Here the output would be ABC, because it is the only ID with both event 1 and event 2 on a given date.
  • It cannot be DEF or GHI, since each of them has only one of the two events (DEF has event 1, GHI has event 2).

Here is the query I came up with:

select distinct ID from table where event_number=1 and `date`='2022-08-01'
and ID in (select ID from table where event_number=2 and `date`='2022-08-01');

Is there a more elegant or efficient way to do this?

First filter the records that match your event filter, then aggregate by ID and date, and keep the rows where the distinct event count is more than 1 for the given date, e.g.:

select id,`date`,count(distinct event_number) event_count from (
select id,event_number,`date` from table where event_number in (1,2)
) a group by id,`date` having event_count>1;

The SQL is as follows:

select
    id
from (
    select
        id,
        `date`,
        count(distinct event_number) as event_no_cnt
    from
        table
    where
        event_number in (1,2)
    group by
        id,
        `date`
    having count(distinct event_number) = 2
) tmp
group by id;
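
To check the group-by approach against the sample rows from the question, here is a minimal sketch that builds the data inline instead of reading a real table. The CTE name events and the column name event_date are illustrative choices, not names from the original table, and the query assumes a Hive version that supports CTEs and SELECT without FROM (0.13 or later).

```sql
-- Sample rows copied from the question, built inline so no real table is needed.
-- `events` and `event_date` are hypothetical names used only for this sketch.
with events as (
    select 'ABC' as id, 1 as event_number, '2022-08-01' as event_date
    union all select 'ABC', 2, '2022-08-01'
    union all select 'ABC', 3, '2022-08-01'
    union all select 'DEF', 1, '2022-08-01'
    union all select 'GHI', 2, '2022-08-02'
    union all select 'DEF', 3, '2022-08-01'
)
select distinct id
from (
    -- Keep only events 1 and 2, then require both to appear for the same id and date.
    select id, event_date
    from events
    where event_number in (1, 2)
    group by id, event_date
    having count(distinct event_number) = 2
) t;
```

With these six rows, only ABC has both event 1 and event 2 on the same date, so it should be the only ID returned. Compared with the IN-subquery version, this form does not hard-code a date, so it covers every day in the table in one pass.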


 