How to simplify this query and improve its efficiency in Hive?

Question

The following query runs on Hive:

select day, count(mdn) * 5 as number
from (
  select distinct a.mdn, a.day
  from flow a
  left outer join flow b
    on a.day = date_add(b.day, -1) and a.mdn = b.mdn
  left outer join flow c
    on a.day = date_add(c.day, -2) and a.mdn = c.mdn
  left outer join flow d
    on a.day = date_add(d.day, -3) and a.mdn = d.mdn
  where b.mdn is null and c.mdn is null and d.mdn is null
) t
group by day

The logic is: for each day, select the mdn values that appear on that day but do not appear on any of the following three days, then count them. The query is very slow, though, because it joins the same big table, flow, to itself three times. How can I rewrite it to run more efficiently?

Answer 1

Well, you can look up each mdn's next day using lead() and compare the dates. This reads flow once instead of joining it to itself three times:

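select f.*
from (select f.*,
             -- next day on which the same mdn appears (null if never)
             lead(f.day) over (partition by f.mdn order by f.day) as next_day
      from flow f
     ) f
-- keep rows whose mdn does not reappear within the next three days
where f.next_day > date_add(f.day, 3) or f.next_day is null;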
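The query above returns the qualifying rows, but the question asks for a per-day count multiplied by 5. Here is a minimal sketch that finishes the job. It assumes `day` is a DATE or a 'yyyy-MM-dd' string, which the question's own `date_add` comparisons suggest. It deduplicates (mdn, day) first, as the original `distinct` did. It also swaps in `datediff` for the date comparison, because `datediff` accepts either type. Since `next_day` is always later than `day`, "not seen on day+1, day+2 or day+3" is the same as "next appearance more than 3 days away, or none".

```sql
-- Sketch: per day, count the mdn values that do not reappear in the next three days.
-- Assumes `day` is a DATE or a 'yyyy-MM-dd' string.
select day, count(mdn) * 5 as number      -- count(mdn) skips null mdn, like the original
from (
  select mdn, day,
         lead(day) over (partition by mdn order by day) as next_day
  from (
    -- collapse repeated rows for the same mdn on the same day
    select distinct mdn, day
    from flow
  ) d
) t
where next_day is null                    -- mdn never appears again
   or datediff(next_day, day) > 3         -- next appearance is after day+3
group by day;
```

This replaces the three self-joins with one scan of flow plus one shuffle partitioned by mdn for the window function. On a large table, that is typically much cheaper.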

solution1
1 ACCEPTED 2018-03-01 02:53:32