The following query runs on Hive:
select day, count(mdn)*5 as number
from (select distinct a.mdn, a.day
      from flow a
      left outer join flow b
        on a.day = date_add(b.day, -1) and a.mdn = b.mdn
      left outer join flow c
        on a.day = date_add(c.day, -2) and a.mdn = c.mdn
      left outer join flow d
        on a.day = date_add(d.day, -3) and a.mdn = d.mdn
      where b.mdn is null and c.mdn is null and d.mdn is null
     ) t
group by day
The logic is to select, for each day, every mdn that appears on that day but does not appear on any of the following three days, and then count those mdn values per day. However, the query is very slow because it joins the large table flow to itself three times. How can it be simplified to run more efficiently?
You can use lead() to get the next day on which each mdn appears, and compare it with the current day:
select f.*
from (select f.*,
             lead(f.day) over (partition by f.mdn order by f.day) as next_day
      from flow f
     ) f
where next_day > date_add(day, 3) or next_day is null;
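To get the same per-day figure as the original query, the filtered rows still have to be grouped by day. The following is only a sketch under a few assumptions: that day is a date (or a yyyy-MM-dd string that Hive can compare with the result of date_add), and that flow may hold several rows per mdn and day, which is why the inner query deduplicates first. The alias t is just illustrative.

```sql
-- Sketch: per-day count of mdn values that do not reappear in the next three days.
select day, count(mdn) * 5 as number
from (select f.mdn,
             f.day,
             -- next day on which the same mdn appears; null if it never appears again
             lead(f.day) over (partition by f.mdn order by f.day) as next_day
      from (select distinct mdn, day from flow) f
     ) t
-- keep rows whose next appearance is missing or more than three days later
where next_day is null or next_day > date_add(day, 3)
group by day;
```

Compared with the three self-joins, this should need only one pass over flow partitioned by mdn, plus the final aggregation by day.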