How to count concurrent visitors at each point in time in Hive (Hadoop)?
In Hive, I have a table with many columns, two of which are begin_time and end_time.
I need to count how many visitors are present at each point in time.
A sample of the table (the fields are separated by Hive's default ^A, i.e. \001, delimiter, shown here as spaces):
begin_time           end_time
2011.04.26 10:19:06  2011.04.26 10:20:22
2011.04.26 10:19:08  2011.04.26 10:21:49
2011.04.26 10:19:08  2011.04.26 11:18:46
2011.04.26 10:19:09  2011.04.26 12:08:36
2011.04.26 10:19:09  2011.04.26 11:00:16
2011.04.26 10:19:11  2011.04.26 10:19:17
2011.04.26 10:19:12  2011.04.26 10:46:21
2011.04.26 10:19:13  2011.04.26 10:55:43
2011.04.26 10:19:17  2011.04.26 10:19:41
2011.04.26 10:19:18  2011.04.26 10:34:41
The result I want is the number of people present at each specific time.
For example, at 2011.04.26 10:19:08 there are 3 visitors: one who arrived at 10:19:06 and two who arrived at 10:19:08.
At 2011.04.26 10:19:18 the count is 9: ten people have arrived by then, but one of them left at 2011.04.26 10:19:17.
The desired result for this sample is:
2011.04.26 10:19:06 1
2011.04.26 10:19:08 3
2011.04.26 10:19:09 5
2011.04.26 10:19:11 6
2011.04.26 10:19:12 7
2011.04.26 10:19:13 8
2011.04.26 10:19:17 9
2011.04.26 10:19:18 9
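To pin down the semantics, here is a small Python reference computation (not Hive or Hadoop code) that counts, for each distinct begin_time in the sample, the rows whose closed interval [begin_time, end_time] contains it. The names ROWS, visitors_at and counts_at_begin_times are illustrative only. Treating end_time as inclusive (someone leaving at 10:19:17 is still counted at 10:19:17) reproduces the desired result above.

```python
# Sample rows from the question: (begin_time, end_time).
ROWS = [
    ("2011.04.26 10:19:06", "2011.04.26 10:20:22"),
    ("2011.04.26 10:19:08", "2011.04.26 10:21:49"),
    ("2011.04.26 10:19:08", "2011.04.26 11:18:46"),
    ("2011.04.26 10:19:09", "2011.04.26 12:08:36"),
    ("2011.04.26 10:19:09", "2011.04.26 11:00:16"),
    ("2011.04.26 10:19:11", "2011.04.26 10:19:17"),
    ("2011.04.26 10:19:12", "2011.04.26 10:46:21"),
    ("2011.04.26 10:19:13", "2011.04.26 10:55:43"),
    ("2011.04.26 10:19:17", "2011.04.26 10:19:41"),
    ("2011.04.26 10:19:18", "2011.04.26 10:34:41"),
]


def visitors_at(t, rows):
    """Count rows whose closed interval [begin, end] contains t.

    Timestamps are compared as plain strings; this works because the
    'yyyy.MM.dd HH:mm:ss' format is fixed-width and ordered from the most
    significant field (year) to the least significant (second).
    """
    return sum(1 for begin, end in rows if begin <= t <= end)


def counts_at_begin_times(rows):
    """Return (time, count) for every distinct begin_time, in time order."""
    return [(t, visitors_at(t, rows)) for t in sorted({b for b, _ in rows})]


if __name__ == "__main__":
    for t, n in counts_at_begin_times(ROWS):
        print(t, n)
```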
Any help is much appreciated.
You can try this in Hive (assuming the table name is test_log):
select /*+ MAPJOIN(driven) */ driven.time, count(*)
from
  (select time
   from
     (select begin_time time from test_log
      union all
      select end_time time from test_log) u
   group by time) driven
join test_log l on true
where driven.time between l.begin_time and l.end_time
group by driven.time;
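This is probably not the best solution, but at least it works. Because every distinct time in driven is joined against every row of test_log, the work grows with (number of distinct times) × (number of rows), so you can add a filter to the driven subquery (for example, restricting it to the time range you care about) to reduce the data set.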
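A different technique that avoids the cross join is a sweep line: sort all arrival and departure events by time and keep a running total. The Python sketch below shows the idea on in-memory data; it is not Hive code, and the function name running_counts is illustrative. It reports a count for every distinct begin_time and end_time, like the query above, and treats intervals as closed (like BETWEEN): arrivals at time t are added before t is reported, departures at t are subtracted afterwards. In Hive the running total could be expressed with a windowed SUM(...) OVER (ORDER BY ...) (windowing functions are available since Hive 0.11), but the same tie-breaking rule at equal timestamps would have to be reproduced there.

```python
from itertools import groupby


def running_counts(rows):
    """Sweep-line count of visitors present at each distinct timestamp.

    rows: (begin_time, end_time) strings in the fixed-width
    'yyyy.MM.dd HH:mm:ss' format, so string order equals time order.
    Intervals are treated as closed [begin, end].
    """
    rows = list(rows)  # iterated twice below
    # Each visitor yields an arrival event (kind 0) and a departure event (kind 1).
    events = sorted(
        [(begin, 0) for begin, _ in rows] + [(end, 1) for _, end in rows]
    )
    result = []
    present = 0
    for t, group in groupby(events, key=lambda event: event[0]):
        kinds = [kind for _, kind in group]
        # Someone arriving at t is present at t ...
        present += kinds.count(0)
        result.append((t, present))
        # ... and someone leaving at t is still present at t, so remove them afterwards.
        present -= kinds.count(1)
    return result


if __name__ == "__main__":
    sample = [
        ("2011.04.26 10:19:11", "2011.04.26 10:19:17"),
        ("2011.04.26 10:19:17", "2011.04.26 10:19:41"),
        ("2011.04.26 10:19:18", "2011.04.26 10:34:41"),
    ]
    for t, n in running_counts(sample):
        print(t, n)
```

Sorting once costs O(n log n) for n rows, instead of comparing every distinct time with every row.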