
Java Hadoop MapReduce: how to count concurrency

How can I implement this with Hadoop MapReduce in Java?

In Hive, I have a table with many columns, two of which are begin_time and end_time.

I need to count how many records are active at each time.

A piece of the table looks like this:

begin_time                  end_time
2011.04.26 10:19:06^A2011.04.26 10:20:22
2011.04.26 10:19:08^A2011.04.26 10:21:49
2011.04.26 10:19:08^A2011.04.26 11:18:46
2011.04.26 10:19:09^A2011.04.26 12:08:36
2011.04.26 10:19:09^A2011.04.26 11:00:16
2011.04.26 10:19:11^A2011.04.26 10:19:17
2011.04.26 10:19:12^A2011.04.26 10:46:21
2011.04.26 10:19:13^A2011.04.26 10:55:43
2011.04.26 10:19:17^A2011.04.26 10:19:41
2011.04.26 10:19:18^A2011.04.26 10:34:41

The result I want is how many people are present at a specific time.

For example, at 2011.04.26 10:19:08 there are 3 visitors, because one arrived at 10:19:06 and two arrived at 10:19:08.

At 2011.04.26 10:19:18 there are 9, because ten have arrived but one left at 2011.04.26 10:19:17.

The desired result for this piece is:

2011.04.26 10:19:06 1
2011.04.26 10:19:08 3
2011.04.26 10:19:09 5
2011.04.26 10:19:11 6
2011.04.26 10:19:12 7
2011.04.26 10:19:13 8
2011.04.26 10:19:17 9
2011.04.26 10:19:18 9

Any help is much appreciated and welcome.

In the mapper, convert every record into two records: one for the record's start time and one for its end time. With a single reducer, you will then get them sorted like this:


Time                Begin/End
2011.04.26 10:19:06 B
2011.04.26 10:19:08 B
2011.04.26 10:19:08 B
2011.04.26 10:19:09 B
2011.04.26 10:19:09 B
2011.04.26 10:19:11 B
2011.04.26 10:19:12 B
2011.04.26 10:19:13 B
2011.04.26 10:19:17 E
2011.04.26 10:19:17 B
2011.04.26 10:19:18 B
2011.04.26 10:19:41 E
2011.04.26 10:20:22 E
2011.04.26 10:21:49 E
2011.04.26 10:34:41 E
2011.04.26 10:46:21 E
2011.04.26 10:55:43 E
2011.04.26 11:00:16 E
2011.04.26 11:18:46 E
2011.04.26 12:08:36 E

You can then process them in this order, incrementing a counter every time you encounter 'B' and decrementing it every time you encounter 'E'. To get your results, emit a record from the reducer whenever the time changes and there was at least one 'B' for that time.


Time                Begin/End      N        Emit
2011.04.26 10:19:06 B              1          1
2011.04.26 10:19:08 B              2
2011.04.26 10:19:08 B              3          3
2011.04.26 10:19:09 B              4
2011.04.26 10:19:09 B              5          5
2011.04.26 10:19:11 B              6          6
2011.04.26 10:19:12 B              7          7
2011.04.26 10:19:13 B              8          8
2011.04.26 10:19:17 E              7
2011.04.26 10:19:17 B              8          8
2011.04.26 10:19:18 B              9          9
2011.04.26 10:19:41 E              8
2011.04.26 10:20:22 E              7
2011.04.26 10:21:49 E              6
2011.04.26 10:34:41 E              5
2011.04.26 10:46:21 E              4
2011.04.26 10:55:43 E              3
2011.04.26 11:00:16 E              2
2011.04.26 11:18:46 E              1
2011.04.26 12:08:36 E              0
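
The walkthrough above can be sketched in plain Java without any Hadoop dependency. This is only an illustration of the logic, not a real job: the class name `ConcurrencySweep` and its methods are hypothetical, a `TreeMap` stands in for the shuffle/sort that a single reducer would see, and rows are assumed to be `begin_time`, then Hive's default `^A` (`\u0001`) delimiter, then `end_time`. In an actual job, the body of `map` would go into a `Mapper` and the counter in `reduce` would have to live in a field of the single `Reducer` instance so that it survives across keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ConcurrencySweep {

    // Map step: one input row becomes two (time, flag) pairs,
    // 'B' for the begin time and 'E' for the end time.
    static List<String[]> map(String row) {
        String[] cols = row.split("\u0001"); // assumed ^A delimiter
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {cols[0], "B"});
        pairs.add(new String[] {cols[1], "E"});
        return pairs;
    }

    // Stand-in for shuffle/sort: group flags by time, keys in ascending order.
    // The yyyy.MM.dd HH:mm:ss format sorts correctly as plain strings.
    static TreeMap<String, List<String>> shuffle(List<String> rows) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (String row : rows) {
            for (String[] pair : map(row)) {
                grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
            }
        }
        return grouped;
    }

    // Reduce step: keep a running counter across all times and emit it
    // once per time, but only if that time had at least one 'B'.
    static List<String> reduce(TreeMap<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        int active = 0;
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            boolean sawBegin = false;
            for (String flag : entry.getValue()) {
                if (flag.equals("B")) {
                    active++;
                    sawBegin = true;
                } else {
                    active--;
                }
            }
            if (sawBegin) {
                out.add(entry.getKey() + " " + active);
            }
        }
        return out;
    }

    static List<String> count(List<String> rows) {
        return reduce(shuffle(rows));
    }
}
```

Calling `ConcurrencySweep.count` on the ten sample rows reproduces the Emit column, including 8 for 10:19:17. The question's desired output lists 9 for that time because it counts the visitor leaving at 10:19:17 as still present; to get that behaviour, the emitted value would have to be taken after applying the 'B' events of a time but before applying its 'E' events.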

If you know in advance the times at which there are no events, you can write a partitioner that splits the data into independent sets, which lets you use multiple reducers.
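
A minimal sketch of that partitioning idea, again in plain Java: the class `IdleGapPartitioner` and the cut times used with it are hypothetical. It assumes you can supply a sorted list of timestamps at which no record is open, so that each range between cuts can start its counter at zero. In a real job the same lookup would sit inside `getPartition(key, value, numPartitions)` of a class extending `org.apache.hadoop.mapreduce.Partitioner`, with the number of reducers set to the number of cuts plus one.

```java
import java.util.Arrays;

class IdleGapPartitioner {

    // Hypothetical cut points: sorted timestamps at which no record is open.
    private final String[] cuts;

    IdleGapPartitioner(String... sortedCuts) {
        this.cuts = sortedCuts.clone();
    }

    // Returns the index of the range that contains the given time:
    // 0 for times before the first cut, 1 for times from the first cut
    // up to (but not including) the second cut, and so on.
    int partitionFor(String time) {
        int pos = Arrays.binarySearch(cuts, time);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }
}
```

For example, with cuts at `2011.04.26 13:00:00` and `2011.04.27 00:00:00`, every event from the sample table would map to partition 0. If a record does span a cut, its begin and end events land in different reducers and the counts in the later range come out too low, so the cut points must be chosen with care.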
