
How to maintain variables in Hadoop?

I have a few records, for example:

session1    click1    time1 

session1    click2    time2

session1    click3    time3

session2    click1    time1

session2    click2    time2

session2    click3    time3

Now I need to calculate the visit time for each click in a session, i.e. the time until the next click in the same session:

session1    click1    time1    (time2-time1)

session1    click2    time2    (time3-time2)

session1    click3    time3     0

session2    click1    time1    (time2-time1)

session2    click2    time2    (time3-time2)

session2    click3    time3    0

Which Hadoop component can I use to get this functionality?

One possible solution is to use MapReduce.

The mapper can emit the session ID as the key and the (click, time) pair as the value. In the reducer, sort each session's pairs by time, so the first, second, and third clicks come out in order. The rest is simple: for each pair, emit the session ID, click, time, and the time difference to the next click (0 for the last click in the session), separated by tabs. Since the whole record can be written as the reducer's output key, the output value can be NullWritable.
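The approach described above can be sketched in plain Python to show the logic, without any Hadoop dependency. This is only an illustration under a few assumptions: the times are integers (for example epoch seconds), the input is whitespace-separated, and the names `map_phase`, `reduce_session`, and `run` are hypothetical helpers, not part of any Hadoop API. In a real job the grouping by session is done by the framework's shuffle, and the reducer body would go into a `Reducer` class (or a Hadoop Streaming script).

```python
from collections import defaultdict


def map_phase(lines):
    """Mapper: emit (session_id, (time, click)) for each input record."""
    for line in lines:
        session, click, time = line.split()
        # Put the time first so that sorting the values orders them by time.
        yield session, (int(time), click)


def reduce_session(session, values):
    """Reducer: sort one session's clicks by time and emit the gap to the next click."""
    ordered = sorted(values)
    for i, (time, click) in enumerate(ordered):
        # The last click in a session has no successor, so its visit time is 0.
        next_time = ordered[i + 1][0] if i + 1 < len(ordered) else time
        yield f"{session}\t{click}\t{time}\t{next_time - time}"


def run(lines):
    """Simulate the shuffle locally: group mapper output by key, then reduce."""
    groups = defaultdict(list)
    for session, value in map_phase(lines):
        groups[session].append(value)
    output = []
    for session in sorted(groups):
        output.extend(reduce_session(session, groups[session]))
    return output


if __name__ == "__main__":
    sample = [
        "session1\tclick2\t130",
        "session1\tclick1\t100",
        "session1\tclick3\t175",
        "session2\tclick1\t50",
    ]
    for record in run(sample):
        print(record)
```

Sorting inside the reducer keeps the sketch short but holds one session's clicks in memory at a time; that is usually fine for click sessions, though very large sessions may call for a different design.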
