I have a few records, say:
session1 click1 time1
session1 click2 time2
session1 click3 time3
session2 click1 time1
session2 click2 time2
session2 click3 time3
Now I need to calculate the visit time for each click in a session:
session1 click1 time1 (time2-time1)
session1 click2 time2 (time3-time2)
session1 click3 time3 0
session2 click1 time1 (time2-time1)
session2 click2 time2 (time3-time2)
session2 click3 time3 0
Which component of Hadoop can I use to get the above functionality?
One possible solution is to use MapReduce.
The map step could emit the SessionID as the key and the (Click, Time) pair as the value. On the reducer side, sort each session's (Click, Time) pairs by time, so you can easily get the first, second, and third click times. The rest is simple: for each pair, emit a tab-separated line of Key, Click, Time, and Time Difference (0 for the last click in the session) from the reducer as the output key. The reducer's output value could be NullWritable.
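A minimal plain-Java sketch of the approach above, simulating the map, shuffle, and reduce steps in memory rather than using the Hadoop API. It assumes each record is a whitespace-separated line `sessionId clickId time` with a numeric time (for example epoch seconds); the class name, method names, and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SessionVisitTime {

    // One (click, time) value that the map step emits for a session key.
    record Click(String click, long time) {}

    // Map + shuffle: parse "sessionId clickId time" (assumed format) and group by session ID.
    // The TreeMap stands in for Hadoop's shuffle, which groups values by key in sorted key order.
    static Map<String, List<Click>> mapAndShuffle(List<String> records) {
        Map<String, List<Click>> grouped = new TreeMap<>();
        for (String record : records) {
            String[] f = record.trim().split("\\s+");
            if (f.length < 3) {
                continue; // skip malformed lines (assumption: they can be ignored)
            }
            grouped.computeIfAbsent(f[0], k -> new ArrayList<>())
                   .add(new Click(f[1], Long.parseLong(f[2])));
        }
        return grouped;
    }

    // Reduce: sort one session's clicks by time and emit
    // "session \t click \t time \t (next time - this time)", using 0 for the last click.
    static List<String> reduce(String session, List<Click> clicks) {
        List<Click> sorted = new ArrayList<>(clicks);
        sorted.sort(Comparator.comparingLong(Click::time));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Click c = sorted.get(i);
            long visit = (i + 1 < sorted.size()) ? sorted.get(i + 1).time() - c.time() : 0;
            out.add(String.join("\t", session, c.click(),
                    Long.toString(c.time()), Long.toString(visit)));
        }
        return out;
    }

    // Runs the whole pipeline and collects every reducer output line.
    static List<String> run(List<String> records) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Click>> e : mapAndShuffle(records).entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample records; session1 is deliberately out of order.
        List<String> records = List.of(
                "session1 click3 190",
                "session1 click1 100",
                "session1 click2 130",
                "session2 click1 50",
                "session2 click2 80",
                "session2 click3 95");
        run(records).forEach(System.out::println);
    }
}
```

In an actual Hadoop job, the parsing would live in the Mapper's `map` method (emitting the session ID as a `Text` key), the framework's shuffle would do the grouping, and the loop in `reduce` would go in the Reducer, writing each line as the key with a `NullWritable` value. Sorting inside the reducer keeps all of a session's clicks in memory; for very long sessions, the secondary-sort pattern (composite key, custom partitioner, and grouping comparator) lets Hadoop deliver the values already ordered by time.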