Hadoop MapReduce: Replicating the data from mapper to reducer

I am getting the required output from the mapper, but it is not sorted. Is there a way to get the mapper output sorted, or to pass the exact mapper output through to the reducer (I assume the data will be sorted during the reduce phase)? I'm new to Hadoop, so sample code would be appreciated.

Output from mapper:

1,abcd,76
5,yyht,87
3,ddfg,43

I want this result to be sorted.

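MapReduce has a phase called shuffling. It happens right after the map phase, before the data is sent to the reducer. Shuffling does two main things: it sorts the mapper output by key, and it groups the values that share a key. So you don't need to sort the mapper output explicitly; emit the field you want to sort on as the key, and the reducer will receive its input in key order.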
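Applied to the records in the question, the idea can be sketched in plain Java without any Hadoop dependency. This is only a simulation under stated assumptions: the class `MapperKeySort` and its methods are hypothetical names, a `TreeMap` stands in for the framework's sort, and the first CSV field is assumed to be the sort key. In a real job the mapper would emit that field as the key (for example as an `IntWritable`, so that it compares numerically rather than as text), and a single reducer is assumed so that the whole output is in one sorted sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class MapperKeySort {
    // Stand-in for the map step: take the first CSV field as a numeric key.
    public static int keyOf(String record) {
        return Integer.parseInt(record.substring(0, record.indexOf(',')));
    }

    // Stand-in for the shuffle: a TreeMap keeps entries ordered by key,
    // much as the reducer sees its input ordered by key.
    public static List<String> shuffle(List<String> records) {
        TreeMap<Integer, List<String>> byKey = new TreeMap<>();
        for (String record : records) {
            byKey.computeIfAbsent(keyOf(record), k -> new ArrayList<>()).add(record);
        }
        // Stand-in for an identity reducer: write every value back out unchanged.
        List<String> out = new ArrayList<>();
        for (List<String> values : byKey.values()) {
            out.addAll(values);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> mapperOutput = List.of("1,abcd,76", "5,yyht,87", "3,ddfg,43");
        shuffle(mapperOutput).forEach(System.out::println);
    }
}
```

Running `main` prints the three records in the order 1, 3, 5. The key is parsed as an integer on purpose: if it were compared as text, "10" would sort before "3".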

Here is a quick example.

(Hello,1), (Hello,1), (A,1), (boss,1) > These will first be sorted

(A,1), (Hello,1), (Hello,1), (boss,1) > Sorting is done on the KEY (Text keys compare byte by byte, so uppercase comes before lowercase), and now grouping

(A,<1>), (Hello,<1,1>), (boss,<1>) > (Key,List<Values>)
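The sort-then-group steps can also be mimicked in a few lines of plain Java. This is a rough sketch, not Hadoop code: `ShuffleSketch` and `sortAndGroup` are hypothetical names, and a `TreeMap` with ordinary `String` comparison stands in for the framework's key comparator. With that comparison, uppercase letters order before lowercase ones, so `Hello` comes before `boss`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ShuffleSketch {
    // Pair each word with the value 1 (as a word-count mapper would),
    // then sort by key and collect the values that share a key.
    public static TreeMap<String, List<Integer>> sortAndGroup(List<String> words) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : words) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> mapperKeys = List.of("Hello", "Hello", "A", "boss");
        // Prints {A=[1], Hello=[1, 1], boss=[1]}
        System.out.println(sortAndGroup(mapperKeys));
    }
}
```

Each map entry corresponds to one call of the reducer: the key, plus the list of values emitted for it.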
