
Map-reduce Concept

What types of inputs and outputs do the map and reduce functions in MapReduce use? How are the inputs and outputs of the two functions connected?

The input of the map function is a document.

The output of the map function is a sequence of tuples (word, 1).

The input of the reduce function is a key and a list of all values for that key.

The output of the reduce function is a sequence of tuples (word, number of occurrences).

Is this correct? And what connects the two functions: is it the combiner?

The inputs and outputs are connected via serialization: the map output is written out as bytes, grouped by key, and read back as the reducer input.

The default input format is TextInputFormat, which uses a LineRecordReader, so by default the mapper receives one line of text per call rather than a whole document. Both of these can be overridden.
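To make the line-by-line input concrete, here is a minimal Python sketch of what a line-oriented record reader hands to the mapper: one (byte offset, line) pair per line. The function name `line_records` is made up for illustration and this is not Hadoop's code, only an imitation of the idea.

```python
import io

def line_records(data: bytes):
    """Yield (byte_offset, line_text) pairs, one per line of the input."""
    offset = 0
    for raw in io.BytesIO(data).readlines():
        # The key is the offset of the line start; the value is the line
        # itself with its trailing newline stripped.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)

records = list(line_records(b"hello world\nhello mapreduce\n"))
print(records)
```

A word-count mapper usually ignores the offset key and only looks at the line.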

Underneath, everything is just bytes, and the Writable objects in MapReduce (Text, IntWritable, etc.) are only thin layers over a byte[].
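As a rough illustration of "everything is bytes", the sketch below packs a (word, count) pair into a byte string and reads it back. The layout (4-byte length prefix, UTF-8 bytes, 4-byte big-endian integer) is an assumption chosen for simplicity and is not Hadoop's actual wire format; `write_pair` and `read_pair` are hypothetical helper names.

```python
import struct

def write_pair(word: str, count: int) -> bytes:
    """Serialize (word, count) using a simplified, made-up layout."""
    encoded = word.encode("utf-8")
    # Length prefix, then the UTF-8 bytes, then a 4-byte big-endian integer.
    return struct.pack(">I", len(encoded)) + encoded + struct.pack(">i", count)

def read_pair(buf: bytes):
    """Deserialize a buffer produced by write_pair."""
    (length,) = struct.unpack_from(">I", buf, 0)
    word = buf[4:4 + length].decode("utf-8")
    (count,) = struct.unpack_from(">i", buf, 4 + length)
    return word, count

wire = write_pair("hello", 1)
print(wire, read_pair(wire))
```

The point is only that the mapper's typed output and the reducer's typed input are two views of the same bytes.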

Yes, the reducer input is the mapper output grouped by key. The output is key-value pairs, or tuples. Both the key and the value can be complex objects, so you can output more than just two fields. A Combiner is just a different type of Reducer, one that runs on the map output before it is sent to the reducers.
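The whole flow can be sketched in plain Python for the word-count case. This is a single-process imitation under stated assumptions, not Hadoop's implementation: `map_fn`, `reduce_fn`, `group_by_key` and `run_job` are hypothetical names, and the "shuffle" is just an in-memory group-by. Because summing is associative, the same function can serve as both combiner and reducer here.

```python
from collections import defaultdict

def map_fn(_offset, line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    """Sum all counts seen for one word."""
    yield word, sum(counts)

def group_by_key(pairs):
    """Stand-in for the shuffle: collect values per key, sorted by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def run_job(splits, combiner=None):
    shuffled = []
    for split in splits:  # each split plays the role of one map task
        mapped = [kv for offset, line in split for kv in map_fn(offset, line)]
        if combiner is not None:
            # Pre-aggregate locally so fewer pairs reach the reducers.
            mapped = [kv for key, values in group_by_key(mapped)
                      for kv in combiner(key, values)]
        shuffled.extend(mapped)
    return [kv for key, values in group_by_key(shuffled)
            for kv in reduce_fn(key, values)]

splits = [[(0, "a b a")], [(0, "b a")]]
result = run_job(splits)
result_combined = run_job(splits, combiner=reduce_fn)
print(result, result_combined)
```

With or without the combiner the final counts are the same; the combiner only reduces how many pairs cross from the map side to the reduce side.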


 