
How map reduce reads from the input file to get keys and values

I am trying to implement shortest path using MapReduce, and this is my input file (key/value pairs):

Source Node            <Destination node,Weight>            
1                      <2,3>
1                      <3,1>
2                      <2,1>
2                      <3,4>

and so on. I know that at run time the input file is picked up from HDFS using something like this in a bash script submitted to the cluster:
$HADOOP_HOME/bin/hadoop --config $HADOOP_CONF_DIR jar Assignment3.jar InputMatrix.txt
But I don't understand how the mapper gets the key and value. Do I need to tokenize the input file to get the keys and weights? I am thinking of finding the least associated weight, so my reducer gets something like [1,<2,3>,1,<3,1>] and loops over the associated weights to find the minimum, which in this case is 1 for key 1. But I don't understand how, at runtime, the keys are made available to the mapper, and how the parsing is done to extract them (in the input file above, keys are separated from values by tabs ("\t"), and the values themselves are comma-separated).
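The minimum-over-weights step described here can be sketched in plain Java (class and method names are illustrative, not from the original post):

```java
import java.util.Arrays;
import java.util.List;

// Hedged sketch of the reduce logic described above: for one source node,
// scan its (destination, weight) pairs and keep the smallest weight.
public class MinWeightSketch {
    // Each pair is {destination, weight}; returns the minimum weight.
    static int minWeight(List<int[]> pairs) {
        int min = Integer.MAX_VALUE;
        for (int[] p : pairs) {
            min = Math.min(min, p[1]);
        }
        return min;
    }

    public static void main(String[] args) {
        // Key 1 with values <2,3> and <3,1>, as in the input file above.
        List<int[]> pairs = Arrays.asList(new int[]{2, 3}, new int[]{3, 1});
        System.out.println(minWeight(pairs));  // prints 1
    }
}
```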

It depends on the input format, but if you are using the standard input format (TextInputFormat), the file is processed line by line. The input to the map task is then the byte offset in the file as the key, and the line of your input file as the value. You could split that line on the tab character yourself. An alternative would be to use a key/value input format (KeyValueTextInputFormat, which splits each line on the tab by default).
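A minimal sketch of that tab-and-comma parsing, in plain Java rather than a full Hadoop Mapper, assuming each line of the file looks like "1\t<2,3>" (names here are illustrative, not from the original post):

```java
// Hedged sketch: the parsing a map() method would do once TextInputFormat
// hands it one line of the input file as its value.
public class LineParseSketch {
    // Splits a line such as "1\t<2,3>" into {source, destination, weight}.
    static String[] parse(String line) {
        String[] kv = line.split("\t");              // tab separates key from value
        String[] dw = kv[1].replaceAll("[<>]", "")   // drop the angle brackets
                           .split(",");              // comma separates dest from weight
        return new String[] { kv[0], dw[0], dw[1] };
    }

    public static void main(String[] args) {
        String[] p = parse("1\t<2,3>");
        System.out.println(p[0] + " -> (" + p[1] + "," + p[2] + ")");  // prints 1 -> (2,3)
    }
}
```

In a real Mapper you would do the same split inside `map()`, then emit the source node as the output key and the (destination, weight) pair as the output value so the reducer receives all pairs for one node together.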
