
Map-Reduce Program : Mapper not behaving as expected

Friends,

I am new to MapReduce and am trying my hand at an example that runs only a Mapper, but the output is strange and not what I expected. Could you help me find what I am missing here?

Code part:

Imports:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

Driver Program

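Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "SampleProgram");  // replaces the deprecated new Job(conf, ...) constructor
job.setJarByClass(SampleMR.class);     // class that contains mapper and reducer
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);    // reducer class
job.setOutputKeyClass(Text.class);       // MyMapper emits Text keys; the default (LongWritable) causes a type mismatch once reducers are re-enabled
job.setOutputValueClass(Text.class);     // assumes MyReducer (not shown) also emits Text values

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setNumReduceTasks(0);
FileInputFormat.setInputPaths(job, new Path("/tmp/"));
FileOutputFormat.setOutputPath(job, new Path("/tmp/out"));  // adjust directories as required

// waitForCompletion() submits the job itself if it has not been submitted yet,
// so a separate job.submit() call is not needed
boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}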

Mapper Program

public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable idx, Text value, Context context) throws IOException, InterruptedException {
        String[] tokens = value.toString().split("|");
        String keyPrefix = tokens[0] + tokens[1];
        context.write(new Text(keyPrefix), value);
    }
}

There is a reduce phase as well, but I have set the number of reduce tasks to 0 to debug the issue. Even without the reducer, the mapper is not behaving correctly.

For the input

379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2

the expected map output is

379782759851005ABCDEFG [Blank Space] 379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2

The output of my Mapper is

3 [Blank Space] 379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2

It looks like the key contains only the first character of the expected output. The same thing happens with the value if I try to write tokens[4] as the value. It seems that something goes wrong while splitting the string. Any insight into what could be going wrong?
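The suspicion about splitting can be checked without Hadoop. The sketch below (the class name SplitRepro and method splitUnescaped are hypothetical) runs the same split("|") call on the sample line and prints what comes back. On Java 8 and later the key comes out as "37". On Java 7 and earlier, split also kept a leading empty string produced by the zero-width match at position 0, so tokens[0] + tokens[1] was "" + "3". That would account for the single "3" above if the job ran on Java 7, although the question does not say which JDK was used.

```java
import java.util.Arrays;

// Reproduces the mapper's key computation outside Hadoop.
public class SplitRepro {

    // Same call as in MyMapper. String.split takes a regular expression, and an
    // unescaped "|" is alternation between two empty patterns, so it matches the
    // empty string at every position and splits between every character.
    public static String[] splitUnescaped(String line) {
        return line.split("|");
    }

    public static void main(String[] args) {
        String line = "379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2";
        String[] tokens = splitUnescaped(line);

        // On Java 8 and later this prints 54 (one token per character)
        System.out.println(tokens.length);
        // On Java 8 and later this prints [3, 7, 9, 7, 8]
        System.out.println(Arrays.toString(Arrays.copyOf(tokens, 5)));
        // On Java 8 and later this prints "key = 37"; Java 7 gives "key = 3"
        System.out.println("key = " + tokens[0] + tokens[1]);
    }
}
```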

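You need to escape the pipe character. String.split() interprets its argument as a regular expression, and an unescaped | is the alternation operator (here between two empty patterns), so it matches the empty string at every position and the line is split into single characters. Use split("\\|") or split(Pattern.quote("|")) instead. See the link below: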

Splitting string with pipe character ("|")
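Applied to the mapper, the fix only changes the split call. The helper below is a hypothetical, Hadoop-free sketch of the mapper's key logic (the names PipeKey and keyPrefix are illustrative) that can be unit-tested on its own. It also guards against lines with fewer than two fields, which would otherwise throw ArrayIndexOutOfBoundsException once the split is fixed.

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring MyMapper's key logic with a literal pipe delimiter.
public class PipeKey {

    // Pattern.quote("|") produces a regex matching the literal "|" character;
    // the escaped form "\\|" would work equally well.
    private static final Pattern PIPE = Pattern.compile(Pattern.quote("|"));

    /** Returns tokens[0] + tokens[1], or null if the line has fewer than two fields. */
    public static String keyPrefix(String line) {
        String[] tokens = PIPE.split(line);
        if (tokens.length < 2) {
            return null; // malformed record; the caller decides how to handle it
        }
        return tokens[0] + tokens[1];
    }

    public static void main(String[] args) {
        String line = "379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2";
        System.out.println(keyPrefix(line));      // 379782759851005ABCDEFG
        System.out.println(PIPE.split(line)[4]);  // avgtop:19.2
    }
}
```

Inside MyMapper the equivalent one-line change is `String[] tokens = value.toString().split("\\|");`. Returning null for malformed lines is just one possible policy; skipping them in the mapper (optionally counting them with a Hadoop counter) is another.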
