Friends,
I am new to MapReduce and am trying an example that runs only a Mapper, but the output is not what I expect. Please help me find what I am missing:
Code part:
Imports:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
Driver Program
Job job = new Job(conf,"SampleProgram");
job.setJarByClass(SampleMR.class); // class that contains mapper and reducer
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class); // reducer class
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setNumReduceTasks(0);
FileInputFormat.setInputPaths(job, new Path("/tmp/"));
FileOutputFormat.setOutputPath(job, new Path("/tmp/out")); // adjust directories as required
job.submit();
boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
Mapper Program
public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable idx, Text value, Context context) throws IOException, InterruptedException {
        String[] tokens = value.toString().split("|");
        String keyPrefix = tokens[0] + tokens[1];
        context.write(new Text(keyPrefix), value);
    }
}
There is a reducer phase as well, but I have set the number of reduce tasks to 0 to debug the issue. The mapper itself is not behaving correctly.
For the input:
379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2
The expected map output is:
379782759851005ABCDEFG [Blank Space] 379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2
Output of my mapper:
3 [Blank Space] 379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2
It looks like the key contains only the first character of the expected output. The same happens with the value if I write tokens[4] as the value to the context. Something seems to go wrong while splitting the string. Any insight into what could be going wrong?
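You need to escape the pipe character. String.split takes a regular expression, and an unescaped | is the alternation operator with two empty alternatives, so it matches the empty string between every pair of characters and the line is split into single characters. Use split("\\|") or split(Pattern.quote("|")) instead.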
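To make the difference concrete, here is a minimal standalone sketch outside Hadoop. The class name PipeSplitDemo and the helpers buildKey and buildKeyQuoted are hypothetical names made up for this illustration. They only mirror the key construction from the mapper in the question. The key "3" reported in the question is consistent with Java 7 and earlier, where the split also yields a leading empty string. On Java 8 and later that empty string is dropped, so the first two tokens would be "3" and "7", but the cause and the fix are the same.

```java
import java.util.regex.Pattern;

public class PipeSplitDemo {

    // Hypothetical helper: builds the key the way the mapper does,
    // but with the pipe escaped so it is treated as a literal character.
    static String buildKey(String line) {
        String[] tokens = line.split("\\|");
        return tokens[0] + tokens[1];
    }

    // Same idea, letting Pattern.quote produce the literal pattern
    // instead of escaping by hand.
    static String buildKeyQuoted(String line) {
        String[] tokens = line.split(Pattern.quote("|"));
        return tokens[0] + tokens[1];
    }

    public static void main(String[] args) {
        String line = "379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2";

        // Unescaped: the regex matches the empty string everywhere,
        // so the line falls apart into one token per character.
        System.out.println("unescaped token count: " + line.split("|").length);

        // Escaped: five fields, and the key is the first two joined.
        System.out.println("escaped token count:   " + line.split("\\|").length);
        System.out.println("key (escaped):         " + buildKey(line));
        System.out.println("key (Pattern.quote):   " + buildKeyQuoted(line));
    }
}
```

In the mapper itself, the only change needed is the argument to split. Pattern.quote is arguably the safer habit when the delimiter comes from configuration, because it also handles other regex metacharacters such as "." or "*".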