

Map-Reduce Program: Mapper not behaving as expected

Friends,

I am new to Map-Reduce and am trying my hand at an example that runs only a Mapper, but the output is strange and not what I expected. Please help me find what I am missing here:

Code:

Imports:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

Driver Program

Configuration conf = new Configuration();
Job job = new Job(conf, "SampleProgram");
job.setJarByClass(SampleMR.class);       // class that contains mapper and reducer
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);    // reducer class (not run while reduce tasks = 0)
job.setMapOutputKeyClass(Text.class);    // MyMapper emits <Text, Text>
job.setMapOutputValueClass(Text.class);

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setNumReduceTasks(0);
FileInputFormat.setInputPaths(job, new Path("/tmp/"));
FileOutputFormat.setOutputPath(job, new Path("/tmp/out"));  // adjust directories as required

// waitForCompletion() submits the job itself, so a separate job.submit() is not needed
boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}

Mapper Program

public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable idx, Text value, Context context) throws IOException, InterruptedException {
        String[] tokens = value.toString().split("|");
        String keyPrefix = tokens[0] + tokens[1];
        context.write(new Text(keyPrefix), value);
    }
}

There is a reducer phase as well, but I have set the number of reduce tasks to 0 to debug the issue, and the mapper alone is already not behaving correctly.

For the input

379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2

the expected map output is

379782759851005ABCDEFG [Blank Space] 379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2

Output of my Mapper:

3 [Blank Space] 379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2

It looks like the key contains only the first character of the expected output. The same happens to the value if I try to write tokens[4] as the value to the context. It seems something goes wrong while splitting the string. Any insight into what could be going wrong?
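The symptom can be reproduced without Hadoop. The stand-alone sketch below (class and method names are made up for illustration) compares the split call from the mapper with an escaped one. String.split interprets its argument as a regular expression, and "|" on its own is an alternation of two empty patterns, so it matches the empty string at every position and cuts the line into one-character tokens. On Java 8 and later the first two tokens are "3" and "7"; Java 7 and earlier also emit a leading empty string, which would make tokens[0] + tokens[1] equal to "3", as in the question, if the job ran on an older JVM (an assumption; the post does not say which Java version was used).

```java
import java.util.Arrays;

class SplitPipeDemo {
    static final String LINE = "379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2";

    // Same call as in MyMapper: "|" is regex alternation of two empty patterns,
    // so it matches the empty string everywhere and yields single characters.
    static String[] splitUnescaped(String line) {
        return line.split("|");
    }

    // Escaped: "\\|" is a literal pipe, so the line is cut into its five fields.
    static String[] splitEscaped(String line) {
        return line.split("\\|");
    }

    public static void main(String[] args) {
        // Print only the first few tokens of the unescaped split.
        System.out.println(Arrays.toString(Arrays.copyOf(splitUnescaped(LINE), 5)));
        System.out.println(Arrays.toString(splitEscaped(LINE)));
    }
}
```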

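You need to escape the pipe character. String.split takes a regular expression, and an unescaped "|" is the regex alternation operator, which matches the empty string and therefore splits the line between every character; split("\\|") splits on a literal pipe. See the link below: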

Splitting string with pipe character ("|")
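Applying the answer, the only change the mapper needs is value.toString().split("\\|"). The sketch below pulls the key-building logic into a plain helper so it can be checked without a cluster. KeyPrefixFix and keyPrefix are hypothetical names, Pattern.quote is used here as an equivalent alternative to writing the escape by hand, and returning null for a line with fewer than two fields is an assumed policy, not something from the question.

```java
import java.util.regex.Pattern;

class KeyPrefixFix {
    // Pattern.quote("|") produces a pattern that matches the pipe literally,
    // equivalent to the hand-escaped regex "\\|". Compiled once, reused per record.
    private static final Pattern PIPE = Pattern.compile(Pattern.quote("|"));

    // Mirrors MyMapper's key logic with the delimiter fixed:
    // key = first field + second field.
    static String keyPrefix(String line) {
        String[] tokens = PIPE.split(line);
        if (tokens.length < 2) {
            // Hypothetical guard: indexing tokens[1] unchecked would throw
            // ArrayIndexOutOfBoundsException for a line without a pipe.
            return null;
        }
        return tokens[0] + tokens[1];
    }

    public static void main(String[] args) {
        String line = "379782759851005|ABCDEFG|name:YOLO|top:44.7|avgtop:19.2";
        // Tab-separated, like TextOutputFormat's default key/value separator.
        System.out.println(keyPrefix(line) + "\t" + line);
    }
}
```

Keeping the compiled Pattern in a static field avoids re-parsing the regex for every record. In OpenJDK, String.split also has an internal fast path for a single escaped character such as "\\|", so either form is reasonable inside map().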

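Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.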

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM