Hadoop - Result of WordCount is not writing on output file
I'm trying to run a program that counts words and their frequencies by following the steps given in this tutorial: http://developer.yahoo.com/hadoop/tutorial/module3.html
I've loaded a directory named input that contains three text files.
I was able to configure everything correctly. However, when I run WordCount.java, the part-00000 file inside the output directory is empty.
The Java code for the Mapper is:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(WritableComparable key, Writable value,
            OutputCollector output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
        }
    }

    @Override
    public void map(LongWritable arg0, Text arg1,
            OutputCollector<Text, IntWritable> arg2, Reporter arg3)
            throws IOException {
        // TODO Auto-generated method stub
    }
}
The Reducer code is:
public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator values,
            OutputCollector output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            //System.out.println(values.next());
            IntWritable value = (IntWritable) values.next();
            sum += value.get(); // process value
        }
        output.collect(key, new IntWritable(sum));
    }
}
The code for the Counter (driver) class is:
public class Counter {

    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(com.example.Counter.class);

        // TODO: specify output types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // TODO: specify input and output DIRECTORIES (not files)
        conf.setInputPath(new Path("src"));
        conf.setOutputPath(new Path("out"));

        // TODO: specify a mapper
        conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);

        // TODO: specify a reducer
        conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);

        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
In the console I get these logs:
13/09/10 10:09:20 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/10 10:09:20 INFO mapred.FileInputFormat: Total input paths to process : 3
13/09/10 10:09:20 INFO mapred.FileInputFormat: Total input paths to process : 3
13/09/10 10:09:20 INFO mapred.JobClient: Running job: job_201309100855_0012
13/09/10 10:09:21 INFO mapred.JobClient: map 0% reduce 0%
13/09/10 10:09:25 INFO mapred.JobClient: map 25% reduce 0%
13/09/10 10:09:26 INFO mapred.JobClient: map 75% reduce 0%
13/09/10 10:09:27 INFO mapred.JobClient: map 100% reduce 0%
13/09/10 10:09:35 INFO mapred.JobClient: Job complete: job_201309100855_0012
13/09/10 10:09:35 INFO mapred.JobClient: Counters: 15
13/09/10 10:09:35 INFO mapred.JobClient: File Systems
13/09/10 10:09:35 INFO mapred.JobClient: HDFS bytes read=54049
13/09/10 10:09:35 INFO mapred.JobClient: Local bytes read=14
13/09/10 10:09:35 INFO mapred.JobClient: Local bytes written=214
13/09/10 10:09:35 INFO mapred.JobClient: Job Counters
13/09/10 10:09:35 INFO mapred.JobClient: Launched reduce tasks=1
13/09/10 10:09:35 INFO mapred.JobClient: Launched map tasks=4
13/09/10 10:09:35 INFO mapred.JobClient: Data-local map tasks=4
13/09/10 10:09:35 INFO mapred.JobClient: Map-Reduce Framework
13/09/10 10:09:35 INFO mapred.JobClient: Reduce input groups=0
13/09/10 10:09:35 INFO mapred.JobClient: Combine output records=0
13/09/10 10:09:35 INFO mapred.JobClient: Map input records=326
13/09/10 10:09:35 INFO mapred.JobClient: Reduce output records=0
13/09/10 10:09:35 INFO mapred.JobClient: Map output bytes=0
13/09/10 10:09:35 INFO mapred.JobClient: Map input bytes=50752
13/09/10 10:09:35 INFO mapred.JobClient: Combine input records=0
13/09/10 10:09:35 INFO mapred.JobClient: Map output records=0
13/09/10 10:09:35 INFO mapred.JobClient: Reduce input records=0
I'm pretty new to Hadoop.
Kindly reply with an appropriate answer.
Thanks.
You have two map methods in your Mapper class. The one with the @Override annotation is the method that actually overrides the Mapper interface's map method, and that method does not do anything. So nothing comes out of your mapper, nothing goes into the reducer, and consequently there is no output.
Delete the map method marked with the @Override annotation and mark the first map method with @Override instead. Then fix any method signature issues, and it should work.
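For reference, here is a minimal sketch of the corrected mapper along those lines (assuming the same old mapred API used in the question; the tokenizing logic is simply the question's own code moved into the properly typed method, not tested here):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    // A single map method whose signature matches the Mapper interface,
    // so the framework actually invokes the tokenizing logic.
    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
        }
    }
}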
I faced the same problem. I resolved it by deleting the overridden map method and changing the signature of the remaining map method so that its first argument is a LongWritable. Update the map method signature as below:
@Override
public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter)
        throws IOException {