
Hadoop MapReduce Total Sort Word Count

I want to do a total sort in a MapReduce word count job.

public int run(String[] args) throws Exception {
  Job job = Job.getInstance(getConf(), "wordcount");
  job.setJarByClass(this.getClass());
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.setMapperClass(Map.class);

  //Total Sort
  job.setPartitionerClass(TotalOrderPartitioner.class);
  InputSampler.Sampler<Text, IntWritable> sampler = new InputSampler.RandomSampler<Text, IntWritable>(0.1, 10000, 10);
  InputSampler.writePartitionFile(job, sampler);
  Path inputDir = new Path(args[2] + "/_tmp");
  Path partitionFile = new Path(inputDir, "_partitioning");
  TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),partitionFile);
  InputSampler.writePartitionFile(job, sampler);

  job.setReducerClass(Reduce.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);  

  return job.waitForCompletion(true) ? 0 : 1;
}

But I get an error like this: java.io.IOException: wrong key class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.LongWritable

I don't understand how InputSampler.RandomSampler works.

The code above does not set an InputFormat for the job, so the default, TextInputFormat, is used. TextInputFormat produces LongWritable keys and Text values.

The sampler, however, is declared as InputSampler.RandomSampler<Text, IntWritable>, that is, with Text keys and IntWritable values, which does not match what TextInputFormat produces.

Because of this type mismatch between the InputFormat and the InputSampler, the error is thrown.
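To make the role of the sampler more concrete, here is a minimal sketch in plain Java (no Hadoop classes) of the general idea behind sampling-based total ordering: sample some input keys, derive split points from the sorted sample, then route each key to a partition by binary search. The class TotalOrderSketch and its methods sample, splitPoints and partition are hypothetical names invented for this illustration, not Hadoop APIs, and the sampling logic is deliberately simpler than what RandomSampler does. The generic parameter K shows why the types have to agree: the split points and the keys being partitioned must be the same type, which Java checks here at compile time but the Hadoop job only discovers at run time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustration only: these names are hypothetical and are not part of Hadoop.
final class TotalOrderSketch {

    // Keep each key with probability freq, stopping once maxSamples keys are collected.
    static <K> List<K> sample(List<K> keys, double freq, int maxSamples, long seed) {
        Random rnd = new Random(seed);
        List<K> picked = new ArrayList<>();
        for (K key : keys) {
            if (picked.size() >= maxSamples) {
                break;
            }
            if (rnd.nextDouble() < freq) {
                picked.add(key);
            }
        }
        return picked;
    }

    // Sort the sample and take (numPartitions - 1) evenly spaced keys as split points.
    static <K extends Comparable<? super K>> List<K> splitPoints(List<K> sample, int numPartitions) {
        if (sample.isEmpty() || numPartitions < 1) {
            throw new IllegalArgumentException("need a non-empty sample and at least one partition");
        }
        List<K> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<K> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            int idx = (int) ((long) i * sorted.size() / numPartitions);
            splits.add(sorted.get(idx));
        }
        return splits;
    }

    // Partition index: keys below the first split point go to 0, and so on upward.
    // The key and the split points share the type K, so a mismatch cannot compile.
    static <K extends Comparable<? super K>> int partition(K key, List<K> splits) {
        int pos = Collections.binarySearch(splits, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hadoop", "apple", "zebra", "mapreduce", "sort", "count");
        // freq = 1.0 keeps every key, which makes this small demo deterministic.
        List<String> splits = splitPoints(sample(words, 1.0, 100, 42L), 3);
        for (String word : words) {
            System.out.println(word + " -> partition " + partition(word, splits));
        }
    }
}
```

Because every key in partition i sorts before every key in partition i + 1, concatenating the sorted outputs of the partitions in order gives a globally sorted result. In the real job, the corresponding requirement is that the keys the sampler reads from the InputFormat have the same type as the map output keys. One possible way to satisfy that, which goes beyond what the answer above states, is to read input whose keys are already Text, for example with KeyValueTextInputFormat or a SequenceFile written by an earlier word count job.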
