Hadoop MapReduce total sort word count

Question

I want to apply a total sort to the output of a MapReduce word count job.

public int run(String[] args) throws Exception {
  Job job = Job.getInstance(getConf(), "wordcount");
  job.setJarByClass(this.getClass());
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.setMapperClass(Map.class);

  //Total Sort
  job.setPartitionerClass(TotalOrderPartitioner.class);
  InputSampler.Sampler<Text, IntWritable> sampler = new InputSampler.RandomSampler<Text, IntWritable>(0.1, 10000, 10);
  InputSampler.writePartitionFile(job, sampler);
  Path inputDir = new Path(args[2] + "/_tmp");
  Path partitionFile = new Path(inputDir, "_partitioning");
  TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),partitionFile);
  InputSampler.writePartitionFile(job, sampler);

  job.setReducerClass(Reduce.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);  

  return job.waitForCompletion(true) ? 0 : 1;
}

But I get an error like java.io.IOException: wrong key class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.LongWritable

I don't understand how InputSampler.RandomSampler works.

Answer 1

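In the code above, no InputFormat is set for the job, so the default is used: TextInputFormat, which produces LongWritable keys and Text values.

The InputSampler.RandomSampler<Text, IntWritable> is declared for Text keys and IntWritable values, which does not match what TextInputFormat produces.

Because of this type mismatch between the InputFormat and the InputSampler, the InputSampler throws the error.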
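On the second part of the question, how the sampler works: the sampler reads keys from the job's input (which is why its key type has to match the InputFormat), the sampled keys are sorted, and one split point per reducer boundary is written to the partition file. TotalOrderPartitioner then sends each key to the reducer whose range contains it. In RandomSampler(0.1, 10000, 10), the arguments are the probability of picking a key, the total number of samples to collect, and the maximum number of input splits to read. The plain-Java sketch below illustrates that idea only. It does not use Hadoop, the class and method names are hypothetical, and the sampling is simplified (it stops once enough keys are collected).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Conceptual sketch only: this is NOT Hadoop's code, and every name here is hypothetical.
class TotalOrderSketch {

    // Keep each key with probability freq, stopping once maxSamples keys are collected.
    static List<String> sample(List<String> keys, double freq, int maxSamples, long seed) {
        Random rnd = new Random(seed);
        List<String> picked = new ArrayList<>();
        for (String key : keys) {
            if (picked.size() >= maxSamples) {
                break;
            }
            if (rnd.nextDouble() <= freq) {
                picked.add(key);
            }
        }
        return picked;
    }

    // Sort the sample and take (numPartitions - 1) evenly spaced keys as split points.
    static String[] splitPoints(List<String> sample, int numPartitions) {
        if (sample.isEmpty()) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        String[] splits = new String[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted.get(sorted.size() * i / numPartitions);
        }
        return splits;
    }

    // Binary search over the split points: keys below splits[0] go to partition 0,
    // keys equal to or above splits[i] (and below splits[i + 1]) go to partition i + 1.
    static int partitionFor(String key, String[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("d", "a", "c", "b", "f", "e");
        String[] splits = splitPoints(sample(words, 1.0, 100, 42L), 3);
        for (String w : words) {
            System.out.println(w + " -> partition " + partitionFor(w, splits));
        }
    }
}
```

Because every key in partition 0 sorts before every key in partition 1, and so on, concatenating the reducer outputs in partition order gives a globally sorted result. The split points are compared against the keys being partitioned, so the sampled key type and the partitioned key type have to agree.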
