Hadoop MapReduce Total Sort Word Count
I want to apply a total sort to the output of a MapReduce word count.
public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "wordcount");
    job.setJarByClass(this.getClass());
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(Map.class);
    //Total Sort
    job.setPartitionerClass(TotalOrderPartitioner.class);
    InputSampler.Sampler<Text, IntWritable> sampler = new InputSampler.RandomSampler<Text, IntWritable>(0.1, 10000, 10);
    InputSampler.writePartitionFile(job, sampler);
    Path inputDir = new Path(args[2] + "/_tmp");
    Path partitionFile = new Path(inputDir, "_partitioning");
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);
    InputSampler.writePartitionFile(job, sampler);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    return job.waitForCompletion(true) ? 0 : 1;
}
However, I get an error like java.io.IOException: wrong key class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.LongWritable
I don't understand how InputSampler.RandomSampler works.
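The code above never sets an InputFormat on the job, so the default, TextInputFormat, is used; it produces <LongWritable, Text> records. The sampler, however, is declared as InputSampler.RandomSampler<Text, IntWritable>, that is, with Text keys and IntWritable values, which does not match TextInputFormat. Because the InputFormat and the InputSampler disagree on the key type, the InputSampler raises the error.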
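To make the role of the sampler more concrete, here is a minimal plain-Java sketch of the general idea behind sampling-based total-order partitioning: draw a random sample of keys, sort it, pick (number of partitions - 1) split points, and route each key to a partition by binary search against those split points. It deliberately uses no Hadoop classes, and the class name TotalOrderSketch and the methods splitPoints and partition are hypothetical names chosen for illustration, not Hadoop's actual implementation. It also hints at why the key types have to agree: the split points are compared directly with the keys being partitioned, so both must be of the same type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch only: mimics the idea of sampling keys to build
// split points for a total-order partitioner. Not Hadoop code.
class TotalOrderSketch {

    // Draw up to sampleSize random keys, sort them, and pick
    // (numPartitions - 1) evenly spaced split points from the sample.
    static List<String> splitPoints(List<String> keys, int sampleSize,
                                    int numPartitions, long seed) {
        if (keys.isEmpty() || sampleSize < 1 || numPartitions < 1) {
            throw new IllegalArgumentException("need keys, sampleSize >= 1, numPartitions >= 1");
        }
        List<String> shuffled = new ArrayList<>(keys);
        Collections.shuffle(shuffled, new Random(seed));
        List<String> sample =
            new ArrayList<>(shuffled.subList(0, Math.min(sampleSize, shuffled.size())));
        Collections.sort(sample);

        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            splits.add(sample.get(i * sample.size() / numPartitions));
        }
        return splits;
    }

    // Route a key to a partition: keys below the first split point go to
    // partition 0, keys at or above split point i go to partition i + 1.
    static int partition(String key, List<String> splits) {
        int pos = Collections.binarySearch(splits, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "pear", "apple", "kiwi", "fig", "mango", "plum", "cherry", "grape");
        List<String> splits = splitPoints(words, 6, 3, 42L);
        System.out.println("split points: " + splits);
        for (String w : words) {
            System.out.println(w + " -> partition " + partition(w, splits));
        }
    }
}
```

Because every key in partition i sorts before every key in partition i + 1, concatenating the sorted outputs of the partitions in order yields a globally sorted result. In a real job the sampled keys and the keys handed to the partitioner would need to be the same Writable type for the comparison to make sense.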