Hadoop is skipping reduce phase entirely
I have set up a Hadoop job like so:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Legion");
    job.setJarByClass(Legion.class);
    job.setMapperClass(CallQualityMap.class);
    job.setReducerClass(CallQualityReduce.class);
    // Explicitly configure map and reduce outputs, since they're different classes
    job.setMapOutputKeyClass(CallSampleKey.class);
    job.setMapOutputValueClass(CallSample.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);
    job.setInputFormatClass(CombineRepublicInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    CombineRepublicInputFormat.setMaxInputSplitSize(job, 128000000);
    CombineRepublicInputFormat.setInputDirRecursive(job, true);
    CombineRepublicInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
}
This job completes, but something strange happens. I get one output line per input line. Each output line consists of the output from a CallSampleKey.toString() method, then a tab, then something like CallSample@17ab34d.
This means that the reduce phase is never running and the CallSampleKey and CallSample are getting passed directly to the TextOutputFormat. But I don't understand why this would be the case. I've very clearly specified job.setReducerClass(CallQualityReduce.class); so I have no idea why it would skip the reducer!
Edit: Here's the code for the reducer:
public static class CallQualityReduce extends Reducer<CallSampleKey, CallSample, NullWritable, Text> {
    public void reduce(CallSampleKey inKey, Iterator<CallSample> inValues, Context context) throws IOException, InterruptedException {
        Call call = new Call(inKey.getId().toString(), inKey.getUuid().toString());
        while (inValues.hasNext()) {
            call.addSample(inValues.next());
        }
        context.write(NullWritable.get(), new Text(call.getStats()));
    }
}
What if you try to change your
public void reduce(CallSampleKey inKey, Iterator<CallSample> inValues, Context context) throws IOException, InterruptedException {
to use Iterable instead of Iterator?
public void reduce(CallSampleKey inKey, Iterable<CallSample> inValues, Context context) throws IOException, InterruptedException {
You'll then have to use inValues.iterator() to get the actual iterator.
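As a rough sketch of what that change looks like in the loop body, the snippet below uses plain Java collections rather than Hadoop types. The class name IterableDemo and the String values standing in for CallSample are assumptions made only for illustration. It shows the explicit inValues.iterator() loop next to the equivalent for-each loop, which works on any Iterable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IterableDemo {
    // Explicit iterator, mirroring the while loop from the question.
    static List<String> withIterator(Iterable<String> inValues) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = inValues.iterator();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    // Equivalent for-each loop; the compiler calls iterator() behind the scenes.
    static List<String> withForEach(Iterable<String> inValues) {
        List<String> seen = new ArrayList<>();
        for (String value : inValues) {
            seen.add(value);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Strings stand in for CallSample values (hypothetical data).
        List<String> samples = Arrays.asList("s1", "s2", "s3");
        System.out.println(withIterator(samples));
        System.out.println(withForEach(samples));
    }
}
```

Either loop visits the same values in the same order, so the for-each form is usually the shorter choice inside a reducer.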
If the method signature doesn't match, then your method is an overload rather than an override, and the job just falls through to the default identity reducer implementation. It's perhaps unfortunate that the underlying default implementation doesn't make it easy to detect this kind of typo, but the next best thing is to always use @Override on all methods you intend to override so that the compiler can help.
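To make the fall-through concrete, here is a minimal sketch that does not use Hadoop at all. BaseReducer is a hypothetical stand-in that only mimics the relevant behaviour of Hadoop's Reducer (a default identity reduce() taking an Iterable, and a run() that always calls that version); the class names, keys, and values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OverrideDemo {
    // Hypothetical stand-in for Hadoop's Reducer: the default reduce() is an
    // identity pass-through, and run() always calls the Iterable version.
    static class BaseReducer {
        final List<String> output = new ArrayList<>();

        protected void reduce(String key, Iterable<String> values) {
            for (String value : values) {
                output.add(key + "\t" + value);
            }
        }

        void run(String key, List<String> values) {
            reduce(key, values);
        }
    }

    // Iterator parameter: this is an overload, so run() never calls it.
    // Adding @Override here would be a compile error, which exposes the typo.
    static class BrokenReducer extends BaseReducer {
        public void reduce(String key, Iterator<String> values) {
            int count = 0;
            while (values.hasNext()) {
                values.next();
                count++;
            }
            output.add(key + " count=" + count);
        }
    }

    // Iterable parameter plus @Override: the compiler checks the signature.
    static class FixedReducer extends BaseReducer {
        @Override
        protected void reduce(String key, Iterable<String> values) {
            int count = 0;
            for (String ignored : values) {
                count++;
            }
            output.add(key + " count=" + count);
        }
    }

    static List<String> runWith(BaseReducer reducer) {
        reducer.run("call-1", Arrays.asList("a", "b"));
        return reducer.output;
    }

    public static void main(String[] args) {
        System.out.println(runWith(new BrokenReducer())); // identity output, one entry per value
        System.out.println(runWith(new FixedReducer()));  // single aggregated entry
    }
}
```

With BrokenReducer, each input value comes back as its own key-tab-value entry, which matches the symptom in the question. With FixedReducer, the two values collapse into one aggregated entry.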