Hadoop is skipping the reduce phase entirely

Question

I have set up a Hadoop job like so:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    Job job = Job.getInstance(conf, "Legion");
    job.setJarByClass(Legion.class);

    job.setMapperClass(CallQualityMap.class);
    job.setReducerClass(CallQualityReduce.class);

    // Explicitly configure map and reduce outputs, since they're different classes
    job.setMapOutputKeyClass(CallSampleKey.class);
    job.setMapOutputValueClass(CallSample.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);

    job.setInputFormatClass(CombineRepublicInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    CombineRepublicInputFormat.setMaxInputSplitSize(job, 128000000);
    CombineRepublicInputFormat.setInputDirRecursive(job, true);
    CombineRepublicInputFormat.addInputPath(job, new Path(args[0]));

    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.waitForCompletion(true);
}

This job completes, but something strange happens. I get one output line per input line. Each output line consists of the output of the CallSampleKey.toString() method, then a tab, then something like CallSample@17ab34d.

This means that the reduce phase is never running, and the CallSampleKey and CallSample are being passed directly to the TextOutputFormat. But I don't understand why this would be the case. I've very clearly specified job.setReducerClass(CallQualityReduce.class); so I have no idea why it would skip the reducer!
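To illustrate why the output has that shape: TextOutputFormat writes each record as the key's toString(), a tab, and the value's toString(). A class that does not override toString() inherits Object.toString(), which prints the class name, an @, and a hexadecimal hash code. The sketch below reproduces that with plain Java; the two nested classes are hypothetical stand-ins for the real key and value types, and formatLine only imitates the line layout.

```java
// Minimal sketch: why a value without toString() prints as "CallSample@17ab34d".
class DefaultToStringDemo {

    // Hypothetical stand-in for the value class: no toString() override.
    static class CallSample { }

    // Hypothetical stand-in for the key class: overrides toString().
    static class CallSampleKey {
        @Override
        public String toString() {
            return "id-42";
        }
    }

    // Imitates the "key<TAB>value" layout of a text output line.
    static String formatLine(Object key, Object value) {
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        // The value part falls back to Object.toString(): ClassName@hexHash.
        System.out.println(formatLine(new CallSampleKey(), new CallSample()));
    }
}
```

Seeing map output types rendered this way in the final output is a sign that the records reached the output format without being transformed by the intended reducer.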

Edit: Here's the code for the reducer:

public static class CallQualityReduce extends Reducer<CallSampleKey, CallSample, NullWritable, Text> {

    public void reduce(CallSampleKey inKey, Iterator<CallSample> inValues, Context context) throws IOException, InterruptedException {
        Call call = new Call(inKey.getId().toString(), inKey.getUuid().toString());

        while (inValues.hasNext()) {
            call.addSample(inValues.next());
        }

        context.write(NullWritable.get(), new Text(call.getStats()));
    }
}

Answer 1

What if you try to change your

public void reduce(CallSampleKey inKey, Iterator<CallSample> inValues, Context context) throws IOException, InterruptedException {

to use Iterable instead of Iterator?

public void reduce(CallSampleKey inKey, Iterable<CallSample> inValues, Context context) throws IOException, InterruptedException {

You'll then have to use inValues.iterator() to get the actual iterator (or iterate over inValues with a for-each loop).

If the method signature doesn't match, your method is an overload rather than an override, so the job just falls through to the default identity reducer implementation. It's perhaps unfortunate that the underlying default implementation doesn't make it easy to detect this kind of typo, but the next best thing is to always use @Override on every method you intend to override so that the compiler can help.
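The fall-through can be reproduced without Hadoop. The sketch below is a simplified stand-in, not Hadoop's actual Reducer class: BaseReducer, BrokenReducer, FixedReducer, and the String key and value types are all hypothetical names chosen for illustration. It assumes only what the answer describes, namely a base class whose default reduce emits every value unchanged and a framework that always calls the Iterable version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for the overload-vs-override pitfall; not Hadoop code.
class ReduceOverloadDemo {

    static class BaseReducer {
        final List<String> output = new ArrayList<>();

        // Default "identity" behaviour: emit every value unchanged.
        void reduce(String key, Iterable<String> values) {
            for (String v : values) {
                output.add(key + "\t" + v);
            }
        }

        // The "framework" entry point only knows the Iterable signature.
        final void run(String key, List<String> values) {
            reduce(key, values);
        }
    }

    // Bug: Iterator instead of Iterable declares a new overload,
    // so run() never reaches this method.
    static class BrokenReducer extends BaseReducer {
        void reduce(String key, Iterator<String> values) {
            int n = 0;
            while (values.hasNext()) {
                values.next();
                n++;
            }
            output.add(key + " count=" + n);
        }
    }

    // Fix: matching signature, with @Override so the compiler checks it.
    static class FixedReducer extends BaseReducer {
        @Override
        void reduce(String key, Iterable<String> values) {
            int n = 0;
            for (String ignored : values) {
                n++;
            }
            output.add(key + " count=" + n);
        }
    }

    static List<String> runBroken() {
        BaseReducer r = new BrokenReducer();
        r.run("call-1", Arrays.asList("a", "b"));
        return r.output;
    }

    static List<String> runFixed() {
        BaseReducer r = new FixedReducer();
        r.run("call-1", Arrays.asList("a", "b"));
        return r.output;
    }

    public static void main(String[] args) {
        System.out.println(runBroken()); // identity output, one entry per value
        System.out.println(runFixed());  // a single aggregated entry
    }
}
```

Adding @Override to the method in BrokenReducer would turn the silent fall-through into a compile error, because no supertype method takes an Iterator there.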

Answer 1 (3, accepted, 2015-12-04 01:19:43)
