Why combiner output records = 0?

I have multiple inputs, so I have two mappers. I also have one combiner:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

class JoinCombiner extends MapReduceBase implements
        Reducer<TextPair, Text, TextPair, Text> {

    @Override
    public void reduce(TextPair key, Iterator<Text> values,
            OutputCollector<TextPair, Text> output, Reporter reporter)
            throws IOException {

        // Take the first value as the node id, then emit every remaining
        // value under a newly built (nodeId, "0") key.
        Text nodeId = new Text(values.next());
        while (values.hasNext()) {
            Text node = values.next();
            TextPair outValue = new TextPair(nodeId.toString(), "0");
            output.collect(outValue, node);
        }
    }
}

When I use this class as the Reducer, everything works fine. But if I use it as a combiner, I see this in the log:

Combine input records=6
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=30
Reduce input records=0
Reduce output records=0

So, no output from the combiner -> no input for the reducer. I can't understand why. Please explain if you have any ideas. Thanks.

A combiner gets executed only if you have a reducer. Try setting both the combiner and the reducer to the same class (if possible), and also consider setting the number of reduce tasks.
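For illustration, here is a minimal driver sketch using the old mapred API (the same API as the code above), showing where the combiner, the reducer and the number of reduce tasks are configured. JoinDriver, JoinReducer, FirstMapper, SecondMapper and the argument positions are hypothetical names chosen for this sketch, not anything from your job:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class JoinDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(JoinDriver.class);
        conf.setJobName("join");

        // Two mappers fed by two different inputs (hypothetical mapper classes).
        MultipleInputs.addInputPath(conf, new Path(args[0]),
                TextInputFormat.class, FirstMapper.class);
        MultipleInputs.addInputPath(conf, new Path(args[1]),
                TextInputFormat.class, SecondMapper.class);

        // The combiner runs on the map side, the reducer on the reduce side.
        // They can only be the same class if that class emits the same
        // key/value types it receives.
        conf.setCombinerClass(JoinCombiner.class);
        conf.setReducerClass(JoinReducer.class);
        conf.setNumReduceTasks(1);

        conf.setOutputKeyClass(TextPair.class);
        conf.setOutputValueClass(Text.class);

        FileOutputFormat.setOutputPath(conf, new Path(args[2]));
        JobClient.runJob(conf);
    }
}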

UPDATE: You're trying to change the key in the combiner. The purpose of a combiner is to group the values of the same key together locally in order to reduce traffic.
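As a minimal sketch of what "not changing the key" looks like with your types, here is a pass-through combiner that forwards every value under the original key (the class name KeyPreservingCombiner is hypothetical, and it deliberately leaves any re-keying or merging to the reducer, so it does not shrink the data by itself):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

class KeyPreservingCombiner extends MapReduceBase implements
        Reducer<TextPair, Text, TextPair, Text> {

    @Override
    public void reduce(TextPair key, Iterator<Text> values,
            OutputCollector<TextPair, Text> output, Reporter reporter)
            throws IOException {
        // Emit every value under the ORIGINAL key, so the reducer still
        // receives the keys the mappers emitted. Building a new TextPair
        // key (as in the question) should happen in the reducer instead.
        while (values.hasNext()) {
            output.collect(key, values.next());
        }
    }
}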

From the Hadoop Tutorial on YDN:

Instances of the Combiner class are run on every node that has run map tasks. The Combiner will receive as input all data emitted by the Mapper instances on a given node. The output from the Combiner is then sent to the Reducers, instead of the output from the Mappers.

Based on my experience, that is not entirely accurate. Hadoop sends only the keys that are emitted by the mapper to the reducer - meaning that if you have a combiner in between, it should emit the same key as the mapper while reducing the number of values associated with that key. IMO, changing the keys in the combiner results in unexpected behavior. To understand a simple use case of combiners, consider a word counter.

Mapper1 emits:

hi 1
hello 1
hi 1
hi 1
hello 1

Mapper2 emits:

hello 1
hi 1

You have seven output records. Now, if you want to reduce the number of records locally (meaning on the same machine where the mapper is running), then having a combiner will give you something like this:

Combiner1 emits:

hi 3
hello 2

Combiner2 emits:

hello 1
hi 1

Notice that the combiner did not change the key. Now, at the reducer, you will get values like this:

Reducer1: key: hi, values: <3, 1> and you emit hi 4

Because you have only one reducer, the same reducer will be called again, this time with a different key.

Reducer1: key: hello, values: <2, 1> and you emit hello 3

The final output would be as follows:

hello 3
hi 4

The output is sorted on the basis of the keys emitted by the mapper. You can choose to change the key emitted by the reducer, but your output will not be sorted by the key emitted by the reducer (by default). Hope that helps.
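To make the word-count example concrete, here is a minimal reducer sketch (old mapred API) that can also be registered as the combiner, because it emits the same key it receives and its output types equal its input types; the class name WordCountReducer is assumed for this sketch:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Sum the partial counts. As a combiner this turns e.g. <1, 1, 1>
        // into <3>; as the reducer it turns <3, 1> into the final 4.
        // The key is emitted unchanged in both roles.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}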
