MapReduce produces different output when run on a local machine in IDEA than when run on Hadoop on a cluster

Question

The problem is what the title says. I have some code.

This is the reducer.

public class RTopLoc extends Reducer<CompositeKey, IntWritable, Text, Text> {
    private static int number = 0;
    private static CompositeKey lastCK = new CompositeKey();
    private static Text lastLac = new Text();

    @Override
    public void reduce(CompositeKey key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = sumValues(values);
        String str = Integer.toString(sum);
        String str2 = Integer.toString(number);
        String str3 = key.getSecond().toString();
        context.write(key.getFirst(), new Text(str3 + " " + str2 + " " + str));
        if(number == 0){
            number = sum;
            lastCK = key;
            context.write(new Text("1"), new Text("1"));
        }
        else if(lastCK.getFirst().equals(key.getFirst()) && sum > number){
            lastCK = key;
            context.write(new Text("2"), new Text("2"));
        }
        else if(!lastCK.getFirst().equals(key.getFirst())){
//            context.write(lastCK.getFirst(), lastCK.getSecond());
            context.write(new Text("3"), new Text("3"));
            number = sum;
            lastCK = key;

        }
    }
}

It runs fine until the reducer. Then, when I run it in IntelliJ IDEA (on Windows), I get:

0000000000 44137 0 2
1 1
902996760100000 44137 2 6
3 3
9029967602 44137 6 8
3 3
90299676030000 44137 8 1
3 3
9029967604 44137 1 5
3 3
905000 38704 5 1
3 3
9050000001 38702 1 24
3 3
9050000001 38704 24 14
9050000001 38705 24 12
9050000001 38706 24 13
9050000001 38714 24 24
9050000002 38704 24 12
3 3
9050000002 38706 12 12
9050000011 38704 12 6
3 3
9050000011 38706 6 12
2 2
9050000021 38702 6 12
3 3
9050000031 38704 12 6
3 3
9050000031 38705 6 6
9050000031 38714 6 12
2 2

After I package the code (I use Maven) and run it on Hadoop (Linux), I get:

0000000000  44137 0 2
1   1
902996760100000 44137 2 6
2   2
9029967602  44137 2 8
2   2
90299676030000  44137 2 1
9029967604  44137 2 5
2   2
905000  38704 2 1
9050000001  38702 2 24
2   2
9050000001  38704 2 14
2   2
9050000001  38705 2 12
2   2
9050000001  38706 2 13
2   2
9050000001  38714 2 24
2   2
9050000002  38704 2 12
2   2
9050000002  38706 2 12
2   2
9050000011  38704 2 6
2   2
9050000011  38706 2 12
2   2
9050000021  38702 2 12
2   2
9050000031  38704 2 6
2   2
9050000031  38705 2 6
2   2
9050000031  38714 2 12
2   2

I use this to run the code:

hadoop jar Project.jar inputPath outputPath

Answer 1

It looks like the difference is caused by a problem comparing parts of your stored key (lastCK) with the current key.

I would change this line:

lastCK = key;

Keys and values are reused in Hadoop, so when this runs on a real cluster, lastCK and key both refer to the same object and the two keys always compare as equal.
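The effect can be reproduced without Hadoop. The following is a minimal sketch in plain Java, assuming a hypothetical `MutableKey` class as a stand-in for `CompositeKey` (whose source is not shown in the question). The loop mimics a framework that refills one key object for every record.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for CompositeKey; the real class is not shown in the question.
final class MutableKey {
    private String first = "";

    void setFirst(String first) { this.first = first; }
    String getFirst() { return first; }
}

final class KeyReuseDemo {
    // Counts how often the "first" part appears to change between records
    // when the previous key is remembered by reference only.
    static int countChangesByReference(List<String> firsts) {
        MutableKey reused = new MutableKey();   // one instance, refilled per record
        MutableKey last = null;
        int changes = 0;
        for (String f : firsts) {
            reused.setFirst(f);                 // the same object is overwritten
            if (last != null && !last.getFirst().equals(reused.getFirst())) {
                changes++;                      // never reached: last and reused are one object
            }
            last = reused;                      // stores a reference, not a copy
        }
        return changes;
    }

    public static void main(String[] args) {
        // Prints 0 even though the input changes from "a" to "b" to "c".
        System.out.println(countChangesByReference(Arrays.asList("a", "a", "b", "c")));
    }
}
```

This matches the cluster output in the question, where the branch that should detect a new first key (the one writing "3 3") never fires.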

You need to either properly copy key into lastCK, perhaps using a .set() method (which you would write yourself; this is a common pattern in Hadoop), or create a new one using a constructor that accepts a CompositeKey.
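Both options can be sketched as follows. This is a simplified illustration rather than the asker's actual class: `SimpleCompositeKey`, its `String` fields, its `set()` methods and its copy constructor are all assumptions made for the example. A real Hadoop key would typically hold `Writable` fields such as `Text` and implement `WritableComparable`.

```java
// Hypothetical, simplified stand-in for CompositeKey (assumed: two String parts).
final class SimpleCompositeKey {
    private String first;
    private String second;

    SimpleCompositeKey(String first, String second) {
        this.first = first;
        this.second = second;
    }

    // Option 2: copy constructor that duplicates another key's contents.
    SimpleCompositeKey(SimpleCompositeKey other) {
        this(other.first, other.second);
    }

    // Option 1: set() overwrites this key's contents with another key's contents.
    void set(SimpleCompositeKey other) {
        set(other.first, other.second);
    }

    // Used here only to mimic a framework refilling a reused key object.
    void set(String first, String second) {
        this.first = first;
        this.second = second;
    }

    String getFirst() { return first; }
    String getSecond() { return second; }
}

final class KeyCopyDemo {
    // Counts changes of the "first" part while remembering the previous key as a copy.
    static int countChangesWithCopy(String[] firsts) {
        SimpleCompositeKey reused = new SimpleCompositeKey("", "");
        SimpleCompositeKey last = null;
        int changes = 0;
        for (String f : firsts) {
            reused.set(f, "x");                          // same object, new contents
            if (last != null && !last.getFirst().equals(reused.getFirst())) {
                changes++;
            }
            if (last == null) {
                last = new SimpleCompositeKey(reused);   // copy via constructor
            } else {
                last.set(reused);                        // copy via set()
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Prints 2: the changes "a" -> "b" and "b" -> "c" are now detected.
        System.out.println(countChangesWithCopy(new String[] {"a", "a", "b", "c"}));
    }
}
```

In the reducer, the equivalent change would be something like `lastCK.set(key)` or `lastCK = new CompositeKey(key)`, assuming you add such a method or constructor to `CompositeKey`. If its fields are `Text` objects, copy their contents as well (for example with `Text.set(Text)` or `new Text(other)`) instead of assigning the field references, since `Text` instances are mutable and may be reused too.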

Answer 1: accepted, 2016-10-27 14:23:35
