Different output when running MapReduce on a local machine in IDEA and in Hadoop on a cluster
The problem is what the title says. I have some code.
This is the reducer.
public class RTopLoc extends Reducer<CompositeKey, IntWritable, Text, Text> {
    private static int number = 0;
    private static CompositeKey lastCK = new CompositeKey();
    private static Text lastLac = new Text();

    @Override
    public void reduce(CompositeKey key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = sumValues(values);
        String str = Integer.toString(sum);
        String str2 = Integer.toString(number);
        String str3 = key.getSecond().toString();
        context.write(key.getFirst(), new Text(str3 + " " + str2 + " " + str));
        if (number == 0) {
            number = sum;
            lastCK = key;
            context.write(new Text("1"), new Text("1"));
        }
        else if (lastCK.getFirst().equals(key.getFirst()) && sum > number) {
            lastCK = key;
            context.write(new Text("2"), new Text("2"));
        }
        else if (!lastCK.getFirst().equals(key.getFirst())) {
            // context.write(lastCK.getFirst(), lastCK.getSecond());
            context.write(new Text("3"), new Text("3"));
            number = sum;
            lastCK = key;
        }
    }
}
It runs fine until the reducer. Then, when I run it in IntelliJ IDEA (on Windows), I get:
0000000000 44137 0 2
1 1
902996760100000 44137 2 6
3 3
9029967602 44137 6 8
3 3
90299676030000 44137 8 1
3 3
9029967604 44137 1 5
3 3
905000 38704 5 1
3 3
9050000001 38702 1 24
3 3
9050000001 38704 24 14
9050000001 38705 24 12
9050000001 38706 24 13
9050000001 38714 24 24
9050000002 38704 24 12
3 3
9050000002 38706 12 12
9050000011 38704 12 6
3 3
9050000011 38706 6 12
2 2
9050000021 38702 6 12
3 3
9050000031 38704 12 6
3 3
9050000031 38705 6 6
9050000031 38714 6 12
2 2
After I package the code (I use Maven) and run it on Hadoop (Linux), I get:
0000000000 44137 0 2
1 1
902996760100000 44137 2 6
2 2
9029967602 44137 2 8
2 2
90299676030000 44137 2 1
9029967604 44137 2 5
2 2
905000 38704 2 1
9050000001 38702 2 24
2 2
9050000001 38704 2 14
2 2
9050000001 38705 2 12
2 2
9050000001 38706 2 13
2 2
9050000001 38714 2 24
2 2
9050000002 38704 2 12
2 2
9050000002 38706 2 12
2 2
9050000011 38704 2 6
2 2
9050000011 38706 2 12
2 2
9050000021 38702 2 12
2 2
9050000031 38704 2 6
2 2
9050000031 38705 2 6
2 2
9050000031 38714 2 12
2 2
I use this to run the code:
hadoop jar Project.jar inputPath outputPath
It looks like the difference is caused by a problem comparing parts of your stored key (lastCK) with the current key.
I would change this line:
lastCK = key;
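Hadoop reuses key and value objects between calls to reduce(), so when this runs on a real cluster, lastCK and key end up being the same object and their contents are always identical. That matches the cluster output: lastCK.getFirst().equals(key.getFirst()) is always true, so number stays at 2 and only the "2 2" branch fires, whenever sum > 2.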
You need to either copy key into lastCK properly, perhaps with a .set() method (one you write yourself; this is a common pattern in Hadoop), or create a new object using a constructor that accepts a CompositeKey.
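To make the aliasing concrete, here is a minimal sketch that does not depend on Hadoop. The CompositeKey below is a hypothetical stand-in (the real class is not shown in the question) with String fields instead of Text, and countGroupChanges is an illustrative helper, not part of any Hadoop API. The loop refills one key instance for every record, which imitates the object reuse described above.

```java
public class KeyReuseDemo {

    // Hypothetical stand-in for the question's CompositeKey; the real class is not shown.
    static final class CompositeKey {
        private String first = "";
        private String second = "";

        void set(String first, String second) {
            this.first = first;
            this.second = second;
        }

        // The ".set()" pattern: copy the other key's fields into this object.
        void set(CompositeKey other) {
            set(other.first, other.second);
        }

        String getFirst() {
            return first;
        }
    }

    // Counts how often the first field differs from the previously stored key.
    // copy == false stores the reference (lastCK = key); copy == true copies the fields.
    public static int countGroupChanges(String[][] records, boolean copy) {
        CompositeKey reused = new CompositeKey(); // single instance, refilled for every record
        CompositeKey last = new CompositeKey();
        int changes = 0;
        for (String[] record : records) {
            reused.set(record[0], record[1]);
            if (!last.getFirst().equals(reused.getFirst())) {
                changes++;
            }
            if (copy) {
                last.set(reused); // last keeps its own snapshot of the fields
            } else {
                last = reused;    // last now aliases the reused object
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        String[][] records = {
            {"9050000001", "38702"},
            {"9050000001", "38704"},
            {"9050000002", "38704"},
            {"9050000011", "38704"},
        };
        System.out.println(countGroupChanges(records, false)); // aliasing: 1
        System.out.println(countGroupChanges(records, true));  // copying: 3
    }
}
```

With the reference assignment, only the very first record registers as a change, because afterwards the stored key and the current key are one object. With the copy, all three distinct first fields are detected. In the real reducer, assuming the fields are Text, the set() method could delegate to Text.set(Text) for each field.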