MapReduce Job: How do I take in <Text, IntWritable> during the Map phase and output <Text, Text> in the Reduce phase?
I am trying to make my output look like the following: Model output
But I am stuck with this: My output
How do I convert the value (IntWritable) from the output to Text and concatenate the string " words" onto it? I also need to format the numbers from the output to start at the same spot as shown in the model answer. The input is <Text, IntWritable> and I am guessing the output has to be <Text, Text>.
My code for the mapper:
public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private final static IntWritable zero = new IntWritable(0);
  private Text word1 = new Text("1.X short:");
  private Text word2 = new Text("2.short:");
  private Text word3 = new Text("3.medium:");
  private Text word4 = new Text("4.long:");
  private Text word5 = new Text("5.X long:");
  private Text word6 = new Text("6.XX long:");
  public void map(Object key, Text value, Context context
                  ) throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      String word = itr.nextToken();
      int length = word.length();
      if ((length >= 1) && (length <= 3)) {
        context.write(word1, one);
      }
      else
        context.write(word1, zero);
      if ((length >= 4) && (length <= 5)) {
        context.write(word2, one);
      }
      else
        context.write(word2, zero);
      if ((length >= 6) && (length <= 8)) {
        context.write(word3, one);
      }
      else
        context.write(word3, zero);
      if ((length >= 9) && (length <= 12)) {
        context.write(word4, one);
      }
      else
        context.write(word4, zero);
      if ((length >= 13) && (length <= 15)) {
        context.write(word5, one);
      }
      else
        context.write(word5, zero);
      if (length >= 16) {
        context.write(word6, one);
      }
      else
        context.write(word6, zero);
    }
  }
}
My code for the reducer:
public static class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  private IntWritable result = new IntWritable();
  public void reduce(Text key, Iterable<IntWritable> values,
                     Context context
                     ) throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    key.set(key.toString().substring(1));
    context.write(key, result);
  }
}
So, first, you don't need to write zeros at all in the mapper. Adding zero never changes a sum, so just emit the ones if you are summing data.
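As a rough illustration of emitting only the ones, here is a plain-Java sketch with no Hadoop dependency. The class name `LengthBuckets` and the helpers `bucketFor` and `countBuckets` are hypothetical names made up for this example; the labels and length ranges are copied from the mapper in the question. Each token contributes to exactly one bucket, which is what the mapper would do if every `else` branch were dropped.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class LengthBuckets {
    // Maps a word length to the single bucket label it belongs to
    // (labels and ranges taken from the question's mapper).
    static String bucketFor(int length) {
        if (length < 1) {
            throw new IllegalArgumentException("length must be >= 1");
        }
        if (length <= 3)  return "1.X short:";
        if (length <= 5)  return "2.short:";
        if (length <= 8)  return "3.medium:";
        if (length <= 12) return "4.long:";
        if (length <= 15) return "5.X long:";
        return "6.XX long:";
    }

    // Stands in for map + reduce on one line of input: each token adds
    // one to its own bucket and nothing to the others.
    static Map<String, Integer> countBuckets(String line) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            String label = bucketFor(itr.nextToken().length());
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countBuckets("a quick brown fox"));
    }
}
```

One consequence to be aware of: a bucket that no word falls into is then missing from the output entirely, instead of being reported with a count of 0.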
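Then, it's a simple change: change your output value type, and declare the map output types explicitly, because they default to the final output types
// in the driver; the mapper still emits <Text, IntWritable>
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputValueClass(Text.class);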
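And change the reducer's declared types (if the driver also registers this class as a combiner, as the stock WordCount example does, drop that call, since a combiner must emit the map output types)
extends Reducer<Text, IntWritable, Text, Text>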
And just write the correct information, formatting the int sum rather than the IntWritable (%d cannot format an IntWritable)
context.write(key, new Text(String.format("%d words", sum)));
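The sum-then-format step can be tried on its own without a cluster. The following is a minimal sketch in plain Java, with `Integer` standing in for `IntWritable`; `ReduceSketch` and `formatCount` are hypothetical names used only for this illustration. In the real reducer, the returned string would be wrapped in `new Text(...)` before being written.

```java
import java.util.Arrays;
import java.util.List;

class ReduceSketch {
    // Sums the per-word counts for one key and renders them as "<n> words".
    static String formatCount(Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        // %d needs an integral Java type such as int, so format the sum itself.
        return String.format("%d words", sum);
    }

    public static void main(String[] args) {
        List<Integer> ones = Arrays.asList(1, 1, 1);
        System.out.println(formatCount(ones));
    }
}
```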
"format the numbers from the output to start at the same spot"
Is that really necessary? You can do this with string padding in the String.format method, but I wouldn't really worry about it.
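If alignment does matter, one possible approach is to left-justify every label to a fixed width with a `%-Ns` format specifier, so the count always starts in the same column. This is only a sketch: `AlignSketch`, `padLabel`, `formatLine`, and the width of 12 are assumptions chosen for the example, not something taken from the question.

```java
class AlignSketch {
    // Left-justifies the label and pads it with spaces up to `width` characters.
    static String padLabel(String label, int width) {
        return String.format("%-" + width + "s", label);
    }

    // Builds one output line whose count starts at column `width`.
    static String formatLine(String label, int sum, int width) {
        return padLabel(label, width) + sum + " words";
    }

    public static void main(String[] args) {
        int width = 12; // assumed width, longer than the longest label
        System.out.println(formatLine("X short:", 40, width));
        System.out.println(formatLine("medium:", 7, width));
        System.out.println(formatLine("XX long:", 3, width));
    }
}
```

In the reducer, the padded label could be set on the key before writing. Hadoop's default text output puts a tab between key and value, and since every padded key has the same length, the values should still line up.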