MapReduce Job: How do I take in <Text, IntWritable> during the Map phase and output <Text, Text> in the Reduce phase?

I am trying to make my output look like the following: [image: model output]

But I am stuck with this: [image: my output]

How do I convert the value (IntWritable) from the output to Text and append the string " words" to it? I also need to format the numbers in the output so that they all start at the same column, as shown in the model answer. The reducer's input is <Text, IntWritable>, and I am guessing its output has to be <Text, Text>.

My code for the mapper:

public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private final static IntWritable zero = new IntWritable(0);

  // One label per word-length bucket.
  private Text word1 = new Text("1.X short:");
  private Text word2 = new Text("2.short:");
  private Text word3 = new Text("3.medium:");
  private Text word4 = new Text("4.long:");
  private Text word5 = new Text("5.X long:");
  private Text word6 = new Text("6.XX long:");

  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      String word = itr.nextToken();
      int length = word.length();

      // Emit 1 for the bucket the word falls in and 0 for every other bucket.
      if ((length >= 1) && (length <= 3)) {
        context.write(word1, one);
      } else {
        context.write(word1, zero);
      }

      if ((length >= 4) && (length <= 5)) {
        context.write(word2, one);
      } else {
        context.write(word2, zero);
      }

      if ((length >= 6) && (length <= 8)) {
        context.write(word3, one);
      } else {
        context.write(word3, zero);
      }

      if ((length >= 9) && (length <= 12)) {
        context.write(word4, one);
      } else {
        context.write(word4, zero);
      }

      if ((length >= 13) && (length <= 15)) {
        context.write(word5, one);
      } else {
        context.write(word5, zero);
      }

      if (length >= 16) {
        context.write(word6, one);
      } else {
        context.write(word6, zero);
      }
    }
  }
}

My code for the reducer:

public static class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  private IntWritable result = new IntWritable();

  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    key.set(key.toString().substring(1)); // drop the first character of the label
    context.write(key, result);
  }
}

So, first, you don't need to write zeros at all in the mapper. If you are summing data, just emit the ones.
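As a rough illustration of emitting only the ones, the six if/else pairs could collapse into a single lookup that returns the bucket label for a word length, so the mapper writes exactly one (label, 1) pair per token. The class name `LengthBuckets` and the method `bucketFor` are hypothetical and not part of the original code; the labels and ranges are copied from the question's mapper. One consequence to keep in mind: a bucket that receives no words would then produce no output line at all.

```java
public class LengthBuckets {
    // Labels and length ranges copied from the question's mapper.
    static String bucketFor(int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        if (length <= 3)  return "1.X short:";
        if (length <= 5)  return "2.short:";
        if (length <= 8)  return "3.medium:";
        if (length <= 12) return "4.long:";
        if (length <= 15) return "5.X long:";
        return "6.XX long:";
    }

    public static void main(String[] args) {
        // Print the bucket chosen for each sample token.
        for (String w : "a word counting demonstration".split(" ")) {
            System.out.println(w + " -> " + bucketFor(w.length()));
        }
    }
}
```

Inside the mapper, the loop body might then shrink to setting a single reusable Text field to `bucketFor(word.length())` and calling `context.write` once with `one`.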

Then it's a simple change: set the job's output value type to Text

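// in the driver
job.setMapOutputValueClass(IntWritable.class); // the mapper still emits IntWritable
job.setOutputValueClass(Text.class);           // the reducer now emits Text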

And in the reducer's class declaration:

extends Reducer<Text,IntWritable, Text, Text> 

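And write the formatted string from the reducer. If the driver also registers this class as a combiner, remove that call, since a combiner has to emit the map output types: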

// pass the int sum; %d does not accept an IntWritable
context.write(key, new Text(String.format("%d words", sum)));

"format the numbers from the output to start at the same spot"

Is that really necessary? You can do this with string padding in the String.format method, but I wouldn't really worry about it.
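For the padding mentioned above, here is a minimal sketch: the `%-Ns` conversion left-justifies a string in a field of N characters, so the counts that follow all start in the same column. The class `AlignedOutput`, the method `formatLine`, the width 12, and the sample counts are assumptions chosen for illustration. Hadoop's default TextOutputFormat places a tab between key and value, so in a real job the columns may still shift unless the padding is applied to the key text itself.

```java
public class AlignedOutput {
    // Left-justify the label in a 12-character field, then append the count.
    static String formatLine(String label, int count) {
        return String.format("%-12s%d words", label, count);
    }

    public static void main(String[] args) {
        // Sample labels and counts, made up for the demonstration.
        System.out.println(formatLine("X short:", 1234));
        System.out.println(formatLine("medium:", 56));
        System.out.println(formatLine("XX long:", 7));
    }
}
```

The width only needs to be at least as long as the longest label; anything shorter than the field is padded with spaces on the right.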

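Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.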

 