
How to format the output being written by MapReduce in Hadoop

I am trying to reverse the contents of the file word by word. I have the program running fine, but the output I am getting is something like this:

1   dwp
2   seviG
3   eht
4   tnerruc
5   gnikdrow
6   yrotcerid
7   ridkm
8   desU
9   ot
10  etaerc

I want the output to be something like this:

dwp seviG eht tnerruc gnikdrow yrotcerid ridkm desU
ot etaerc

The code I am working with:

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class Reproduce {

    // shared counter used as the output key for every reversed word
    public static int temp = 0;

    public static class ReproduceMap extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, Text> {
        private Text word = new Text();

        @Override
        public void map(LongWritable arg0, Text value,
                OutputCollector<IntWritable, Text> output, Reporter reporter)
                throws IOException {
            String line = value.toString().concat("\n");
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(new StringBuffer(tokenizer.nextToken()).reverse().toString());
                temp++;
                output.collect(new IntWritable(temp), word);
            }
        }
    }

    public static class ReproduceReduce extends MapReduceBase implements Reducer<IntWritable, Text, IntWritable, Text> {

        @Override
        public void reduce(IntWritable arg0, Iterator<Text> arg1,
                OutputCollector<IntWritable, Text> arg2, Reporter arg3)
                throws IOException {
            // each key carries exactly one reversed word
            String word = arg1.next().toString();
            Text word1 = new Text();
            word1.set(word);
            arg2.collect(arg0, word1);
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(Reproduce.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(Text.class);

        conf.setMapperClass(ReproduceMap.class);
        conf.setReducerClass(ReproduceReduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

How do I modify my output instead of writing another Java program to do that?

Thanks in advance

Here is a simple example demonstrating the use of a custom FileOutputFormat:

public class MyTextOutputFormat extends FileOutputFormat<Text, List<IntWritable>> {
    @Override
    public org.apache.hadoop.mapreduce.RecordWriter<Text, List<IntWritable>> getRecordWriter(TaskAttemptContext arg0) throws IOException, InterruptedException {
        //get the current path
        Path path = FileOutputFormat.getOutputPath(arg0);
        //create the full path with the output directory plus our filename
        Path fullPath = new Path(path, "result.txt");
        //create the file in the file system
        FileSystem fs = path.getFileSystem(arg0.getConfiguration());
        FSDataOutputStream fileOut = fs.create(fullPath, arg0);

        //create our record writer with the new file
        return new MyCustomRecordWriter(fileOut);
    }
}

public class MyCustomRecordWriter extends RecordWriter<Text, List<IntWritable>> {
    private DataOutputStream out;

    public MyCustomRecordWriter(DataOutputStream stream) {
        out = stream;
        try {
            out.writeBytes("results:\r\n");
        }
        catch (Exception ex) {
            //ignore failures while writing the header line
        }
    }

    @Override
    public void close(TaskAttemptContext arg0) throws IOException, InterruptedException {
        //close our file
        out.close();
    }

    @Override
    public void write(Text arg0, List<IntWritable> arg1) throws IOException, InterruptedException {
        //write out our key
        out.writeBytes(arg0.toString() + ": ");
        //loop through all values associated with our key and write them with commas between
        for (int i=0; i<arg1.size(); i++) {
            if (i>0)
                out.writeBytes(",");
            out.writeBytes(String.valueOf(arg1.get(i)));
        }
        out.writeBytes("\r\n");  
    }
}

Finally, we need to tell our job about our output format and the path before running it.

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(ArrayList.class);
job.setOutputFormatClass(MyTextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("/home/hadoop/out"));

We can customize the output by writing a custom FileOutputFormat class.

You can use NullWritable as the output value. NullWritable is just a placeholder, since you don't want the number to be displayed as part of your output. I have given the modified reducer class below. Note: you need to add an import statement for NullWritable.

public static class ReproduceReduce extends MapReduceBase implements Reducer<IntWritable, Text, Text, NullWritable> {

    @Override
    public void reduce(IntWritable arg0, Iterator<Text> arg1,
            OutputCollector<Text, NullWritable> arg2, Reporter arg3)
            throws IOException {
        String word = arg1.next().toString();
        Text word1 = new Text();
        word1.set(word);
        // NullWritable.get() returns the shared singleton; its constructor is not public
        arg2.collect(word1, NullWritable.get());
    }
}

And change the driver class or main method:

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(NullWritable.class);
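
Since the map output types (IntWritable key, Text value) now differ from the job's final output types (Text key, NullWritable value), the old-API driver presumably also needs the intermediate map output classes set explicitly; a minimal sketch of that extra configuration:

//assumed addition: the map output types no longer match the final output types set above
conf.setMapOutputKeyClass(IntWritable.class);
conf.setMapOutputValueClass(Text.class);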

In the Mapper, the key temp is incremented for each word value, so each word is processed as a separate key-value pair.

The steps below should solve the problem: 1) In the Mapper, just remove temp++, so that all the reversed words will have the key 0 (temp = 0).

2) The Reducer then receives the key 0 and the list of reversed strings. In the reducer, set the key to NullWritable and write the output, as in the sketch after these steps.
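
A minimal sketch of those two steps, staying with the old mapred API from the question (the reducer body below, which joins the reversed words with spaces into one line, is my own reading of "write the output"):

public static class ReproduceMap extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, Text> {
    private static final IntWritable ZERO = new IntWritable(0); //constant key, replaces temp++
    private final Text word = new Text();

    @Override
    public void map(LongWritable offset, Text value,
            OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(new StringBuffer(tokenizer.nextToken()).reverse().toString());
            output.collect(ZERO, word); //every reversed word now shares the same key
        }
    }
}

public static class ReproduceReduce extends MapReduceBase implements Reducer<IntWritable, Text, Text, NullWritable> {
    @Override
    public void reduce(IntWritable key, Iterator<Text> values,
            OutputCollector<Text, NullWritable> output, Reporter reporter) throws IOException {
        //join all reversed words with spaces and emit them as one line of text
        StringBuilder line = new StringBuilder();
        while (values.hasNext()) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(values.next().toString());
        }
        //with a NullWritable value, TextOutputFormat writes only the key text, no trailing separator
        output.collect(new Text(line.toString()), NullWritable.get());
    }
}

The driver would then declare Text/NullWritable as the job output classes and IntWritable/Text as the map output classes, as noted in the previous answer.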

What you can try is to take one constant key (or simply NullWritable), pass it as the key, and pass your complete line as the value (you can reverse it in the mapper class, or in the reducer class as well). Your reducer will then receive a constant key (or a placeholder, if you used NullWritable as the key) and the complete line. Now you can simply reverse the line and write it to the output file. By not using temp as a key, you avoid writing unwanted numbers to your output file.
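
As a rough sketch of this variant (the class names LineReverseMap and LineReverseReduce and their details are my own, assuming the reversal is done in the mapper): the mapper rebuilds each input line with every word reversed and emits it under a NullWritable placeholder key, and the reducer simply writes each line back out.

public static class LineReverseMap extends MapReduceBase implements Mapper<LongWritable, Text, NullWritable, Text> {
    private final Text reversedLine = new Text();

    @Override
    public void map(LongWritable offset, Text value,
            OutputCollector<NullWritable, Text> output, Reporter reporter) throws IOException {
        //rebuild the line with each word reversed, keeping the original word order
        StringBuilder line = new StringBuilder();
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(new StringBuffer(tokenizer.nextToken()).reverse());
        }
        reversedLine.set(line.toString());
        output.collect(NullWritable.get(), reversedLine); //placeholder key, whole line as value
    }
}

public static class LineReverseReduce extends MapReduceBase implements Reducer<NullWritable, Text, Text, NullWritable> {
    @Override
    public void reduce(NullWritable key, Iterator<Text> values,
            OutputCollector<Text, NullWritable> output, Reporter reporter) throws IOException {
        while (values.hasNext()) {
            output.collect(new Text(values.next()), NullWritable.get()); //write each line through unchanged
        }
    }
}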
