Generating Multiple Output files with Hadoop 0.20+

Original post: 2010-02-01 21:08:29. Tags: java, file-io, hadoop

I am trying to write the results of my reducer to multiple files. One file contains all of the results, and the remaining output is split by category into the respective files. I know that in 0.18 you can do this with MultipleOutputs, and it has not been removed. However, I am trying to make my application 0.20+ compliant. The existing MultipleOutputs functionality still requires JobConf, whereas my application uses Job and Configuration. How can I generate multiple outputs based on the key?

2 answers

MultipleOutputs is not supported by the new API in 0.20. You will need to use the older API.

It has been added in 0.21, which is currently unreleased, as org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.

This thread on the mailing list talks about this problem.

You can do this in Hadoop 0.20, but as mentioned, you have to use the older API.

There's some very rough code to do so in http://github.com/orngejaket/Info_Moist_1_Splicer/tree/master/src/contrib/streaming/src/java/org/infochimps/hadoop/mapred/lib/

The resulting jar writes each record to a file named after its (sanitized) key.
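To make the "file named after its sanitized key" idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The class name KeyFileRouter, the sanitizing rule, and the tab-separated line format are all assumptions for illustration; they are not taken from the linked repository. In the old API, logic like sanitize() would typically go into an override of generateFileNameForKeyValue(key, value, name) in a subclass of org.apache.hadoop.mapred.lib.MultipleTextOutputFormat, registered through JobConf.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not the code from the linked repository.
class KeyFileRouter {

    // Assumed rule: keep letters, digits, '.', '_' and '-'; replace everything else with '_'.
    static String sanitize(String key) {
        String cleaned = key.replaceAll("[^A-Za-z0-9._-]", "_");
        // Avoid an empty name and the special directory names "." and "..".
        if (cleaned.isEmpty() || cleaned.equals(".") || cleaned.equals("..")) {
            return "_";
        }
        return cleaned;
    }

    // Appends one "key<TAB>value" line to the file named after the sanitized key.
    static Path write(Path outputDir, String key, String value) throws IOException {
        Path target = outputDir.resolve(sanitize(key));
        byte[] line = (key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8);
        Files.write(target, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("multi-out");
        write(dir, "fruit/apple", "3");
        write(dir, "fruit/apple", "5");
        write(dir, "veg", "1");
        // List the per-key files that were created.
        try (Stream<Path> files = Files.list(dir)) {
            files.map(p -> p.getFileName().toString())
                 .sorted()
                 .forEach(System.out::println);
        }
    }
}
```

Note that two different keys can sanitize to the same name (for example "a/b" and "a_b" under this rule), so their records would end up in the same file. If that matters, a stricter encoding of the key would be needed.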

Avoiding file collisions in Hadoop Pig script that writes multiple output files

HADOOP - number of output files produced as mapper output

How can I use MultipleoutputFormai in Hadoop 0.20?

How to set the number of map tasks in hadoop 0.20?

Hadoop Mapreduce multiple Input files

Hadoop jar command error for multiple mapper inputs and 1 reducer output (Join 2 values from 2 files)

Java: read hadoop reducer's output files

Control number of hadoop mapper output files

Hadoop, MapReduce - Multiple Input/Output Paths

Multiple output path (Java - Hadoop - MapReduce)

Answer 1: score 9, accepted, 2010-02-01 23:41:55

Answer 2: score 2, 2010-02-03 01:06:27
