Generating Multiple Output files with Hadoop 0.20+

I am trying to output the results of my reducer to multiple files. The data results should all go into one file, and the rest of the results should be split by category into their respective files. I know that in 0.18 you can do this with MultipleOutputs, and it has not been removed. However, I am trying to make my application 0.20+ compliant, and the existing MultipleOutputs functionality still requires JobConf, whereas my application uses Job and Configuration. How can I generate multiple outputs based on the key?

Support for MultipleOutputs isn't in the new org.apache.hadoop.mapreduce API in 0.20. You will need to use the older org.apache.hadoop.mapred API.

It has been added to 0.21, which is currently unreleased, as org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.

This thread on the mailing list talks about this problem.

You can do this in Hadoop 0.20; as mentioned, you just have to use the older API.

There is some very rough code that does this at http://github.com/orngejaket/Info_Moist_1_Splicer/tree/master/src/contrib/streaming/src/java/org/infochimps/hadoop/mapred/lib/

The resulting jar writes each record to a file named after its (sanitized) key.
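As a rough illustration of the per-key naming step, here is a plain-Java sketch that turns a record key into a safe file name. In the old API this logic would usually live in an override of `generateFileNameForKeyValue(K key, V value, String name)` in a subclass of `org.apache.hadoop.mapred.lib.MultipleTextOutputFormat`, where `name` is the task's leaf file name such as `part-00000`. The class `KeyFileNames` and its sanitizing rule are assumptions for illustration, not taken from the linked code, and the sketch leaves out Hadoop entirely so it runs on its own.

```java
import java.util.regex.Pattern;

// Hypothetical helper for naming one output file per key (not from the linked code).
final class KeyFileNames {
    // Assumed rule: anything other than ASCII letters, digits, '.', '_' or '-'
    // is treated as unsafe in a file name and replaced with '_'.
    private static final Pattern UNSAFE = Pattern.compile("[^A-Za-z0-9._-]");

    private KeyFileNames() {}

    static String sanitize(String key) {
        String cleaned = UNSAFE.matcher(key).replaceAll("_");
        // FileInputFormat skips names starting with '_' or '.', so prefix those
        // (and empty keys) to keep the files visible to downstream jobs.
        if (cleaned.isEmpty() || cleaned.startsWith("_") || cleaned.startsWith(".")) {
            cleaned = "k" + cleaned;
        }
        return cleaned;
    }

    // Plays the role of generateFileNameForKeyValue(key, value, name), minus the
    // unused value: the sanitized key plus the task's leaf name, e.g. "part-00000".
    static String fileNameFor(String key, String leafName) {
        return sanitize(key) + "-" + leafName;
    }

    public static void main(String[] args) {
        // prints news_sports-part-00000
        System.out.println(fileNameFor("news/sports", "part-00000"));
    }
}
```

Keeping the leaf name in the result means two reduce tasks whose keys happen to sanitize to the same string still write separate files instead of colliding.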

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 