Write to multiple outputs by key in Scalding/Hadoop with one MapReduce job

Question

How can you write to multiple outputs, depending on the key, using Scalding (or Cascading) in a single MapReduce job? I could of course use .filter for each possible key, but that is a horrible hack that will fire up many jobs.
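The question assumes that each filtered output costs its own pass over the input. Under that assumption, the difference can be modelled in plain Scala without Scalding or Hadoop. This is only a sketch: `Row`, `filterPerKey` and `partitionOnce` are hypothetical names, and a counted "scan" stands in for one full pass over the input (with the `.filter` approach, a separate job).

```scala
// Plain-Scala model, no Scalding/Hadoop. A "scan" stands in for one full pass
// over the input (assumption: with .filter per key, one job per key).
object FilterVsPartition {
  final case class Row(country: String, gdp: Double)

  // The ".filter per key" approach: every distinct key re-reads all rows.
  def filterPerKey(rows: Seq[Row], keys: Seq[String]): (Map[String, Seq[Row]], Int) = {
    var scans = 0
    val out = keys.map { k =>
      scans += 1 // one full pass over all rows for this key
      k -> rows.filter(_.country == k)
    }.toMap
    (out, scans)
  }

  // Partitioning by key: a single pass routes each row to its key's output.
  def partitionOnce(rows: Seq[Row]): (Map[String, Seq[Row]], Int) =
    (rows.groupBy(_.country), 1)
}
```

Both produce the same per-key outputs. The filter version's pass count grows with the number of distinct keys, while the partitioned version always makes one pass.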

Answer 1

Scalding has TemplatedTsv (from version 0.9.0rc16 onward), which does the same thing as Cascading's TemplateTap:

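```scala
import com.twitter.scalding._

class CountryGdpJob(args: Args) extends Job(args) { // illustrative job name
  Tsv(args("input"), ('COUNTRY, 'GDP))
    .read
    .write(TemplatedTsv(args("output"), "%s", 'COUNTRY))
  // Creates one directory per country under the "output" path in Hadoop mode.
}
```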
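The following plain-Scala model shows the output layout: the `"%s"` template is filled with each row's `COUNTRY` value, and the result becomes a subdirectory of the base path. This is an assumption-laden sketch, not Scalding's or Cascading's implementation. `TemplatedWriteModel`, `writeTemplated` and the `part-00000` file name are illustrative.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Conceptual model only: NOT the TemplatedTsv/TemplateTap source code.
// Each (country, gdp) row ends up under basePath/<template filled with country>.
object TemplatedWriteModel {
  def writeTemplated(basePath: Path, template: String, rows: Seq[(String, Double)]): Unit =
    rows.groupBy(_._1).foreach { case (country, group) =>
      // Fill the template with the key value, e.g. "%s" with "NL" gives "NL".
      val dir = basePath.resolve(template.format(country))
      Files.createDirectories(dir)
      // One tab-separated line per row; "part-00000" is an illustrative file name.
      val lines = group.map { case (c, gdp) => s"$c\t$gdp" }.mkString("", "\n", "\n")
      Files.write(dir.resolve("part-00000"), lines.getBytes(StandardCharsets.UTF_8))
    }
}
```

The model groups rows in memory for brevity. A real sink streams each row to its per-key location instead.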

Answer 2

Use MultipleOutputFormat, and extrapolate from these other SO questions to write a custom output class that uses it: "Create Scalding Source like TextLine that combines multiple files into single mappers" and "Compress Output Scalding / Cascading TsvCompressed".
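MultipleOutputFormat works by computing each record's output file name from the record itself, then keeping one underlying writer per distinct file name. In Hadoop's old `mapred` API, you compute that name by overriding `generateFileNameForKeyValue(key, value, name)`. Below is a dependency-free Scala model of that routing, not Hadoop code. `RoutingWriter`, `fileNameFor` and `openWriters` are hypothetical names.

```scala
import java.io.{BufferedWriter, Closeable}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.collection.mutable

// Plain-Scala model of the MultipleOutputFormat idea; not Hadoop code.
// fileNameFor plays the role of an overridden generateFileNameForKeyValue(key, value, name).
final class RoutingWriter(base: Path, leafName: String, fileNameFor: (String, String, String) => String)
    extends Closeable {
  // One writer per distinct output path, opened lazily on first use and then reused.
  private val writers = mutable.LinkedHashMap.empty[String, BufferedWriter]

  def write(key: String, value: String): Unit = {
    val relative = fileNameFor(key, value, leafName)
    val w = writers.getOrElseUpdate(relative, {
      val target = base.resolve(relative)
      Files.createDirectories(target.getParent)
      Files.newBufferedWriter(target, StandardCharsets.UTF_8)
    })
    // Only the value is written; in this model the key is used for routing only.
    w.write(value)
    w.newLine()
  }

  def openWriters: Int = writers.size

  // Close every per-key writer, as the task would when it finishes its output.
  override def close(): Unit = writers.values.foreach(_.close())
}
```

Every distinct key keeps a file open until the output is closed, so a very large number of keys per task may run into file-handle limits. Keep that trade-off in mind with any per-key output scheme.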

Answer 3

A suggestion on the Cascading user group is to use Cascading's TemplateTap. I'm not sure how to connect this to Scalding, though.

3 answers:

Answer 1: score 6, accepted, 2014-06-25 12:04:42
Answer 2: score 0, 2014-06-02 12:47:36
Answer 3: score 0, 2014-06-02 18:27:29