Multiple for-each loops in Hadoop reducer

Question

I need to iterate over the values in a Hadoop reducer more than once, using several for-each loops. Is that even possible?

Here is my current reducer class:

public class R_PreprocessAllSMS extends Reducer<Text, Text, Text, Text>{
private final static Text KEY = new Text();
private final static Text VALUE = new Text();

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (Text value : values) {
            String[] splitString = value.toString().split("\t");
            sum += Integer.parseInt(splitString[1]);
        }
        if (sum > 100) {
            for (Text value : values) {
                String[] splitString = value.toString().split("\t");
                System.out.println(key.toString() + splitString[0] + " " + splitString[1]);
                KEY.set(key);
                VALUE.set(splitString[0] + "\t" + splitString[1]);
                context.write(KEY, VALUE);
            }
        }
    }
}

I want to be able to go through the given values a second time and emit the ones I need. If that is not possible, what approach would you recommend in Hadoop? Thanks.

Answer 1

Maybe use two pairs of Mappers and Reducers? You can run them one after another. For example, create two jobs in one main method, where the second job reads the results of the first.

// Two jobs configured in the same main(); job2 is meant to read job1's output.
JobConf jobConf1 = new JobConf();
JobConf jobConf2 = new JobConf();

Job job1 = new Job(jobConf1);
Job job2 = new Job(jobConf2);

Or you can take a look at ChainReducer: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/ChainReducer.html
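
To illustrate the two-job idea without a cluster, here is a minimal plain-Java sketch. It is not Hadoop code: `TwoPassFilter`, `sumPerKey` and `filterByTotal` are hypothetical names, and the record layout `key<TAB>name<TAB>count` is an assumption made only for this example. The first method plays the role of the first job (compute a total per key), the second plays the role of the second job (emit only records whose key total is above the limit).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for two chained jobs; no Hadoop types are used.
class TwoPassFilter {

    // "Job 1": sum the numeric third field for each key (the first field).
    static Map<String, Integer> sumPerKey(List<String> records) {
        Map<String, Integer> sums = new HashMap<>();
        for (String record : records) {
            String[] fields = record.split("\t");
            sums.merge(fields[0], Integer.parseInt(fields[2]), Integer::sum);
        }
        return sums;
    }

    // "Job 2": keep only the records whose key total exceeds the limit.
    static List<String> filterByTotal(List<String> records, Map<String, Integer> sums, int limit) {
        List<String> kept = new ArrayList<>();
        for (String record : records) {
            String key = record.split("\t")[0];
            if (sums.getOrDefault(key, 0) > limit) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("k1\ta\t60", "k1\tb\t50", "k2\tc\t10");
        Map<String, Integer> sums = sumPerKey(records);
        // k1 totals 110 and k2 totals 10, so only the two k1 records remain.
        System.out.println(filterByTotal(records, sums, 100).size()); // prints 2
    }
}
```

In a real setup each pass would be its own job, and the second job would have to read the totals written by the first one, which costs an extra pass over the data.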

Answer 2

Instead of looping twice you can delay writing the values until you know that the sum is high enough, something like:

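    // Requires java.util.ArrayList and java.util.List.
    int sum = 0;
    // Buffer the lines as Strings: Hadoop reuses the same Text instance for every value.
    List<String> list = new ArrayList<String>();
    KEY.set(key);

    for (Text value : values) {
        String[] splitString = value.toString().split("\t");
        String line = splitString[0] + "\t" + splitString[1];

        sum += Integer.parseInt(splitString[1]);

        if (sum <= 100) {
            // Threshold not passed yet: hold the line back.
            list.add(line);
        } else {
            // Threshold passed (sum > 100): flush what was held back, then write the current line.
            if (!list.isEmpty()) {
                for (String val : list) {
                    VALUE.set(val);
                    context.write(KEY, VALUE);
                }
                list.clear();
            }
            VALUE.set(line);
            context.write(KEY, VALUE);
        }
    }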
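
If you want to try the buffering logic outside of Hadoop, here is a small stand-alone sketch. It makes a few assumptions: `ThresholdBuffer`, `singlePass` and `emitIfSumExceeds` are hypothetical names, plain Strings stand in for `Text`, and the returned list stands in for `context.write`. The `singlePass` helper hands out the same iterator on every call, so the values can only be walked once, which is the situation the reducer in the question runs into. The sketch buffers while the sum is at most the limit, so that it matches the `sum > 100` test in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-alone class; no Hadoop types are used.
class ThresholdBuffer {

    // Wraps a list so that it can be iterated only once:
    // every call to iterator() returns the same, possibly consumed, iterator.
    static Iterable<String> singlePass(List<String> source) {
        final Iterator<String> it = source.iterator();
        return () -> it;
    }

    // Returns the lines that would be written: all of them once the running sum
    // of the second tab-separated field exceeds the limit, otherwise none.
    static List<String> emitIfSumExceeds(Iterable<String> values, int limit) {
        List<String> emitted = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        int sum = 0;
        for (String value : values) {
            String[] parts = value.split("\t");
            String line = parts[0] + "\t" + parts[1];
            sum += Integer.parseInt(parts[1]);
            if (sum <= limit) {
                pending.add(line);        // not over the limit yet: hold back
            } else {
                emitted.addAll(pending);  // flush everything held back so far
                pending.clear();
                emitted.add(line);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> big = Arrays.asList("a\t60", "b\t30", "c\t20");
        List<String> small = Arrays.asList("a\t10", "b\t20");
        System.out.println(emitIfSumExceeds(singlePass(big), 100).size());   // prints 3
        System.out.println(emitIfSumExceeds(singlePass(small), 100).size()); // prints 0
    }
}
```

Keep in mind that the buffer holds every value seen before the threshold is passed, so for a key with a very large number of small values it can grow to the size of the whole group.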

2 answers

Answer 1
0 2014-01-23 13:23:39

Answer 2
0 ACCEPTED 2014-01-23 15:25:27
