
Multiple for-each loops in Hadoop reducer

I've run into a problem with using more than one for-each loop over the values in a Hadoop reducer. Is that even possible?

Here is my current reducer class:

public class R_PreprocessAllSMS extends Reducer<Text, Text, Text, Text>{
    private final static Text KEY = new Text();
    private final static Text VALUE = new Text();

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (Text value : values) {
            String[] splitString = value.toString().split("\t");
            sum += Integer.parseInt(splitString[1]);
        }
        if (sum > 100) {
            for (Text value : values) {
                String[] splitString = value.toString().split("\t");
                System.out.println(key.toString() + splitString[0] + " " + splitString[1]);
                KEY.set(key);
                VALUE.set(splitString[0] + "\t" + splitString[1]);
                context.write(KEY, VALUE);
            }
        }
    }
}

I want to be able to iterate over the given values a second time and emit the ones I need. If that's not possible, what is the recommended way of doing this in Hadoop? Thanks.

Maybe use two pairs of Mappers and Reducers? You can run them one after another, for example by creating two jobs in one main method, where the second job reads the results of the first.

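// One configuration per job
JobConf jobConf1 = new JobConf();
JobConf jobConf2 = new JobConf();

// First job: computes the intermediate results
Job job1 = new Job(jobConf1);

// Second job: reads the output of the first job
Job job2 = new Job(jobConf2);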

Or you can look at ChainReducer: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/ChainReducer.html
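The two-job idea above can be illustrated without any Hadoop classes as two passes over the same records: the first pass totals the counts per key, and the second pass re-reads the records and keeps those whose key total passed the threshold. This is only a plain-Java sketch of the data flow, not a Hadoop driver; the class name `TwoPassFilter`, the method names, and the `{key, name, count}` record layout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the two-job idea; no Hadoop classes are used.
final class TwoPassFilter {

    // Pass 1 (plays the role of the first job): total the count field per key.
    // Each record is assumed to be {key, name, count}.
    static Map<String, Integer> sumPerKey(List<String[]> records) {
        Map<String, Integer> sums = new HashMap<>();
        for (String[] r : records) {
            sums.merge(r[0], Integer.parseInt(r[2]), Integer::sum);
        }
        return sums;
    }

    // Pass 2 (plays the role of the second job): re-read the records and keep
    // those whose key total from pass 1 is greater than the threshold.
    static List<String> keepAbove(List<String[]> records,
                                  Map<String, Integer> sums,
                                  int threshold) {
        List<String> out = new ArrayList<>();
        for (String[] r : records) {
            if (sums.get(r[0]) > threshold) {
                out.add(r[0] + "\t" + r[1] + "\t" + r[2]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
            new String[] {"k1", "a", "60"},
            new String[] {"k1", "b", "50"},
            new String[] {"k2", "c", "10"});
        Map<String, Integer> sums = sumPerKey(records);
        for (String line : keepAbove(records, sums, 100)) {
            System.out.println(line);
        }
    }
}
```

The cost of this design is that the input is read twice, which in a real cluster means a second job and an intermediate output on disk; the benefit is that nothing has to be held in memory per key.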

Instead of looping twice you can delay writing the values until you know that the sum is high enough, something like:

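    // Requires java.util.List and java.util.ArrayList.
    int sum = 0;
    // Buffer the lines as Strings: Hadoop reuses the same Text instance between iterations.
    List<String> list = new ArrayList<>();
    KEY.set(key);

    for (Text value : values) {
        String[] splitString = value.toString().split("\t");
        String line = splitString[0] + "\t" + splitString[1];

        sum += Integer.parseInt(splitString[1]);

        if (sum <= 100) {
            // Threshold not passed yet: hold the line back.
            list.add(line);
        } else {
            // Threshold passed (sum > 100, as in the question): flush the buffer, then emit the current line.
            if (!list.isEmpty()) {
                for (String val : list) {
                    VALUE.set(val);
                    context.write(KEY, VALUE);
                }
                list.clear();
            }
            VALUE.set(line);
            context.write(KEY, VALUE);
        }
    }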
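The buffering idea above can be tried outside Hadoop with a plain `Iterator`, which can only be walked once, much like the reducer's values. The following is a minimal sketch under a few assumptions: the class and method names (`BufferThenEmit`, `emitIfSumExceeds`) are hypothetical, each value is a tab-separated line with the count in the second field, counts are non-negative, and the threshold comparison mirrors the question's `sum > 100` condition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java sketch of "buffer until the sum is high enough"; no Hadoop classes are used.
final class BufferThenEmit {

    // Walks the values exactly once. Lines are held back while the running sum
    // is at or below the threshold; once it is exceeded, the buffered lines and
    // every later line are emitted. Assumes non-negative counts.
    static List<String> emitIfSumExceeds(Iterator<String> values, int threshold) {
        List<String> emitted = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        int sum = 0;
        while (values.hasNext()) {
            String line = values.next();
            sum += Integer.parseInt(line.split("\t")[1]);
            if (sum <= threshold) {
                buffer.add(line);
            } else {
                emitted.addAll(buffer); // flush what was held back (no-op if empty)
                buffer.clear();
                emitted.add(line);
            }
        }
        return emitted; // stays empty if the threshold was never exceeded
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a\t60", "b\t50", "c\t5");
        for (String line : emitIfSumExceeds(values.iterator(), 100)) {
            System.out.println(line);
        }
    }
}
```

Note that the buffer can grow to hold every value for a key whose sum never passes the threshold, so this approach suits keys with a moderate number of values.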
