
Grouping a range of values in Map Reduce in Java Hadoop 2.2

我有以下JSON格式的输入数据。

    {
        "SeasonTicket": false, 
        "name": "Vinson Foreman", 
        "gender": "male", 
        "age": 50, 
        "email": "vinsonforeman@cyclonica.com", 
        "annualSalary": "$98501.00", 
        "id": 0
    }

I need to group the values by salary range, i.e. 1000-10000, 10000-25000, and so on:

Range        Count 
1000-10000     10
10000-50000    20

I am not using a JSON parser (such as Jackson) to process the data; instead I parse it as a String. I have the following map and reduce functions.

Map function

public class DemoMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable v = new IntWritable(1);
    private Text k = new Text();

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        try {
            // value is one line of the JSON input
            String line = value.toString();
            if (line.contains("annualSalary")) {
                // String.replace strips the literal "$";
                // replaceAll("$", "") is a no-op because "$" is the regex end anchor
                String s = line.replace("$", "");
                // keep only the digits and decimal point of the salary field
                double x = Double.parseDouble(s.replaceAll("[^0-9.]", ""));

                // emit the range label as the key so the reducer counts per bracket
                if (x >= 1000 && x < 10000) {
                    k.set("1000-10000");
                    output.collect(k, v);
                } else if (x >= 10000 && x < 50000) {
                    k.set("10000-50000");
                    output.collect(k, v);
                } else if (x >= 50000 && x < 100000) {
                    k.set("50000-100000");
                    output.collect(k, v);
                } else if (x >= 100000) {
                    k.set("100000+");
                    output.collect(k, v);
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }

    }
}
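One pitfall worth calling out in the mapper above: `String.replaceAll` treats its first argument as a regular expression, and `$` is the end-of-input anchor, so `replaceAll("$", "")` removes nothing from the salary string. The literal-string `String.replace` (or an escaped regex `\\$`) is needed. A quick standalone check:

```java
public class DollarStrip {
    public static void main(String[] args) {
        String s = "$98501.00";
        // replaceAll interprets "$" as the regex end-of-input anchor: no-op
        System.out.println(s.replaceAll("$", ""));   // prints $98501.00
        // replace treats its argument as a literal and strips the sign
        System.out.println(s.replace("$", ""));      // prints 98501.00
        // an escaped regex works as well
        System.out.println(s.replaceAll("\\$", "")); // prints 98501.00
    }
}
```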

Reduce function

public class DemoReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable count = new IntWritable();

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            IntWritable value = values.next();
            sum += value.get();
        }
        count.set(sum);
        output.collect(key, count);
    }
}

If possible, please show me how to group the data without using a JSON parser.

To group the data by range, you can use a custom Partitioner.
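Alternatively, for a small fixed set of brackets, emitting the bracket label itself as the map output key is enough: every record in the same bracket then reaches the same reducer without any custom Partitioner. A minimal, Hadoop-free sketch of that bucketing logic (the bracket boundaries are taken from the question; the class and method names `SalaryRange`/`salaryRange` are my own, not from any API):

```java
public class SalaryRange {
    // Map a salary to the bracket label that would be used as the
    // MapReduce key. Brackets follow the question:
    // 1000-10000, 10000-50000, 50000-100000, 100000 and above.
    static String salaryRange(double salary) {
        if (salary >= 1000 && salary < 10000)    return "1000-10000";
        if (salary >= 10000 && salary < 50000)   return "10000-50000";
        if (salary >= 50000 && salary < 100000)  return "50000-100000";
        if (salary >= 100000)                    return "100000+";
        return "other"; // below 1000; the question does not bucket these
    }

    public static void main(String[] args) {
        // Parse the salary the same way the mapper would: strip the
        // literal "$" and keep only the numeric characters.
        String raw = "\"annualSalary\": \"$98501.00\",";
        double salary = Double.parseDouble(raw.replaceAll("[^0-9.]", ""));
        System.out.println(salaryRange(salary)); // prints 50000-100000
    }
}
```

With keys like `"50000-100000"`, the unchanged sum-counting reducer from the question already produces the desired Range/Count table.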

