Grouping a range of values in Map Reduce in Java Hadoop 2.2
I have the following input data in JSON format:
{
    "SeasonTicket": false,
    "name": "Vinson Foreman",
    "gender": "male",
    "age": 50,
    "email": "vinsonforeman@cyclonica.com",
    "annualSalary": "$98501.00",
    "id": 0
}
I need to group the values by salary range, i.e. 1000-10000, 10000-25000, and so on, producing a count per range:
Range Count
1000-10000 10
10000-50000 20
Rather than using a standard JSON parser such as Jackson, I am parsing the data as a String. I have the following map and reduce functions.
Map function
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DemoMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable v = new IntWritable(1);
    private final Text k = new Text();

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        try {
            String line = value.toString();
            if (line.contains("annualSalary")) {
                // Use replace(), not replaceAll(): "$" is a regex anchor,
                // so replaceAll("$", "") would remove nothing.
                String s = line.replace("$", "").replace("\"", "").replace(",", "");
                // Take everything after the colon rather than a fixed
                // substring(26) offset, which breaks if the line shifts.
                double x = Double.parseDouble(s.substring(s.indexOf(':') + 1).trim());
                if (x <= 1000) {
                    return; // below the lowest bucket
                }
                // Emit the range label as the key so the reducer can
                // count records per range.
                if (x < 10000) {
                    k.set("1000-10000");
                } else if (x < 50000) {
                    k.set("10000-50000");
                } else if (x < 100000) {
                    k.set("50000-100000");
                } else {
                    k.set("100000+");
                }
                output.collect(k, v);
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
Reduce function
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class DemoReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable count = new IntWritable();

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get(); // iterator is typed; no cast needed
        }
        count.set(sum);
        output.collect(key, count);
    }
}
If possible, please show me how to group the data without using a JSON parser.
To group the data by range, you can use a custom Partitioner.
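The bucket boundaries from the question can be factored into a small helper that both the mapper and a custom Partitioner's getPartition() could share. This is a minimal sketch; the class name SalaryRange, the "other" label, and the exact boundary treatment are assumptions, not from the original post.

```java
// Hypothetical helper: maps a salary to the range label used as the map key.
// Boundaries follow the question: 1000-10000, 10000-50000, 50000-100000, 100000+.
public class SalaryRange {
    public static String bucket(double salary) {
        if (salary <= 1000) {
            return "other"; // below the lowest bucket
        }
        if (salary < 10000) {
            return "1000-10000";
        }
        if (salary < 50000) {
            return "10000-50000";
        }
        if (salary < 100000) {
            return "50000-100000";
        }
        return "100000+";
    }

    public static void main(String[] args) {
        System.out.println(bucket(98501.00)); // -> 50000-100000
    }
}
```

A custom Partitioner would then return a fixed partition index for each label (for example via a Map from label to index), so that all records of one salary range are sent to the same reducer.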