Grouping a range of values in Map Reduce in Java Hadoop 2.2
I have the following input data in JSON format:
{
    "SeasonTicket": false,
    "name": "Vinson Foreman",
    "gender": "male",
    "age": 50,
    "email": "vinsonforeman@cyclonica.com",
    "annualSalary": "$98501.00",
    "id": 0
}
I need to group the values by salary range, i.e. 1000-10000, 10000-25000, and so on, producing a count per range:
Range Count
1000-10000 10
10000-50000 20
Rather than using a standard JSON parser such as Jackson, I am parsing the data as a String. I have the following map and reduce functions.
Map function
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DemoMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable v = new IntWritable(1);
    private final Text k = new Text();

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        try {
            String line = value.toString();
            if (line.contains("annualSalary")) {
                // Use replace(), not replaceAll(): "$" is a regex anchor,
                // so replaceAll("$", "") would remove nothing.
                String s = line.replace("$", "").replace("\"", "").replace(",", "");
                // Take everything after the colon rather than a fixed
                // substring(26) offset, which breaks if the line shifts.
                double x = Double.parseDouble(s.substring(s.indexOf(':') + 1).trim());
                if (x <= 1000) {
                    return; // below the lowest bucket
                }
                // Emit the range label as the key so the reducer can
                // count records per range.
                if (x < 10000) {
                    k.set("1000-10000");
                } else if (x < 50000) {
                    k.set("10000-50000");
                } else if (x < 100000) {
                    k.set("50000-100000");
                } else {
                    k.set("100000+");
                }
                output.collect(k, v);
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
Reduce function
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class DemoReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable count = new IntWritable();

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get(); // iterator is typed; no cast needed
        }
        count.set(sum);
        output.collect(key, count);
    }
}
If possible, please show me how to group the data without using a JSON parser.
To group the data by range, you can use a custom Partitioner.
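The bucket boundaries from the question can be factored into a small helper that both the mapper and a custom Partitioner's getPartition() could share. This is a minimal sketch; the class name SalaryRange, the "other" label, and the exact boundary treatment are assumptions, not from the original post.

```java
// Hypothetical helper: maps a salary to the range label used as the map key.
// Boundaries follow the question: 1000-10000, 10000-50000, 50000-100000, 100000+.
public class SalaryRange {
    public static String bucket(double salary) {
        if (salary <= 1000) {
            return "other"; // below the lowest bucket
        }
        if (salary < 10000) {
            return "1000-10000";
        }
        if (salary < 50000) {
            return "10000-50000";
        }
        if (salary < 100000) {
            return "50000-100000";
        }
        return "100000+";
    }

    public static void main(String[] args) {
        System.out.println(bucket(98501.00)); // -> 50000-100000
    }
}
```

A custom Partitioner would then return a fixed partition index for each label (for example via a Map from label to index), so that all records of one salary range are sent to the same reducer.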