
Performing multiple computations with Hadoop Map Reduce

I have a MapReduce program that finds the min/max of two separate properties for each year. It works, for the most part, on a single-node Hadoop cluster. Here is my current setup:

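import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends
        Reducer<Text, Stats, Text, Stats> {

    private Stats result = new Stats();

    @Override
    public void reduce(Text key, Iterable<Stats> values, Context context)
            throws IOException, InterruptedException {

        // Track each extreme separately and carry it across iterations;
        // comparing against a constant would keep only the last value seen.
        // The Stats getters are assumed to return float, matching the
        // float setters used in the mapper.
        float maxTemp = Float.NEGATIVE_INFINITY;
        float minTemp = Float.POSITIVE_INFINITY;
        float maxWind = Float.NEGATIVE_INFINITY;
        float minWind = Float.POSITIVE_INFINITY;
        int sum = 0;

        for (Stats value : values) {
            maxTemp = Math.max(maxTemp, value.getMaxTemp());
            minTemp = Math.min(minTemp, value.getMinTemp());
            maxWind = Math.max(maxWind, value.getMaxWind());
            minWind = Math.min(minWind, value.getMinWind());

            sum += value.getCount();
        }

        result.setMaxTemp(maxTemp);
        result.setMinTemp(minTemp);
        result.setMaxWind(maxWind);
        result.setMinWind(minWind);
        result.setCount(sum);

        context.write(key, result);
    }
}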

public class MaxTemperatureMapper extends
        Mapper<Object, Text, Text, Stats> {

    private static final int MISSING = 9999;
    private Stats outStat = new Stats();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {

        String[] split = value.toString().split("\\s+");
        String year = split[2].substring(0, 4);
        int airTemperature;
        airTemperature = (int) Float.parseFloat(split[3]);

        outStat.setMinTemp((float)airTemperature);
        outStat.setMaxTemp((float)airTemperature);

        outStat.setMinWind(Float.parseFloat(split[12]));
        outStat.setMaxWind(Float.parseFloat(split[14]));
        outStat.setCount(1);

        context.write(new Text(year), outStat);
    }
}

public class MaxTemperatureDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {

        if (args.length != 2) {
            System.err
                    .println("Usage: MaxTemperatureDriver <input path> <output path>");
            // Return instead of calling System.exit() so ToolRunner reports the status.
            return -1;
        }

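        // Build the job from the configuration that ToolRunner populated
        // (Job.getInstance is available in Hadoop 2 and later).
        Job job = Job.getInstance(getConf());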
        job.setJarByClass(MaxTemperatureDriver.class);
        job.setJobName("Max Temperature");

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setCombinerClass(MaxTemperatureReducer.class);
        job.setReducerClass(MaxTemperatureReducer.class);



        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Stats.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Wait for the job once and report its status through the return value;
        // calling System.exit() here would terminate the JVM before run() returns.
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        MaxTemperatureDriver driver = new MaxTemperatureDriver();
        int exitCode = ToolRunner.run(driver, args);
        System.exit(exitCode);

    }
 }

Currently it only prints the min/max of the temperature and wind speed for each year. I am sure the implementation is simple, but I cannot find an answer anywhere. I want to find the top 5 min/max values for each year. Any suggestions?

Let me assume the following signature for your Stats class.

/* The Stats class needs to be a Writable; the version below is just a demo. */
public class Stats {

    private float temp;
    private float wind;

    public float getTemp() {
        return temp;
    }

    public void setTemp(float temp) {
        this.temp = temp;
    }

    public float getWind() {
        return wind;
    }

    public void setWind(float wind) {
        this.wind = wind;
    }
}

With this, let us change the reducer as below.

        // Each set holds at most five values: the *Max sets keep the largest,
        // the *Min sets keep the smallest. A TreeSet drops duplicates, so the
        // result is the top five distinct values.
        SortedSet<Float> tempSetMax = new TreeSet<Float>();
        SortedSet<Float> tempSetMin = new TreeSet<Float>();
        SortedSet<Float> windSetMin = new TreeSet<Float>();
        SortedSet<Float> windSetMax = new TreeSet<Float>();

        // "values" is the Iterable<Stats> passed to reduce().
        for (Stats value : values) {

            float temp = value.getTemp();
            float wind = value.getWind();

            // Add first, then evict the weakest element once the set
            // grows past five; this stays correct for repeated values.
            tempSetMax.add(temp);
            if (tempSetMax.size() > 5) {
                tempSetMax.remove(tempSetMax.first());
            }

            tempSetMin.add(temp);
            if (tempSetMin.size() > 5) {
                tempSetMin.remove(tempSetMin.last());
            }

            windSetMin.add(wind);
            if (windSetMin.size() > 5) {
                windSetMin.remove(windSetMin.last());
            }

            windSetMax.add(wind);
            if (windSetMax.size() > 5) {
                windSetMax.remove(windSetMax.first());
            }
        }
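
The four branches of the reducer loop above all follow one pattern: keep a bounded collection and evict its weakest element. A minimal sketch of that pattern as a reusable helper is shown here. `TopN` and its methods `largest`, `smallest`, `offer` and `values` are hypothetical names, not part of Hadoop or of the code above, and a `PriorityQueue` is swapped in for the `TreeSet` so that repeated readings are kept instead of being collapsed into one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: keeps the n largest (or n smallest) values offered so far. */
final class TopN {
    private final int limit;
    private final PriorityQueue<Float> heap;

    private TopN(int limit, boolean largest) {
        this.limit = limit;
        if (largest) {
            // Min-heap: the head is the smallest kept value, i.e. the one to evict.
            this.heap = new PriorityQueue<Float>();
        } else {
            // Max-heap: the head is the largest kept value, i.e. the one to evict.
            this.heap = new PriorityQueue<Float>(Collections.reverseOrder());
        }
    }

    static TopN largest(int n) {
        return new TopN(n, true);
    }

    static TopN smallest(int n) {
        return new TopN(n, false);
    }

    /** Adds a value, then drops the weakest one if the limit is exceeded. */
    void offer(float v) {
        heap.add(v);
        if (heap.size() > limit) {
            heap.poll();
        }
    }

    /** Returns the kept values in ascending order. */
    List<Float> values() {
        List<Float> out = new ArrayList<Float>(heap);
        Collections.sort(out);
        return out;
    }
}
```

Inside `reduce()` one would create, for example, `TopN.largest(5)` and `TopN.smallest(5)` for each property, call `offer` once per value, and write `values().toString()` for the key. If the reducer then emits something other than `Stats` (for example `Text`), it can no longer also be registered as the combiner, because a combiner's output types must match the map output types.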

Now you can write the toString() of each set to the context, or you can create a custom Writable. In my code, please change Stats according to your requirements; it needs to be a Writable. The above only demonstrates the flow.
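
To illustrate what "needs to be a Writable" involves, here is a hedged sketch of the two serialization methods that Hadoop's `Writable` interface declares, `write(DataOutput)` and `readFields(DataInput)`. The class name `StatsSketch` and the `roundTrip` helper are made up for this example. In a real job the class would declare `implements org.apache.hadoop.io.Writable`; that clause is left off here only so the sketch compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Stats; add "implements Writable" when using it in Hadoop.
class StatsSketch {
    private float temp;
    private float wind;

    // Hadoop instantiates Writables reflectively, so a no-arg constructor is required.
    StatsSketch() {
    }

    StatsSketch(float temp, float wind) {
        this.temp = temp;
        this.wind = wind;
    }

    float getTemp() {
        return temp;
    }

    float getWind() {
        return wind;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeFloat(temp);
        out.writeFloat(wind);
    }

    // Read the fields back in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        temp = in.readFloat();
        wind = in.readFloat();
    }

    // Text-based output formats print values through toString().
    @Override
    public String toString() {
        return temp + "\t" + wind;
    }

    // Demo only: serialize to a byte array and read it back into a new object.
    static StatsSketch roundTrip(StatsSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            StatsSketch copy = new StatsSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The important design point is that `readFields` must consume exactly the bytes that `write` produced, in the same order, because Hadoop reuses one object for many records.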

This is the code for getting the top 10 from the MR Design Patterns book. The same GitHub location also has code for the other MR design patterns.
