全局变量的值在 for 循环后不会改变

Question

I am developing a hadoop project.我正在开发一个 hadoop 项目。 I want to find customers in a certain day and then write those with the max consumption in that day.我想找到某一天的客户，然后写出当天消费量最大的那些。 In my reducer class, for some reason, the global variable max doesn't change it's value after a for loop.在我的 reducer 类中，由于某种原因，全局变量max在 for 循环后不会改变它的值。

EDIT I want to find the customers with max consumption in a certain day.编辑我想找到某一天消费量最大的客户。 I have managed to find the customers in the date I want, but I am facing a problem in my Reducer class.我已经设法在我想要的日期找到客户，但我在我的 Reducer 课程中遇到了问题。 Here is the code:这是代码：

EDIT #2 I already know that the values(consumption) are Natural numbers.编辑 #2我已经知道值（消耗）是自然数。 So in my output file I want to be only the customers, of a certain day, with max consumption.因此，在我的输出文件中，我只想成为某天消费量最大的客户。

EDIT #3 My input file is consisted of many data.编辑 #3我的输入文件由许多数据组成。 It has three columns;它有三列； the customer's id, the timestamp (yyyy-mm-DD HH:mm:ss) and the consumption客户的 id、时间戳 (yyyy-mm-DD HH:mm:ss) 和消费

Driver class驱动程序类

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class alicanteDriver {

    public static void main(String[] args) throws Exception {
        long t_start = System.currentTimeMillis();
        long t_end;

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Alicante");
        job.setJarByClass(alicanteDriver.class);

        job.setMapperClass(alicanteMapperC.class);        

        //job.setCombinerClass(alicanteCombiner.class);

        job.setPartitionerClass(alicantePartitioner.class);

        job.setNumReduceTasks(2);

        job.setReducerClass(alicanteReducerC.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/alicante_1y.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/alicante_output"));
        job.waitForCompletion(true);
        t_end = System.currentTimeMillis();

        System.out.println((t_end-t_start)/1000);
    }
 }

Mapper class映射器类

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class alicanteMapperC extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    String Customer = new String();
    SimpleDateFormat ft = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    Date t = new Date();
    IntWritable Consumption = new IntWritable();
    int counter = 0;

    // new vars
    int max = 0;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        Date d2 = null;
        try {
            d2 = ft.parse("2013-07-01 01:00:00");
        } catch (ParseException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }

        if (counter > 0) {

            String line = value.toString();
            StringTokenizer itr = new StringTokenizer(line, ",");

            while (itr.hasMoreTokens()) {
                Customer = itr.nextToken();
                try {
                    t = ft.parse(itr.nextToken());
                } catch (ParseException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
                Consumption.set(Integer.parseInt(itr.nextToken()));

                //sort out as many values as possible
                if(Consumption.get() > max) {
                    max = Consumption.get();
                }

                //find customers in a certain date
                if (t.compareTo(d2) == 0 && Consumption.get() == max) {
                    context.write(new Text(Customer), Consumption);
                }
            }
        }
        counter++;
    }
}

Reducer class减速机类

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import com.google.common.collect.Iterables;

public class alicanteReducerC extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        int max = 0; //this var

        // declaration of Lists
        List<Text> l1 = new ArrayList<Text>();
        List<IntWritable> l2 = new ArrayList<IntWritable>();

        for (IntWritable val : values) {
            if (val.get() > max) {
                max = val.get();
            }
            l1.add(key);
            l2.add(val);
        }

        for (int i = 0; i < l1.size(); i++) {
            if (l2.get(i).get() == max) {
                context.write(key, new IntWritable(max));
            }
        }
    }
}

Some values of the Input file输入文件的一些值

C11FA586148,2013-07-01 01:00:00,3
C11FA586152,2015-09-01 15:22:22,3
C11FA586168,2015-02-01 15:22:22,1
C11FA586258,2013-07-01 01:00:00,5
C11FA586413,2013-07-01 01:00:00,5
C11UA487446,2013-09-01 15:22:22,3
C11UA487446,2013-07-01 01:00:00,3
C11FA586148,2013-07-01 01:00:00,4

Output should be输出应该是

C11FA586258 5
C11FA586413 5

I have searched the forums for a couple of hours, and still can't find the issue.我在论坛上搜索了几个小时，仍然找不到问题。 Any ideas?有任何想法吗？

Answer 1

here is the refactored code: you can pass/change specific value for date of consumption.这是重构后的代码：您可以传递/更改消费日期的特定值。 In this case you don't need reducer.在这种情况下，您不需要减速器。 my first answer was to query max comsumption from input, and this answer is to query user provided consumption from input.我的第一个答案是从输入中查询最大消耗，这个答案是从输入中查询用户提供的消耗。
setup method will get user provided value for mapper.maxConsumption.date and pass them to map method. setup方法将获取用户提供的mapper.maxConsumption.date值并将它们传递给map方法。
cleaup method in reducer scans all max consumption customers and writes final max in input (ie, 5 in this case) - see screen shot for detail execution log: cleaup方法扫描所有最大消费客户并在输入中写入最终最大值（即，在这种情况下为 5） - 有关详细执行日志，请参见屏幕截图：

run as:运行为：

hadoop jar maxConsumption.jar -Dmapper.maxConsumption.date="2013-07-01 01:00:00" Data/input.txt output/maxConsupmtion5

#input:
C11FA586148,2013-07-01 01:00:00,3
C11FA586152,2015-09-01 15:22:22,3
C11FA586168,2015-02-01 15:22:22,1
C11FA586258,2013-07-01 01:00:00,5
C11FA586413,2013-07-01 01:00:00,5
C11UA487446,2013-09-01 15:22:22,3
C11UA487446,2013-07-01 01:00:00,3
C11FA586148,2013-07-01 01:00:00,4

#output:
C11FA586258 5
C11FA586413 5

public class maxConsumption  extends Configured implements Tool{

    public static class DataMapper extends Mapper<Object, Text, Text, IntWritable> {
        SimpleDateFormat ft = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date dateInFile, filterDate;
        int lineno=0;
        private final static Text customer = new Text();
        private final static IntWritable consumption = new IntWritable();
        private final static Text maxConsumptionDate = new Text();

        public void setup(Context context) {
            Configuration config = context.getConfiguration();
            maxConsumptionDate.set(config.get("mapper.maxConsumption.date"));
        }

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException{
            try{
                lineno++;
                filterDate = ft.parse(maxConsumptionDate.toString());
                //map data from line/file
                String[] fields = value.toString().split(",");
                customer.set(fields[0].trim());
                dateInFile = ft.parse(fields[1].trim());
                consumption.set(Integer.parseInt(fields[2].trim()));

                if(dateInFile.equals(filterDate)) //only send to reducer if date filter matches....
                    context.write(new Text(customer), consumption);
            }catch(Exception e){
                System.err.println("Invaid Data at line: " + lineno + " Error: " + e.getMessage());
            }
        }   
    }

    public  static class DataReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        LinkedHashMap<String, Integer> maxConsumption = new LinkedHashMap<String,Integer>();
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) 
                throws IOException, InterruptedException {
            int max=0;
            System.out.print("reducer received: " + key + " [ ");
            for(IntWritable value: values){
                System.out.print( value.get() + " ");
                if(value.get() > max)
                    max=value.get();
            }
            System.out.println( " ]");
            System.out.println(key.toString() + "    max is   " + max);
            maxConsumption.put(key.toString(), max);
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            int max=0;
            //first find the max from reducer
            for (String key : maxConsumption.keySet()){
                 System.out.println("cleaup customer : " + key.toString() + " consumption : " + maxConsumption.get(key) 
                         + " max: " + max); 
                if(maxConsumption.get(key) > max)
                    max=maxConsumption.get(key);
            }

            System.out.println("final max is: " + max);
            //write only the max value from map
            for (String key : maxConsumption.keySet()){
                if(maxConsumption.get(key) == max)
                    context.write(new Text(key), new IntWritable(maxConsumption.get(key)));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new maxConsumption(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: -Dmapper.maxConsumption.date=\"2013-07-01 01:00:00\" <in> <out>");
            System.exit(2);
        }
        Configuration conf = this.getConf();
        Job job = Job.getInstance(conf, "get-max-consumption");
        job.setJarByClass(maxConsumption.class);
        job.setMapperClass(DataMapper.class);
        job.setReducerClass(DataReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        FileSystem fs = null;
        Path dstFilePath = new Path(args[1]);
        try {
            fs = dstFilePath.getFileSystem(conf);
            if (fs.exists(dstFilePath))
                fs.delete(dstFilePath, true);
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        return job.waitForCompletion(true) ? 0 : 1;
    } 
}

Answer 2

Probably all the values that go to your reducer are under 0. Try min value to identify if you variable change.可能所有进入您的减速器的值都在 0 以下。尝试使用 min 值来确定您的变量是否发生变化。

max = MIN_VALUE;

Based on what you say, the output should be only 0's (in this the max value in the reducers are 0) or no output (all values less than 0).根据您的说法，输出应仅为 0（在此减速器中的最大值为 0）或无输出（所有值均小于 0）。 Also, look this还有，看这个

context.write(key, new IntWritable());

it should be它应该是

context.write(key, new IntWritable(max));

EDIT: I just saw your Mapper class, it has a lot of problems.编辑：我刚刚看到你的 Mapper 类，它有很多问题。 the following code is skiping the first element in every mapper.以下代码跳过每个映射器中的第一个元素。 why?为什么？

    if (counter > 0) {

I guess, that you are getting something like this right?我猜，你得到这样的东西对吗？ "customer, 2013-07-01 01:00:00, 2,..." if that is the case and you are already filtering values, you should declare your max variable as local, not in the mapper scope, it would affect multiple customers. "customer, 2013-07-01 01:00:00, 2,..." 如果是这种情况并且您已经在过滤值，您应该将 max 变量声明为本地变量，而不是在映射器范围内，它会影响多个客户。

There are a lot of questions around this.. you could explain your input for each mapper and what you want to do.关于这个有很多问题..你可以解释你对每个映射器的输入以及你想要做什么。

EDIT2: Based on your answer I would try this EDIT2：根据你的回答，我会试试这个

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AlicanteMapperC extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final int max = 5;
    private SimpleDateFormat ft = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Date t = null;
        String[] line = value.toString().split(",");
        String customer = line[0];
        try {
            t = ft.parse(line[1]);
        } catch (ParseException e) {
            // TODO Auto-generated catch block
            throw new RuntimeException("something wrong with the date!" + line[1]);
        }
        Integer consumption = Integer.parseInt(line[2]);

        //find customers in a certain date
        if (t.compareTo(ft.parse("2013-07-01 01:00:00")) == 0 && consumption == max) {
            context.write(new Text(customer), new IntWritable(consumption));
        }
        counter++;
    }
}

and reducer pretty simple to emit 1 record per customer和 reducer 非常简单地为每个客户发出 1 条记录

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import com.google.common.collect.Iterables;

public class AlicanteReducerC extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        //We already now that it is 5
         context.write(key, new IntWritable(5));
         //If you want something different, for example report customer with different values, you could iterate over the iterator like this 
         //for (IntWritable val : values) {
           // context.write(key, new IntWritable(val));
        //}      
    }
}

全局变量的值在 for 循环后不会改变

问题描述

2 个解决方案

解决方案1
1 已采纳 2017-01-13 15:27:00

解决方案2
0 2017-01-13 13:46:44

全局变量的值在 for 循环后不会改变

问题描述

2 个解决方案

解决方案1 1 已采纳 2017-01-13 15:27:00

解决方案2 0 2017-01-13 13:46:44

解决方案1
1 已采纳 2017-01-13 15:27:00

解决方案2
0 2017-01-13 13:46:44