
Hadoop - a reducer is not being initiated

I am trying to run the open source kNN join MapReduce hbrj algorithm on Hadoop 2.6.0 in a single-node, pseudo-distributed setup installed on my laptop (OS X). This is the code.

Mapper, reducer, and the main driver:

public class RPhase2 extends Configured implements Tool 
{
    public static class MapClass extends MapReduceBase 
    implements Mapper<LongWritable, Text, IntWritable, RPhase2Value> 
    {
        public void map(LongWritable key, Text value, 
        OutputCollector<IntWritable, RPhase2Value> output, 
        Reporter reporter)  throws IOException 
        {
            String line = value.toString();
            String[] parts = line.split(" +");
            // key format <rid1>
            IntWritable mapKey = new IntWritable(Integer.valueOf(parts[0]));
            // value format <rid2, dist>
            RPhase2Value np2v = new RPhase2Value(Integer.valueOf(parts[1]), Float.valueOf(parts[2]));
            System.out.println("############### key:  " + mapKey.toString() + "   np2v:  " + np2v.toString());
            output.collect(mapKey, np2v);
        }
    }

    public static class Reduce extends MapReduceBase
    implements Reducer<IntWritable, RPhase2Value, NullWritable, Text> 
    {
        int numberOfPartition;  
        int knn;

        class Record {...}

        class RecordComparator implements Comparator<Record> {...}

        public void configure(JobConf job) 
        {
            numberOfPartition = job.getInt("numberOfPartition", 2); 
            knn = job.getInt("knn", 3);
            System.out.println("########## configuring!");
        }   

        public void reduce(IntWritable key, Iterator<RPhase2Value> values, 
        OutputCollector<NullWritable, Text> output, 
        Reporter reporter) throws IOException 
        {
            //initialize the pq
            RecordComparator rc = new RecordComparator();
            PriorityQueue<Record> pq = new PriorityQueue<Record>(knn + 1, rc);

            System.out.println("Phase 2 is at reduce");
            System.out.println("########## key: " + key.toString());

            // For each record we have a reduce task
            // value format <rid1, rid2, dist>
            while (values.hasNext()) 
            {
                RPhase2Value np2v = values.next();

                int id2 = np2v.getFirst().get();
                float dist = np2v.getSecond().get();
                Record record = new Record(id2, dist);
                pq.add(record);
                if (pq.size() > knn)
                    pq.poll();
            }

            while(pq.size() > 0) 
            {
                output.collect(NullWritable.get(), new Text(key.toString() + " " + pq.poll().toString()));
                //break; // only output the first record
            }

        } // reduce
    } // Reducer

    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), RPhase2.class);
        conf.setJobName("RPhase2");

        conf.setMapOutputKeyClass(IntWritable.class);
        conf.setMapOutputValueClass(RPhase2Value.class);

        conf.setOutputKeyClass(NullWritable.class);
        conf.setOutputValueClass(Text.class);   

        conf.setMapperClass(MapClass.class);        
        conf.setReducerClass(Reduce.class);

        int numberOfPartition = 0;  
        List<String> other_args = new ArrayList<String>();

        for(int i = 0; i < args.length; ++i) 
        {
            try {
                if ("-m".equals(args[i])) {
                    //conf.setNumMapTasks(Integer.parseInt(args[++i]));
                    ++i;
                } else if ("-r".equals(args[i])) {
                    conf.setNumReduceTasks(Integer.parseInt(args[++i]));
                } else if ("-p".equals(args[i])) {
                    numberOfPartition = Integer.parseInt(args[++i]);
                    conf.setInt("numberOfPartition", numberOfPartition);
                } else if ("-k".equals(args[i])) {
                    int knn = Integer.parseInt(args[++i]);
                    conf.setInt("knn", knn);
                    System.out.println(knn + "~ hi");
                } else {
                    other_args.add(args[i]);
                }
                conf.setNumReduceTasks(numberOfPartition * numberOfPartition);
                //conf.setNumReduceTasks(1);
            } catch (NumberFormatException except) {
                System.out.println("ERROR: Integer expected instead of " + args[i]);
                return printUsage();
            } catch (ArrayIndexOutOfBoundsException except) {
                System.out.println("ERROR: Required parameter missing from " + args[i-1]);
                return printUsage();
            }
        } 


        FileInputFormat.setInputPaths(conf, other_args.get(0));
        FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));

        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new RPhase2(), args);
    }
} // RPhase2

When I run this, the mapper completes successfully, but then the job terminates abruptly and the reducer is never instantiated. Moreover, no errors are ever printed (not even in the log files). I know this also because the print statements in the reducer's configure() method never appear. Output:

15/06/15 14:00:37 INFO mapred.LocalJobRunner: map task executor complete.
15/06/15 14:00:38 INFO mapreduce.Job:  map 100% reduce 0%
15/06/15 14:00:38 INFO mapreduce.Job: Job job_local833125918_0001 completed successfully
15/06/15 14:00:38 INFO mapreduce.Job: Counters: 20
    File System Counters
        FILE: Number of bytes read=12505456
        FILE: Number of bytes written=14977422
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=11408
        HDFS: Number of bytes written=8724
        HDFS: Number of read operations=216
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=99
    Map-Reduce Framework
        Map input records=60
        Map output records=60
        Input split bytes=963
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=14
        Total committed heap usage (bytes)=1717567488
    File Input Format Counters 
        Bytes Read=2153
    File Output Format Counters 
        Bytes Written=1645

What I have done so far:

  • I have been looking at similar questions, and the most frequent problem I found is failing to configure the output formats when the mapper's output differs from the reducer's, which is already done in the code above: conf.setMapOutputKeyClass(Class); conf.setMapOutputValueClass(Class);

  • In another post I found a suggestion to change reduce(..., Iterator<...>, ...) to reduce(..., Iterable<...>, ...), but that gave me trouble compiling. I could no longer use the .hasNext() and .next() methods, and I got this error:

    error: Reduce is not abstract and does not override abstract method reduce(IntWritable,Iterator,OutputCollector,Reporter) in Reducer
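That compile error is expected: the old org.apache.hadoop.mapred.Reducer interface (the one a MapReduceBase subclass implements) declares reduce with an Iterator parameter; only the newer org.apache.hadoop.mapreduce.Reducer class uses Iterable, and the two APIs cannot be mixed. A method taking Iterable simply does not override the Iterator-based abstract method. Here is a minimal, Hadoop-free sketch of the same mismatch — the OldStyleReducer interface below is an illustrative stand-in, not the real Hadoop class:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the old org.apache.hadoop.mapred.Reducer,
// which declares reduce(..., Iterator<V2>, ...).
public class IteratorVsIterable {

    interface OldStyleReducer<V> {
        int reduce(Iterator<V> values); // old mapred API shape: Iterator
    }

    // Implementing the declared Iterator-based signature compiles fine.
    static int countValues(List<Integer> vals) {
        OldStyleReducer<Integer> ok = values -> {
            int count = 0;
            while (values.hasNext()) { values.next(); count++; }
            return count;
        };
        return ok.reduce(vals.iterator());
    }

    // By contrast, a method written as reduce(Iterable<Integer> values)
    // would NOT override reduce(Iterator<Integer>) -- which is exactly the
    // "Reduce is not abstract and does not override abstract method" error.
    // Iterable belongs to the new org.apache.hadoop.mapreduce.Reducer.

    public static void main(String[] args) {
        System.out.println(countValues(Arrays.asList(3, 1, 2))); // 3
    }
}
```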

If anyone has any hints or suggestions on how I can track down what the issue is, I would be very appreciative!

Just a note that I have posted a question about this problem before ( Hadoop kNN join algorithm stuck at map 100% reduce 0% ), but it did not get enough attention, so I wanted to re-ask it from a different perspective. You can use that link for more details on my log files.

I have figured out the problem, and it was something silly. If you look at the code above, numberOfPartition is set to 0 before the arguments are read, and the number of reducers is set to numberOfPartition * numberOfPartition. Since I, as the user, never changed the number-of-partitions parameter (mostly because I simply copy-pasted the argument line from the provided README), the job was configured with zero reduce tasks, which is why the reducer never even started.
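For anyone hitting the same symptom: setNumReduceTasks(numberOfPartition * numberOfPartition) is called inside the argument loop, so when -p is never passed it runs with numberOfPartition still 0, and Hadoop happily accepts a job with zero reduce tasks (the map output becomes the final output and the job reports success). A sketch of the fix with the Hadoop calls stripped out so it runs standalone — the reducerCount helper is mine, not part of the original driver:

```java
// Hypothetical sketch: compute the reducer count once, after the whole
// argument list has been parsed, and guard against a missing -p flag.
public class ReducerCountFix {

    static int reducerCount(String[] args) {
        int numberOfPartition = 0;
        for (int i = 0; i < args.length; ++i) {
            if ("-p".equals(args[i])) {
                numberOfPartition = Integer.parseInt(args[++i]);
            }
            // ... other flags (-r, -k, ...) parsed here as before ...
        }
        // Moved out of the loop: decided once, and never zero.
        // In the real driver this value would go to conf.setNumReduceTasks(...).
        return numberOfPartition > 0 ? numberOfPartition * numberOfPartition : 1;
    }

    public static void main(String[] args) {
        System.out.println(reducerCount(new String[] {"-p", "2"})); // 4
        System.out.println(reducerCount(new String[] {}));          // 1
    }
}
```

Alternatively, simply passing -p on the command line as the README's argument list intends makes the original driver work unchanged.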
