
Unable to run MR on cluster

I have a MapReduce program that runs successfully in standalone (Eclipse) mode, but when I export the jar and try to run the same MR job on the cluster, it shows a null pointer exception like this:

  13/06/26 05:46:22 ERROR mypackage.HHDriver: Error while configuring run method. 
  java.lang.NullPointerException

I used the following code for the main method:

public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
    Configuration configuration = new Configuration();
    Tool headOfHouseHold = new HHDriver();

    try {
        ToolRunner.run(configuration,headOfHouseHold,args);
    } catch (Exception exception) {
        exception.printStackTrace();
        LOGGER.error("Error while configuring run method", exception);
        // System.exit(1);
    }
}

run method:

if (args != null && args.length == 8) {
    // Setting the Configurations
    GenericOptionsParser genericOptionsParser = new GenericOptionsParser(args);
    Configuration configuration = genericOptionsParser.getConfiguration();

    //Configuration configuration = new Configuration();

    configuration.set("fs.default.name", args[0]);
    configuration.set("mapred.job.tracker", args[1]);
    configuration.set("deltaFlag",args[2]);                                   
    configuration.set("keyPrefix",args[3]);
    configuration.set("outfileName",args[4]);
    configuration.set("Inpath",args[5]);
    String outputPath=args[6];

    configuration.set("mapred.map.tasks.speculative.execution", "false");
    configuration.set("mapred.reduce.tasks.speculative.execution", "false");

    // To avoid the creation of _LOG and _SUCCESS files
    configuration.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false");
    configuration.set("hadoop.job.history.user.location", "none");
    configuration.set(Constants.MAX_NUM_REDUCERS,args[7]);

    // Configuration of the MR-Job
    Job job = new Job(configuration, "HH Job");
    job.setJarByClass(HHDriver.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    job.setNumReduceTasks(HouseHoldingHelper.numReducer(configuration));
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    MultipleOutputs.addNamedOutput(job,configuration.get("outfileName"),
                                   TextOutputFormat.class,Text.class,Text.class);

    // Deletion of the output folder (if it exists)
    FileSystem fileSystem = FileSystem.get(configuration);
    Path path = new Path(outputPath);

    if (fileSystem.exists(path)) {
        fileSystem.delete(path, true);
    }

    // Deletion of empty files in the output (if the folder still exists;
    // listing a non-existent path would throw FileNotFoundException)
    if (fileSystem.exists(path)) {
        FileStatus[] fileStatus = fileSystem.listStatus(path);
        for (FileStatus file : fileStatus) {
            if (file.getLen() == 0) {
                fileSystem.delete(file.getPath(), true);
            }
        }
    }
    // Setting the Input/Output paths
    FileInputFormat.setInputPaths(job, new Path(configuration.get("Inpath")));
    FileOutputFormat.setOutputPath(job, new Path(outputPath));

    job.setMapperClass(HHMapper.class);
    job.setReducerClass(HHReducer.class);

    // Call waitForCompletion only once; calling it twice submits the job twice
    return job.waitForCompletion(true) ? 0 : 1;
}

I double-checked that the run method parameters are not null, and the job runs fine in standalone mode as well.
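As an aside, the run method above silently does nothing when `args.length` is not exactly 8, which can make a missing or extra argument look like an unrelated failure. A minimal fail-fast sketch in plain Java (the class name and usage text are hypothetical, not from the original program):

```java
// ArgCheck.java -- fail-fast validation for the 8 positional arguments
public class ArgCheck {
    static final int EXPECTED_ARGS = 8;

    // Returns a usage message when the arguments are invalid, null when OK.
    static String validate(String[] args) {
        if (args == null || args.length != EXPECTED_ARGS) {
            return "Usage: HHDriver <fs.default.name> <mapred.job.tracker> "
                 + "<deltaFlag> <keyPrefix> <outfileName> <Inpath> "
                 + "<outputPath> <maxNumReducers>";
        }
        return null;
    }

    public static void main(String[] args) {
        String error = validate(args);
        if (error != null) {
            System.err.println(error);
            System.exit(2); // fail fast instead of falling through silently
        }
        System.out.println("arguments OK");
    }
}
```

Failing fast here makes an argument mismatch show up as a clear usage error rather than a job that quietly does nothing.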

The issue could be that the Hadoop configuration is not being passed to your program properly. You can try putting this at the beginning of your driver class:

GenericOptionsParser genericOptionsParser = new GenericOptionsParser(args);
Configuration hadoopConfiguration = genericOptionsParser.getConfiguration();

Then use the hadoopConfiguration object when initializing objects.

For example:

public int run(String[] args) throws Exception {
    GenericOptionsParser genericOptionsParser = new GenericOptionsParser(args);
    Configuration hadoopConfiguration = genericOptionsParser.getConfiguration();

    Job job = new Job(hadoopConfiguration);
    // set mapper, reducer, input/output paths, etc.
    return job.waitForCompletion(true) ? 0 : 1;
}
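Note that once the driver goes through GenericOptionsParser (directly or via ToolRunner), cluster settings can also be supplied as generic `-D` options on the command line, which the parser consumes before the remaining positional arguments reach run(). A hypothetical invocation sketch (the jar name, hosts, and paths are placeholders, and the eight positional arguments the program expects are kept as-is):

```shell
# Generic options (-D, -fs, -jt) are stripped by GenericOptionsParser;
# only the positional arguments that follow reach the run() method.
hadoop jar hh-job.jar mypackage.HHDriver \
  -D mapred.map.tasks.speculative.execution=false \
  hdfs://namenode:8020 jobtracker:8021 deltaFlag keyPrefix outfile /in/path /out/path 4
```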
