
Hadoop MultipleInputs fails with RuntimeException

My existing system read all files from a particular folder and ran MapReduce on them. The code is given below:

    Path path = new Path(inputPath);
    if (!FileSystem.get(conf).exists(path)) {
      System.out.println("Path does not exist (skipping): " + path);
      return 1;
    }
    FileInputFormat.setInputPaths(conf, inputPath);

This ran without any problem. A recent change now requires me to specify exactly which files to use as input, so I changed the code to this:

for(String fileName:filePath.split(",")){
   MultipleInputs.addInputPath(conf, new Path(fileName), TextInputFormat.class, RawLogMapper.class);
   // MultipleInputs.addInputPath(conf, new Path(fileName), TextInputFormat.class);
}

where filePath is a comma-separated list of absolute file paths that need to be processed. I'm using mapred, not mapreduce.

import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;
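
As an aside, a comma-separated list like filePath is easy to break with stray spaces or a trailing comma. The following is a minimal sketch in plain Java (no Hadoop dependency) of splitting such a list defensively before each entry is turned into a Path. The class name InputPathList, the method parse, and the sample paths are hypothetical and not part of the original job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of the original job code.
public class InputPathList {

    // Splits a comma-separated list, trimming whitespace and dropping empty entries.
    static List<String> parse(String filePath) {
        List<String> paths = new ArrayList<>();
        if (filePath == null) {
            return paths; // nothing to process
        }
        for (String entry : filePath.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Sample paths are made up for illustration.
        System.out.println(parse("/data/a-1.log, /data/b-2.log,,"));
    }
}
```

Each returned string could then be passed to new Path(...) inside the loop shown earlier.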

After changing the code, I get the following error:

14/09/08 13:50:05 INFO mapred.JobClient: Task Id : attempt_201408201501_1196_m_000000_1, Status : FAILED
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja

Unsure whether this was because I hadn't specified TextInputFormat, I added it to the addInputPath call, but the error remains.

Edit

Found the problem. There is a call further downstream in the mapper:

String filename = conf.get("map.input.file");
pos = conf.get((new File(filename)).getName().split("-")[0]);

When I specify individual files instead of the folder, filename comes back null, which causes an NPE. I wonder why conf.get("map.input.file") returns null when I specify input files.

https://issues.apache.org/jira/browse/MAPREDUCE-1743

This means I need to find out the name of the input file during configuration:

  1. Without using conf.get("map.input.file")
  2. Without using ((FileSplit) context.getInputSplit()).getPath().toString(), as I'm using mapred and not mapreduce.

The current configure method:

    public void configure(JobConf conf) {
        String filename = conf.get("map.input.file");
        merchant = conf.get((new File(filename)).getName().split("-")[0]);
        if (merchant == null) {
            merchant = "unknown_merchant";
        }
    }
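
The NPE comes from new File(filename) when filename is null, so the existing null check on merchant is never reached. A minimal sketch of a null-safe version of the lookup follows, written in plain Java so it runs without Hadoop. It assumes a plain Map stands in for the JobConf, and the class name MerchantLookup, the method names, and the sample file name are all hypothetical. It only avoids the crash by falling back to "unknown_merchant"; it does not recover the missing file name.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a Map stands in for JobConf so the sketch runs standalone.
public class MerchantLookup {

    // Returns the part of the base file name before the first "-", or null if no file name is known.
    static String keyFromInputFile(String inputFile) {
        if (inputFile == null || inputFile.isEmpty()) {
            return null; // e.g. "map.input.file" was not set
        }
        String baseName = new File(inputFile).getName();
        return baseName.split("-")[0];
    }

    // Looks up the merchant for the input file, falling back to "unknown_merchant".
    static String resolveMerchant(Map<String, String> settings, String inputFile) {
        String key = keyFromInputFile(inputFile);
        String merchant = (key == null) ? null : settings.get(key);
        return (merchant != null) ? merchant : "unknown_merchant";
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put("acme", "Acme Corp"); // made-up sample entry
        System.out.println(resolveMerchant(settings, "/logs/acme-2014-09-08.log"));
        System.out.println(resolveMerchant(settings, null));
    }
}
```

In the real mapper, the same guard would sit in configure(JobConf), with conf.get(...) in place of the map lookup.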

Would appreciate any input to resolve this.

Thanks, Jeevan

I used the following for each file instead:

FileInputFormat.addInputPath(conf, new Path(fileName));

With that change, everything looks good.
