
Getting the number of input files added to a Hadoop MR job

How do I get the number of input files I have added through calls to FileInputFormat.addInputPath and FileInputFormat.addInputPaths? I am adding input files that match a pattern. If no file matches the pattern, so that the MR job has no input files, I want to log a message to the user and not submit the job at all.

Thanks,

Venkat

FileInputFormat stores the input paths as a single comma-separated string in the Configuration property mapred.input.dir, so you can use the following:

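Configuration conf = job.getConfiguration();
// The property is null when no input path has been added yet
String dirs = conf.get("mapred.input.dir");
String[] arrDirs = (dirs == null || dirs.isEmpty()) ? new String[0] : dirs.split(",");
int numDirs = arrDirs.length;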
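
To decide whether to submit the job at all, the count has to cope with the property being unset. The following is a minimal sketch in plain Java, with no Hadoop dependency, of one way to count the entries safely. The class name InputDirCounter and the method countEntries are hypothetical helpers, not part of Hadoop. The sketch assumes that a comma inside a path is stored with a preceding backslash, and it counts the path strings that were added, not the files a glob pattern would expand to. Hadoop also provides FileInputFormat.getInputPaths(job), which returns the same entries as Path objects.

```java
public final class InputDirCounter {
    private InputDirCounter() {}

    /**
     * Counts the entries in a comma-separated path list such as the value
     * stored under "mapred.input.dir". A comma preceded by a backslash is
     * treated as part of a path rather than as a separator (an assumption
     * about how the paths were escaped when they were added).
     */
    public static int countEntries(String dirs) {
        if (dirs == null || dirs.isEmpty()) {
            return 0; // no input path has been added yet
        }
        int count = 1;
        for (int i = 0; i < dirs.length(); i++) {
            char c = dirs.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character that follows
            } else if (c == ',') {
                count++; // unescaped comma separates two entries
            }
        }
        return count;
    }
}
```

In a driver, this could be called as InputDirCounter.countEntries(job.getConfiguration().get("mapred.input.dir")), logging a message and skipping submission when the result is 0.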

The relevant line in FileInputFormat.addInputPath, which appends each new path to that property, is:

conf.set("mapred.input.dir", dirs == null ? dirStr : dirs + "," + dirStr);


 