
How do I count the number of files in HDFS from an MR job?

I'm new to Hadoop, and to Java for that matter. I'm trying to count the number of files in a folder on HDFS from the MapReduce driver I'm writing. I'd like to do this without calling the HDFS shell, because I want to be able to pass in the directory when I run the MapReduce job. I've tried a number of approaches, but my inexperience with Java has kept me from getting any of them to work.

Any help would be greatly appreciated.

Thanks,

Nomad.

You can use the FileSystem API and iterate over the files under the path. Here is some example code:

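import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Inside a driver that extends Configured, so getConf() is available:
long count = 0;
FileSystem fs = FileSystem.get(getConf()); // the default file system (fs.defaultFS), e.g. HDFS
boolean recursive = false; // true would also count files in subdirectories
// listFiles yields files only, never directories; "/my/path" is resolved against fs.defaultFS
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("/my/path"), recursive);
while (ri.hasNext()) {
    ri.next();
    count++;
}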
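Here is a fuller sketch. It assumes a Hadoop 2.x or later client library on the classpath, and the class name FileCountDriver is hypothetical. The driver takes the directory as its first command-line argument, which is what the question asks for. It counts the files in that directory and then marks the point where you would configure and submit the job. The FileSystem is resolved from the path itself with Path.getFileSystem(conf) instead of FileSystem.get(conf), so a fully qualified URI such as hdfs://namenode/... or file:///... works without a "Wrong FS" error.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// FileCountDriver is a hypothetical name; substitute your own driver class.
public class FileCountDriver extends Configured implements Tool {

    /** Counts regular files directly under dir, or in the whole subtree if recursive is true. */
    public static long countFiles(Configuration conf, Path dir, boolean recursive) throws IOException {
        // Pick the FileSystem that matches the path's scheme (hdfs://, file://, ...).
        FileSystem fs = dir.getFileSystem(conf);
        long count = 0;
        // listFiles returns only files, never directories.
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(dir, recursive);
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Usage: FileCountDriver <input dir>");
            return 2;
        }
        // The directory is passed in on the command line instead of being hard-coded.
        Path inputDir = new Path(args[0]);
        long fileCount = countFiles(getConf(), inputDir, false);
        System.out.println("Files in " + inputDir + ": " + fileCount);
        // ... configure and submit the MapReduce Job here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options (-D, -fs, ...) and sets the Configuration on the tool.
        System.exit(ToolRunner.run(new Configuration(), new FileCountDriver(), args));
    }
}
```

You would run it with something like hadoop jar myjob.jar FileCountDriver /user/nomad/input (the jar name and path are placeholders). If you want a count for the whole directory tree in a single call, fs.getContentSummary(dir).getFileCount() is an alternative. Note that it also includes files in subdirectories.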
