
How to count the number of files under a specific directory in Hadoop?

I'm new to the MapReduce framework. I want to find out the number of files under a specific directory by providing the name of that directory. For example, suppose we have 3 directories A, B, and C, containing 20, 30, and 40 part-r files respectively. I'm interested in writing a Hadoop job that counts the files/records in each directory, i.e. I want the output in a .txt file formatted as below:

A is having 20 records

B is having 30 records

C is having 40 records

All these directories are present in HDFS.

The simplest/native approach is to use the built-in HDFS shell commands, in this case -count:

hdfs dfs -count /path/to/your/dir  >> output.txt
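
-count prints four columns: DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. With illustrative numbers (not taken from a real cluster), the appended line would look something like:

           1           40             163840 /path/to/your/dir

where 40 is the file count; note that DIR_COUNT includes the directory itself, so it is at least 1.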

Or, if you prefer a mixed approach via Linux commands:

hadoop fs -ls /path/to/your/dir/*  | wc -l >> output.txt

The glob form lists each matched file on its own line. Listing the directory itself (hadoop fs -ls /path/to/your/dir) would also emit a "Found N items" header line, which would inflate the count by one.

Finally, the MapReduce version has already been answered here:

How do I count the number of files in HDFS from an MR job?

Code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

int count = 0;
// getConf() is available inside a Configured/Tool; new Configuration() works standalone
FileSystem fs = FileSystem.get(new Configuration());
boolean recursive = false; // set to true to descend into subdirectories
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
while (ri.hasNext()) {
    ri.next();   // advance the iterator and count each file it returns
    count++;
}
System.out.println("The count is: " + count);
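
For the multi-directory output the question asks for, here is a minimal standalone sketch (plain Java against the HDFS FileSystem API, not a MapReduce job). The parent path and the output file name are placeholders, and the "is having N records" wording simply mirrors the question:

import java.io.PrintWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class DirFileCounter {
    public static void main(String[] args) throws Exception {
        Path parent = new Path("/path/to/parent");   // placeholder: directory containing A, B, C
        FileSystem fs = FileSystem.get(new Configuration());

        try (PrintWriter out = new PrintWriter("output.txt")) {   // local output file
            // Iterate over the immediate children of the parent directory
            for (FileStatus status : fs.listStatus(parent)) {
                if (!status.isDirectory()) continue;   // only count inside subdirectories

                // Count the files directly inside this subdirectory (A, B, C, ...)
                int count = 0;
                RemoteIterator<LocatedFileStatus> ri = fs.listFiles(status.getPath(), false);
                while (ri.hasNext()) {
                    ri.next();
                    count++;
                }
                out.println(status.getPath().getName() + " is having " + count + " records");
            }
        }
    }
}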
