I'm new to the MapReduce framework. I want to find the number of files under a specific directory, given the name of that directory. For example, suppose we have 3 directories A, B and C containing 20, 30 and 40 part-r files respectively. I'd like to write a Hadoop job that counts the files/records in each directory, i.e. I want the output in a .txt file formatted as below:
A is having 20 records
B is having 30 records
C is having 40 records
All of these directories are in HDFS.
The simplest, native approach is to use the built-in HDFS commands, in this case -count, which prints the directory count, file count, content size and path name for each path:
hdfs dfs -count /path/to/your/dir >> output.txt
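To get from that output to the "A is having 20 records" lines asked for in the question, one option is a small post-processing step. Below is a minimal sketch, assuming the default -count column order (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME, separated by whitespace). The class name CountFormatter and the sample lines are hypothetical and only there for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts `hdfs dfs -count` lines into the
// "<dir> is having <n> records" format asked for in the question.
public class CountFormatter {

    // Assumes the default column order: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
    public static String format(String countLine) {
        // Limit the split to 4 so a path containing spaces stays in one piece.
        String[] cols = countLine.trim().split("\\s+", 4);
        if (cols.length < 4) {
            throw new IllegalArgumentException("Unexpected -count line: " + countLine);
        }
        long fileCount = Long.parseLong(cols[1]);
        String path = cols[3];
        // Drop a trailing slash, then keep the last path component as the name.
        if (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name + " is having " + fileCount + " records";
    }

    // Formats every non-blank line of the -count output.
    public static List<String> formatAll(List<String> countLines) {
        List<String> out = new ArrayList<>();
        for (String line : countLines) {
            if (!line.trim().isEmpty()) {
                out.add(format(line));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample lines made up for illustration, not real cluster output.
        List<String> sample = List.of(
            "           1           20             123456 /data/A",
            "           1           30             234567 /data/B",
            "           1           40             345678 /data/C");
        formatAll(sample).forEach(System.out::println);
    }
}
```

Note that FILE_COUNT from -count is recursive, so files in nested subdirectories are included in the number.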
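Or, if you prefer a mixed approach with Linux commands, list the files and count the lines. The listing also contains "Found N items" headers and directory entries, so keep only the lines starting with -, which are regular files:
hadoop fs -ls /path/to/your/dir/* | grep '^-' | wc -l >> output.txt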
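The pipeline above gives a single total. If you want one count per directory, the same listing can be grouped by parent directory instead. Here is a rough sketch, assuming the usual 8-column -ls layout (permissions, replication, owner, group, size, date, time, path). The class name LsCounter and the sample listing are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups regular files from `hadoop fs -ls` output
// by the name of their parent directory.
public class LsCounter {

    public static Map<String, Integer> countByParent(List<String> lsLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lsLines) {
            // Regular files start with '-'; this skips "Found N items"
            // headers and directory entries (which start with 'd').
            if (!line.startsWith("-")) {
                continue;
            }
            // Limit the split to 8 so a path containing spaces stays whole.
            String[] cols = line.trim().split("\\s+", 8);
            if (cols.length < 8) {
                continue;
            }
            String path = cols[7];
            int slash = path.lastIndexOf('/');
            if (slash <= 0) {
                continue; // no named parent directory (file sits directly under /)
            }
            String parent = path.substring(0, slash);
            String name = parent.substring(parent.lastIndexOf('/') + 1);
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample listing made up for illustration, not real cluster output.
        List<String> sample = List.of(
            "Found 2 items",
            "-rw-r--r--   3 user group 10 2020-01-01 10:00 /data/A/part-r-00000",
            "-rw-r--r--   3 user group 10 2020-01-01 10:00 /data/A/part-r-00001",
            "Found 1 items",
            "-rw-r--r--   3 user group 10 2020-01-01 10:00 /data/B/part-r-00000");
        countByParent(sample).forEach(
            (dir, n) -> System.out.println(dir + " is having " + n + " records"));
    }
}
```

Unlike -count, this only sees the files that the listing itself shows, so without -R it does not descend into nested subdirectories.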
Finally, the MapReduce version has already been answered here:
How do I count the number of files in HDFS from an MR job?
Code:
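// Needs FileSystem, LocatedFileStatus, Path and RemoteIterator from org.apache.hadoop.fs.
// getConf() assumes the enclosing class extends Configured (e.g. a Tool implementation).
int count = 0;
FileSystem fs = FileSystem.get(getConf());
boolean recursive = false; // false: do not descend into subdirectories
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
while (ri.hasNext()) {
    ri.next(); // listFiles returns files only, so every entry counts
    count++;
}
System.out.println("The count is: " + count);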