How to count number of files under specific directory in hadoop?

Question

I'm new to the MapReduce framework. I want to find the number of files under a specific directory by providing the name of that directory. For example, suppose we have 3 directories A, B and C containing 20, 30 and 40 part-r files respectively. I'm interested in writing a Hadoop job that counts the files/records in each directory, i.e. I want output in a .txt file formatted like this:

A is having 20 records

B is having 30 records

C is having 40 records

All of these directories are in HDFS.

Answer 1

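The simplest, native approach is to use the built-in HDFS shell commands, in this case -count. Its second output column (FILE_COUNT) is the number of files under the directory, including files in its subdirectories: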

hdfs dfs -count /path/to/your/dir  >> output.txt
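To get the exact format asked for in the question, you can pass all three directories to a single -count call and reformat each output line. Below is a minimal sketch in Java, assuming the default four-column output (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME, i.e. no -q or -h flags); the class name CountFormatter is hypothetical. Keep in mind that FILE_COUNT counts every file, so marker files such as _SUCCESS, if present, are included.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: turns "hdfs dfs -count" output lines into
 * "NAME is having N records". Assumes the default four-column layout
 * DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME (no -q or -h flags).
 */
final class CountFormatter {

    static String formatCountLine(String line) {
        // Columns are padded with spaces; limit 4 keeps a pathname containing spaces intact.
        String[] cols = line.trim().split("\\s+", 4);
        if (cols.length < 4) {
            throw new IllegalArgumentException("Unexpected -count line: " + line);
        }
        String fileCount = cols[1];
        String path = cols[3];
        // Drop a trailing slash, then use the last path component as the name (/data/A -> A).
        if (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name + " is having " + fileCount + " records";
    }

    public static void main(String[] args) throws IOException {
        // Read "-count" output from stdin and print one formatted line per directory.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                System.out.println(formatCountLine(line));
            }
        }
    }
}
```

Usage, with placeholder paths: hdfs dfs -count /path/to/A /path/to/B /path/to/C | java CountFormatter > output.txt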

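Or, if you prefer a mixed approach using Linux commands (note that this counts every line the listing prints, so if the directory has subdirectories, their entries and "Found N items" header lines are counted too):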

hadoop fs -ls /path/to/your/dir/*  | wc -l >> output.txt
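If you only want regular files, one option is to count just the listing lines whose permission string starts with "-" (directories start with "d", and the "Found N items" header starts with neither). A minimal sketch in Java, assuming the usual hadoop fs -ls layout where each entry begins with its permission string; the class name LsFileCounter is hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: counts regular files in "hadoop fs -ls" output.
 * Assumes each entry line starts with a permission string such as
 * "-rw-r--r--" (file) or "drwxr-xr-x" (directory).
 */
final class LsFileCounter {

    static boolean isFileEntry(String line) {
        // Header lines ("Found N items") and directory entries do not start with '-'.
        return line.startsWith("-");
    }

    static long countFiles(List<String> lines) {
        return lines.stream().filter(LsFileCounter::isFileEntry).count();
    }

    public static void main(String[] args) {
        // Read the whole listing from stdin and print the number of file entries.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> lines = in.lines().collect(Collectors.toList());
        System.out.println(countFiles(lines));
    }
}
```

Usage, with a placeholder path: hadoop fs -ls /path/to/your/dir | java LsFileCounter >> output.txt. A grep -c '^-' in the same pipeline does the same job; the Java version is mainly useful if you want to fold this logic into a larger program.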

Finally, the MapReduce version has already been answered here:

How do I count the number of files in HDFS from an MR job?

Code:

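import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Inside a Tool's run() method (getConf() is inherited from Configured)
int count = 0;
FileSystem fs = FileSystem.get(getConf());
boolean recursive = false; // only files directly under the directory
// listFiles() returns files only; subdirectories themselves are not counted.
// A path without a scheme is resolved against fs.defaultFS.
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("/path/to/your/dir"), recursive);
while (ri.hasNext()) {
    ri.next();
    count++;
}
System.out.println("The count is: " + count);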

solution1
1 2017-09-21 21:17:11