
What is the fastest / easiest way to count a large number of files in a directory (in Linux)?

I have a directory with a large number of files. Every time I try to access the list of files in it, I either cannot, or there is a significant delay. I tried the ls command from the command line on Linux, and the web interface from my hosting provider did not help either.

The problem is that when I just run ls , it takes a significant amount of time to even start displaying anything. Thus, ls | wc -l would not help either.

After some research I came up with this code (in this example it counts the number of new emails on a server):

from os import walk
print(sum(len(files) for (root, dirs, files) in walk('/home/myname/Maildir/new')))

The above code is written in Python. I ran it from Python's interactive interpreter and it worked pretty fast (it returned the result almost instantly).

My question is: is it possible to count the files in a directory (without subdirectories) even faster? What is the fastest way to do that?

ls does a stat(2) call for every file. Other tools, like find(1) and shell wildcard expansion, may avoid this call and just do readdir . One shell command combination that might work is find dir -maxdepth 1 | wc -l , but it will gladly count the directory itself and miscount any filename containing a newline.

From Python, the straightforward way to get just these names is os.listdir(directory) . Unlike os.walk and os.path.walk, it does not need to recurse, check file types, or make further Python function calls.
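On Python 3.5+, os.scandir can take this one step further: its DirEntry objects can report the entry type from the directory entry itself, so in many cases no separate stat is needed per file. A minimal sketch (count_files is just an illustrative name, not from the original answer):

```python
import os

def count_files(directory):
    """Count regular files (not directories) in a single directory.

    With follow_symlinks=False, is_file() can usually answer from the
    d_type information returned by the directory scan itself, avoiding
    a stat() call per entry on filesystems that provide it.
    """
    count = 0
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                count += 1
    return count
```

Called as count_files('/home/myname/Maildir/new'), it returns the number of regular files, excluding subdirectories.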

Addendum: It seems ls doesn't always stat. At least on my GNU system, it can do only a getdents call when further information (such as which names are directories) is not requested. getdents is the underlying system call used to implement readdir in GNU/Linux.

Addendum 2: One reason for a delay before ls outputs results is that it sorts and tabulates. ls -U1 may avoid this.

Total number of files in the given directory

find . -maxdepth 1 -type f | wc -l

Total number of files in the given directory and all subdirectories under it

find . -type f | wc -l

For more details, drop into a terminal and run man find
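If GNU find is available, the newline caveat mentioned earlier can be sidestepped by printing one byte per match and counting bytes instead of lines (a sketch; -printf is a GNU extension, not POSIX):

```shell
# Print one 'x' per regular file, then count bytes with wc -c.
# Filenames containing newlines no longer inflate the count,
# and the directory itself is excluded by -type f.
find . -maxdepth 1 -type f -printf x | wc -c
```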

This should be pretty fast in Python:

from os import listdir
from os.path import isfile, join
directory = '/home/myname/Maildir/new'
print(sum(1 for entry in listdir(directory) if isfile(join(directory, entry))))

I think ls spends most of its time before displaying the first line because it has to sort the entries, so ls -U should display the first line much sooner (though it may not be much better overall).

The fastest way would be to avoid all the overhead of interpreted languages and write some code that directly addresses your problem. Doing so portably is difficult, but otherwise pretty straightforward. At the moment I'm on an OS X box, but converting the following to Linux should be extremely straightforward. (I opted to ignore hidden files and only count regular files...modify as necessary or add command line switches to get the functionality you want.)

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

int
main( int argc, char **argv )
{
    DIR *d;
    struct dirent *f;
    int count = 0;
    char *path = argv[ 1 ];

    if( path == NULL ) {
        fprintf( stderr, "usage: %s path\n", argv[ 0 ] );
        exit( EXIT_FAILURE );
    }
    d = opendir( path );
    if( d == NULL ) { perror( path ); exit( EXIT_FAILURE ); }
    while( ( f = readdir( d ) ) != NULL ) {
        /* skip hidden entries; note d_type requires filesystem support */
        if( f->d_name[ 0 ] != '.'  &&  f->d_type == DT_REG )
            count += 1;
    }
    closedir( d );
    printf( "%d\n", count );
    return EXIT_SUCCESS;
}

My use case is a Linux SBC (Banana Pi) counting files in a directory on a FAT32 USB stick. In a shell, doing

ls -U {dir} | wc -l

takes 6.4 seconds with 32k files in there (32k is the maximum number of files per directory on FAT32). From Python, doing

t = time.time(); print(len(os.listdir(d))); print(time.time() - t)

takes only 0.874 seconds(!). I can't see anything else in Python being quicker than that.
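For a repeatable version of that measurement, time.perf_counter is better suited than time.time (a sketch; time_listdir is an illustrative helper name, and the directory path is whatever you want to measure):

```python
import os
import time

def time_listdir(directory):
    """Time a single os.listdir() call; return (file_count, seconds)."""
    start = time.perf_counter()
    count = len(os.listdir(directory))
    elapsed = time.perf_counter() - start
    return count, elapsed
```

Note that os.listdir counts every entry, including subdirectories, which matches the use case above (FAT32 stick with files only).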

I'm not sure about speed, but if you want to just use shell builtins this should work:

#!/bin/sh
COUNT=0
for file in /path/to/directory/*
do
    COUNT=$((COUNT+1))
done
echo $COUNT

A shorter way of counting files in a directory in bash:

files=(*) ; echo ${#files[@]}
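One caveat with the glob approaches above: in an empty directory an unmatched * stays literal, so the count comes out as 1. In bash, nullglob fixes that, and dotglob additionally includes hidden files (a sketch of the safer variant):

```shell
# nullglob: an unmatched glob expands to nothing instead of itself
# dotglob:  also match names starting with '.'
shopt -s nullglob dotglob
files=(*)
echo "${#files[@]}"
```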

I generated 10_000 empty files in tmpfs; counting them this way takes 0.03 s on my machine. Running ls | wc -l was just slightly slower (I flushed the cache before and in between, just in case).
