
scandir() in PHP far too slow

The target directory has 10 million+ text files. Using $a = scandir() in a web page is extremely slow. I need the array of results in less than two seconds. Filtering does not help, since it still scans the entire list.

All I can think of is to use a Perl or C program to preprocess the directory and stuff x thousand file names from the target directory into a file, tag the chosen filenames in the target directory with a .pi suffix (or something similar), and then use PHP's file() function to get the list from that file instead.

FYI, I need to open and work with each file before it gets stuffed into a table. I can't wait more than 1-2 seconds for the array I will work on to become available. Any assistance appreciated. Memory is not an issue, HDD space is not an issue, and processor power is not an issue. The issue is getting a list into an array fast while using a web page front end. I can't wait because I am tired of waiting.

I tried a brief, fast C program with opendir and readdir, but even that takes almost 4 minutes to scan the directory listing. At least I could put a governor on it to limit it to x files.

It seems the answer is to call the Perl or C program, which I can limit to x files, via system() or backticks. Then that list can be opened with file() on the fly... makes sense?

The problem is less PHP and more the filesystem. Most filesystems do not work well with 10 million files in a single directory and performance starts to suffer badly. You're unlikely to get much better performance out of rewriting it in C or Perl because the filesystem is simply overwhelmed and its performance has gone pathological.

First, switch from scandir to opendir and readdir. This avoids having to build a 10 million element array. It also lets your program start doing work immediately, instead of laboriously reading all 10 million filenames first.

if ($dh = opendir($dir)) {
    // Handle each entry as it is read instead of building one huge array.
    while (($file = readdir($dh)) !== false) {
        // ...do your work on $file...
    }
    closedir($dh);
}

Second, restructure your directory to have at least two levels of subdirectories based on the first letters of the filenames, for example t/h/this.is.an.example. This will reduce the number of files in any single directory to a level the filesystem can handle much better.
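As a rough sketch of that sharding scheme (the helper name sharded_path is hypothetical, and it assumes every filename is at least two characters long), the two-level path can be derived from the first two characters of the name:

/* Hypothetical sketch: map "this.is.an.example" to "t/h/this.is.an.example". */
#include <stdio.h>

static void sharded_path(const char *name, char *out, size_t outlen)
{
    /* Assumes name has at least two characters. */
    snprintf(out, outlen, "%c/%c/%s", name[0], name[1], name);
}

int main(void)
{
    char path[4096];
    sharded_path("this.is.an.example", path, sizeof path);
    puts(path);   /* prints t/h/this.is.an.example */
    return 0;
}

A one-off migration pass would then create the two directory levels and rename each existing file into its sharded location.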

You can write a C program that calls the getdents syscall. Use a large buffer size, say 5MB, and skip entries with inode == 0 to dramatically improve performance.

Solutions that rely on libc readdir() are slow because it's limited to reading 32K chunks of directory entries at a time.
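A minimal sketch of such a program, assuming Linux and glibc (the 5 MB buffer comes from the suggestion above; the argument handling and error checks are illustrative choices):

/* Sketch: list a huge directory with the raw getdents64 syscall (Linux only). */
#define _GNU_SOURCE
#include <dirent.h>      /* struct dirent64 */
#include <fcntl.h>       /* open, O_RDONLY, O_DIRECTORY */
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h> /* SYS_getdents64 */
#include <unistd.h>      /* syscall, close */

#define BUF_SIZE (5 * 1024 * 1024)  /* large 5 MB buffer, as suggested above */

int main(int argc, char *argv[])
{
    int fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
    if (fd == -1) { perror("open"); return 1; }

    char *buf = malloc(BUF_SIZE);
    if (buf == NULL) { perror("malloc"); return 1; }

    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, BUF_SIZE);
        if (nread == -1) { perror("getdents64"); return 1; }
        if (nread == 0) break;  /* end of directory */

        for (long pos = 0; pos < nread; ) {
            struct dirent64 *d = (struct dirent64 *)(buf + pos);
            if (d->d_ino != 0)          /* skip entries with inode == 0 */
                puts(d->d_name);
            pos += d->d_reclen;
        }
    }

    free(buf);
    close(fd);
    return 0;
}

Because each getdents64 call can fill the whole 5 MB buffer, the program avoids the 32K-at-a-time behaviour of readdir() described above.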

This approach is described in a post on the Olark Developers Corner blog.

