
scandir() in PHP far too slow

The target directory has 10 million+ text files. Using $a = scandir() in a web page is deadly slow. I need the resulting array in less than two seconds. Filtering does not help (it still scans the entire list).

All I can think of is to use a Perl or C program to preprocess: stuff x thousand file names from the target directory into a file, tag the file names picked in the target dir with a .pi at the end (or something), and use PHP's file() function to get the list from that file instead.

I need to open and work with each file before it gets stuffed into a table, FYI. I can't wait more than 1-2 seconds for the array I'll work on to become available. Any assistance appreciated. Memory is not an issue, HDD space is not an issue, and processor power is not an issue. The issue is getting a list into an array fast while using a web-page front end. I can't wait because I am tired of waiting.

I tried using a brief, fast C program with opendir and readdir, but even that takes almost 4 minutes to scan the directory listing. At least I could put a governor on it to limit it to x files.

It seems the answer is to call the Perl or C program, which I can limit to x files, and I can call it with system() or backticks. Then that list can be opened with file() ...OTF... makes sense?
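A minimal sketch of that approach on the PHP side, with the target path, batch size, and temp-file name as placeholders, and find(1) piped through head(1) standing in for the custom Perl/C helper:

<?php
// Placeholders: directory path, batch size, and list file.
$dir   = '/path/to/target';
$limit = 5000;
$list  = '/tmp/batch.txt';

// Any helper will do here; find + head stand in for the custom Perl/C program.
// head closes the pipe after $limit names, so the whole 10-million-entry
// listing never has to land in a PHP array.
exec(sprintf('find %s -maxdepth 1 -type f | head -n %d > %s',
    escapeshellarg($dir), $limit, escapeshellarg($list)));

$paths = file($list, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($paths as $path) {
    // open and process the file here, then tag or move it
    // so the next batch skips it
}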

The problem is less PHP and more the filesystem. Most filesystems do not cope well with 10 million files in a single directory, and performance starts to suffer badly. You're unlikely to get much better performance by rewriting it in C or Perl, because the filesystem is simply overwhelmed and its performance has gone pathological.

First, switch from scandir to opendir and readdir. This avoids having to build a 10-million-element array. It also lets your program start doing work immediately, rather than only after laboriously reading in 10 million filenames.

// $dir is the directory to scan; entries are handled one at a time,
// so the full list never has to be held in an array.
if ($dh = opendir($dir)) {
    while (($file = readdir($dh)) !== false) {
        // readdir() also returns "." and ".." -- skip them here
        ...do your work...
    }
    closedir($dh);
}

Second, restructure your directory to have at least two levels of subdirectories based on the first letters of the filenames. For example, t/h/this.is.an.example. This reduces the number of files in a single directory to a level the filesystem can handle better.
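A minimal sketch of that layout in PHP, with the base path as a placeholder:

<?php
// Move one file into a two-level subdirectory built from its first letters,
// e.g. "this.is.an.example" -> "t/h/this.is.an.example".
function shard_path($base, $name) {
    $a = $name[0];
    $b = strlen($name) > 1 ? $name[1] : '_';   // fallback for one-character names
    return "$base/$a/$b";
}

$base = '/path/to/target';                     // placeholder
$name = 'this.is.an.example';

$dest = shard_path($base, $name);
if (!is_dir($dest)) {
    mkdir($dest, 0755, true);                  // third argument creates both levels
}
rename("$base/$name", "$dest/$name");          // same filesystem, so the move is cheap

Lookups stay cheap too: the first two letters of a name are enough to rebuild its path.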

You can write a C program that calls the getdents syscall. Use a large buffer size, say 5 MB, and skip entries with inode == 0 to dramatically improve performance.

Solutions that rely on libc's readdir() are slow because it is limited to reading 32K chunks of directory entries at a time.

This approach is described on the Olark Developers Corner blog linked below.
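The getdents program itself has to be written in C, as described above; what follows is only a sketch of the PHP side, streaming the helper's output line by line so no 10-million-element array is ever built (fastls is a hypothetical name for that helper binary):

<?php
$dir = '/path/to/target';                      // placeholder

// Stream one filename per line from the getdents-based helper.
$fh = popen('./fastls ' . escapeshellarg($dir), 'r');
if ($fh === false) {
    exit('could not start the helper');
}
while (($name = fgets($fh)) !== false) {
    $name = rtrim($name, "\n");
    // open and process "$dir/$name" here, one entry at a time
}
pclose($fh);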

References:
