Is it possible to speed up recursive file scanning in PHP?

Question

I've been trying to replicate GNU find ("find .") in PHP, but it seems impossible to get even close to its speed. The PHP implementations take at least twice as long as find. Are there faster ways of doing this with PHP?

EDIT: I added a code example using the SPL implementation. Its performance is equal to that of the iterative approach.

EDIT2: When calling find from PHP, it was actually slower than the native PHP implementation. I guess I should be satisfied with what I've got :)

// measured at 317% of GNU find's run time when find is run directly from a shell
function list_recursive($dir) { 
  if ($dh = opendir($dir)) {
    while (false !== ($entry = readdir($dh))) {
      if ($entry == '.' || $entry == '..') continue;

      $path = "$dir/$entry";
      echo "$path\n";
      if (is_dir($path)) list_recursive($path);       
    }
    closedir($dh); // close the handle opened above (was the undefined $d)
  }
}

// measured at 315% of GNU find's run time when find is run directly from a shell
function list_iterative($from) {
  $dirs = array($from);  
  while (NULL !== ($dir = array_pop($dirs))) {  
    if ($dh = opendir($dir)) {    
      while (false !== ($entry = readdir($dh))) {      
        if ($entry == '.' || $entry == '..') continue;        

        $path = "$dir/$entry";        
        echo "$path\n";        
        if (is_dir($path)) $dirs[] = $path;        
      }      
      closedir($dh);      
    }    
  }  
}

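// measured at 315% of GNU find's run time when find is run directly from a shell
function list_recursivedirectoryiterator($path) {
  // SKIP_DOTS drops "." and ".."; RecursiveIteratorIterator performs the actual descent
  $dir = new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS);
  $it = new RecursiveIteratorIterator($dir, RecursiveIteratorIterator::SELF_FIRST);
  foreach ($it as $file) {
    echo $file->getPathname(), "\n";
  }
}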

// measured at 390% of GNU find's run time when find is run directly from a shell
function list_gnufind($dir) { 
  $dir = escapeshellarg($dir); // quote the path as a single shell argument
  $h = popen("/usr/bin/find $dir", "r");
  while ('' != ($s = fread($h, 2048))) {
    echo $s;
  }
  pclose($h);
}

Answer 1

I'm not sure if the performance is better, but you could use a recursive directory iterator to make your code simpler. See RecursiveDirectoryIterator and SplFileInfo.

$it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($from, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::SELF_FIRST
);
foreach ($it as $file)
{
    echo $file->getPathname(), "\n";
}

Answer 2

Before you start changing anything, profile your code.

Use something like Xdebug (plus kcachegrind for a pretty graph) to find out where the slow parts are. If you start changing things blindly, you won't get anywhere.
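Before reaching for a full profiler, a coarse wall-clock comparison can already show whether one variant is worth pursuing. The following is only a minimal sketch and not a replacement for Xdebug: `time_listing` is a hypothetical helper name, and the closure passed to it stands in for whichever listing function you want to measure.

```php
<?php
// Hypothetical helper: time a callable while discarding whatever it echoes,
// so that terminal output does not dominate the measurement.
function time_listing(callable $fn, string $dir): float {
    ob_start();                          // capture the listing instead of printing it
    $start = microtime(true);
    $fn($dir);
    $elapsed = microtime(true) - $start;
    ob_end_clean();                      // throw the captured listing away
    return $elapsed;
}

// Example candidate: a flat, non-recursive listing of one directory.
$seconds = time_listing(function (string $dir): void {
    foreach (scandir($dir) as $entry) {
        echo "$dir/$entry\n";
    }
}, sys_get_temp_dir());

printf("%.4f s\n", $seconds);
```

Run each candidate several times on the same tree, since the first pass usually warms the filesystem cache and skews the comparison.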

My only other advice is to use the SPL directory iterators as posted already. Letting the internal C code do the work is almost always faster.

Answer 3

PHP just cannot perform as fast as C, plain and simple.

Answer 4

Why would you expect the interpreted PHP code to be as fast as the compiled C version of find? Being only twice as slow is actually pretty good.

About the only advice I would add is to call ob_start() at the beginning and ob_get_contents() followed by ob_end_clean() at the end. That might speed things up.
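The buffering suggestion could look roughly like this. It is a sketch under the assumption that the listing function echoes its results; `list_buffered` is a hypothetical wrapper name, not something from the question.

```php
<?php
// Hypothetical wrapper: collect everything the listing function echoes
// in an output buffer, then emit it with a single echo.
function list_buffered(callable $fn, string $dir): void {
    ob_start();                  // start buffering
    $fn($dir);                   // the listing writes into the buffer
    $out = ob_get_contents();    // copy the buffered text
    ob_end_clean();              // discard the buffer without printing it
    echo $out;                   // one write instead of one per path
}
```

The trade-off is memory: the whole listing is held in RAM until the end, which may not be acceptable for very large trees.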

Answer 5

You're keeping N directory streams open, where N is the depth of the directory tree. Instead, try reading an entire directory's worth of entries at once, and then iterate over the entries. At the very least you'll maximize use of the disk I/O caches.
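One way to read this advice is sketched below: each directory is read in a single scandir() call, so no directory handle stays open while its subdirectories are processed. The function name `list_scandir_iterative` is made up for illustration, and SCANDIR_SORT_NONE (available since PHP 5.4) is used on the assumption that sorted output is not needed.

```php
<?php
// Sketch: iterative traversal that reads one whole directory per scandir() call.
function list_scandir_iterative(string $from): void {
    $dirs = [$from];                                  // directories still to visit
    while ($dirs) {
        $dir = array_pop($dirs);
        $entries = @scandir($dir, SCANDIR_SORT_NONE); // false if the directory is unreadable
        if ($entries === false) {
            continue;
        }
        foreach ($entries as $entry) {
            if ($entry === '.' || $entry === '..') {
                continue;
            }
            $path = "$dir/$entry";
            echo "$path\n";
            if (is_dir($path)) {
                $dirs[] = $path;                      // visit later; nothing is held open
            }
        }
    }
}
```

Whether this is actually faster than the readdir() loop depends on the filesystem and tree shape, so it is worth measuring rather than assuming.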

Answer 6

Try using scandir() to read a whole directory at once, as Jason Cohen has suggested. I've based the following code on code from the PHP manual comments for scandir():

 function scan( $dir ){
        // scandir() returns every entry of $dir in one call
        $entries = array_diff( scandir( $dir ), array( ".", ".." ));
        foreach( $entries as $entry ) {
            $path = $dir."/".$entry;
            if ( is_dir( $path )) scan( $path );  // descend into subdirectories
            else print $path."\n";                // print files only
        }
 }

Answer 7

You might want to seriously consider just using GNU find. If it's available, and safe mode isn't turned on, you'll probably like the results just fine:

function list_recursive($dir) { 
  $dir = escapeshellarg($dir); // quote the path as a single shell argument
  $h = popen("/usr/bin/find $dir -type f", "r");
  while ($s = fgets($h,1024)) { 
    echo $s;
  }
  pclose($h);
}

However, there might be some directory that's so big you're not going to want to bother with this either. Consider amortizing the slowness in other ways. Your second try can be checkpointed (for example) by simply saving the directory stack in the session. If you're giving the user a list of files, simply collect a pageful, then save the rest of the state in the session for page 2.
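The checkpointing idea can be sketched as a function that takes the stack of pending paths, returns one page of results, and hands back the remaining stack to be stored. Everything here is illustrative: `list_page` is a hypothetical name, the page size and the session key in the usage comment are assumptions, and symlinked directories are deliberately not followed to avoid loops.

```php
<?php
// Hypothetical helper: return up to $limit paths plus the remaining work stack.
// The stack holds paths not yet reported, so it can be stored and resumed later.
function list_page(array $pending, int $limit): array {
    $page = [];
    while ($pending && count($page) < $limit) {
        $path = array_pop($pending);
        $page[] = $path;
        if (is_dir($path) && !is_link($path)) {        // do not follow symlinks
            $entries = @scandir($path);
            if ($entries === false) {
                continue;                               // unreadable directory
            }
            foreach ($entries as $entry) {
                if ($entry !== '.' && $entry !== '..') {
                    $pending[] = "$path/$entry";        // report and expand later
                }
            }
        }
    }
    return [$page, $pending];
}

// Possible use across requests (assumes session_start() was called):
//   $pending = $_SESSION['pending'] ?? ['/some/root'];
//   [$page, $_SESSION['pending']] = list_page($pending, 50);
```

Note that the starting directory itself appears as the first entry of the first page, much as "find ." prints "." first.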

7 answers

Answer 1
4 2009-03-08 19:26:18

Answer 2
4

Answer 3
3 accepted 2009-03-08 20:17:44

Answer 4
2 2009-03-08 19:30:33

Answer 5
1 2009-03-08 19:25:46

Answer 6
0 2009-10-28 15:33:04

Answer 7
0 2009-03-08 19:57:48
