
Discovering folder tree via depth-first or breadth-first

I had to find the paths to the "deepest" folders in a folder. For this I implemented two algorithms, and one is way faster than the other. Does anyone know why? I suppose this has something to do with the hard-disk hardware, but I'd like to understand. Here is the fast one:

    private function getHostAux($path) {
        $matches = array();
        $folder = rtrim($path, DIRECTORY_SEPARATOR);

        $moreFolders = glob($folder.DIRECTORY_SEPARATOR.'*', GLOB_ONLYDIR);
        if (count($moreFolders) == 0) {
            $matches[] = $folder;
        } else {
            foreach ($moreFolders as $fd) {
                $arr = $this->getHostAux($fd);
                $matches = array_merge($matches, $arr);
            }
        }
        return $matches;
    }

And here is the slow one:

    /**
     * Breadth-first function using glob
     */
    private function getHostAux($path) {
        $matches = array();
        $folders = array(rtrim($path, DIRECTORY_SEPARATOR));
        $i = 0;
        while ($folder = array_shift($folders)) {
            $moreFolders = glob($folder.DIRECTORY_SEPARATOR.'*', GLOB_ONLYDIR);
            // A folder with no subfolders is one of the "deepest" folders.
            if (count($moreFolders) == 0) {
                $matches[$i] = $folder;
            }
            $folders = array_merge($folders, $moreFolders);
            $i++;
        }
        return $matches;
    }

Thanks!

You haven't provided additional information that might be crucial for understanding the "timings" you observed. (I put that in quotes intentionally, since you haven't specified what "slow" and "fast" mean or how exactly you measured them.)
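As an illustration of what a stated measurement could look like, here is a minimal sketch that times a callable several times and reports the median. The helper name `timeCall` and its parameters are hypothetical and not part of the question; `hrtime()` requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: runs $fn several times and returns the
// median wall-clock duration in milliseconds.
function timeCall(callable $fn, int $runs = 5): float
{
    $runs = max(1, $runs);
    $samples = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                       // monotonic clock, nanoseconds
        $fn();
        $samples[] = (hrtime(true) - $start) / 1e6;  // nanoseconds to milliseconds
    }
    sort($samples);
    return $samples[intdiv(count($samples), 2)];     // median sample
}
```

Each traversal can be wrapped in a closure and passed to this helper. Repeating the runs matters because the first pass over a directory tree also warms the operating system's file-system cache, which can make whichever version runs second look faster.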

Assuming that the supplied information is accurate, that the speedup of the first method is greater than a couple of percent, and that you've tested it on directories of various sizes and depths...

First, I would like to comment on the supplied answers:

  • I wouldn't be so sure about your answer. First, I think you mean "kernel handles". But this is not true, since glob doesn't open handles. How did you come up with this answer?
  • Both versions have the same total iteration count.

And to add something of my own:

  • I suspect array_shift() may cause the slowdown, because it reindexes the whole array each time you call it.
  • The order in which you glob may matter, depending on the underlying OS and file system.
  • You probably have a bug in your code. You increment $i after every glob rather than after adding an element to the $matches array. This makes the $matches array sparse, which may slow down merging, shifting, or even adding elements. I don't know whether that's the case in PHP, but I know several languages in which arrays have such properties, and they are sometimes hard to keep in mind while coding. I would recommend fixing this, timing the code again, and seeing whether it makes any difference.
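The first and last points can be tried out with a minimal sketch of the breadth-first version that reads the queue through a moving index instead of calling array_shift(), and appends to $matches with `[]` so the keys stay sequential. The function name `findLeafFolders` and its standalone (non-method) form are assumptions made for illustration, not the asker's code, and whether it is actually faster has to be measured.

```php
<?php
// Hypothetical standalone rewrite of the breadth-first version.
function findLeafFolders(string $path): array
{
    $matches = [];
    $queue = [rtrim($path, DIRECTORY_SEPARATOR)];

    // $head walks the queue, so nothing is shifted or reindexed.
    for ($head = 0; $head < count($queue); $head++) {
        $folder = $queue[$head];
        // glob() can return false on error; treat that as "no subfolders".
        $subFolders = glob($folder . DIRECTORY_SEPARATOR . '*', GLOB_ONLYDIR) ?: [];

        if (count($subFolders) === 0) {
            $matches[] = $folder;        // dense, sequential keys
        } else {
            foreach ($subFolders as $sub) {
                $queue[] = $sub;         // append in place, no array_merge copy
            }
        }
    }
    return $matches;
}
```

The trade-off is that the queue is never trimmed, so it ends up holding every folder path in the tree. For very large trees that memory cost would need to be weighed against the saved reindexing.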

I think that your first algorithm, the recursive one, does fewer iterations than the second one. Try counting how many iterations each algorithm performs using auxiliary variables.
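A rough sketch of such a check is to count the glob() calls each traversal makes on the same tree. The two functions below, `countDepthFirst` and `countBreadthFirst`, are hypothetical standalone helpers that mirror the traversal order of the two versions in the question; they only count calls and do not collect the deepest folders.

```php
<?php
// Hypothetical helper: number of glob() calls made by the recursive traversal.
function countDepthFirst(string $folder): int
{
    $calls = 1; // the glob() call for this folder
    $subFolders = glob($folder . DIRECTORY_SEPARATOR . '*', GLOB_ONLYDIR) ?: [];
    foreach ($subFolders as $sub) {
        $calls += countDepthFirst($sub);
    }
    return $calls;
}

// Hypothetical helper: number of glob() calls made by the queue-based traversal.
function countBreadthFirst(string $path): int
{
    $calls = 0;
    $folders = [$path];
    while (($folder = array_shift($folders)) !== null) {
        $calls++;
        $subFolders = glob($folder . DIRECTORY_SEPARATOR . '*', GLOB_ONLYDIR) ?: [];
        $folders = array_merge($folders, $subFolders);
    }
    return $calls;
}
```

Calling both on the same root, with any trailing separator trimmed, and comparing the two numbers shows whether the iteration counts really differ. If they come out equal, the time difference has to come from somewhere else, such as the array handling.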
