Excluding directories in os.walk

Question

I'm writing a script that descends into a directory tree (using os.walk()) and then visits each file matching a certain file extension. However, since some of the directory trees that my tool will be used on also contain subdirectories that in turn contain a LOT of useless (for the purpose of this script) stuff, I figured I'd add an option for the user to specify a list of directories to exclude from the traversal.

This is easy enough with os.walk(). After all, it's up to me to decide whether I actually want to visit the respective files / dirs yielded by os.walk() or just skip them. The problem is that if I have, for example, a directory tree like this:

root--
     |
     --- dirA
     |
     --- dirB
     |
     --- uselessStuff --
                       |
                       --- moreJunk
                       |
                       --- yetMoreJunk

and I want to exclude uselessStuff and all its children, os.walk() will still descend into all the (potentially thousands of) subdirectories of uselessStuff, which, needless to say, slows things down a lot. In an ideal world, I could tell os.walk() to not even bother yielding any more children of uselessStuff, but to my knowledge there is no way of doing that (is there?).

Does anyone have an idea? Maybe there's a third-party library that provides something like that?

Answer 1

Modifying dirs in-place will prune the (subsequent) files and directories visited by os.walk:

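import os

exclude = {'New folder', 'Windows', 'Desktop'}  # directory names to skip
for root, dirs, files in os.walk(top, topdown=True):  # top: root of the tree
    # Slice assignment changes the list os.walk holds, pruning the walk.
    dirs[:] = [d for d in dirs if d not in exclude]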

From help(os.walk):

When topdown is true, the caller can modify the dirnames list in-place (e.g., via del or slice assignment), and walk will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search...
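
To tie the pruning back to the original goal (visiting only files with a given extension), here is a minimal sketch. The helper name iter_matching_files, its parameters, and the temporary directory layout are assumptions made for illustration and are not part of the answer above.

```python
import os
import tempfile


def iter_matching_files(top, exclude, ext):
    """Yield paths under `top` ending in `ext`, never entering excluded directories."""
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment mutates the very list os.walk consults next,
        # so excluded directories are never descended into.
        # (A plain `dirs = [...]` would only rebind the local name and prune nothing.)
        dirs[:] = [d for d in dirs if d not in exclude]
        for name in files:
            if name.endswith(ext):
                yield os.path.join(root, name)


# Build a throwaway tree loosely mirroring the question's layout (hypothetical files).
with tempfile.TemporaryDirectory() as top:
    for sub in ("dirA", "dirB", os.path.join("uselessStuff", "moreJunk")):
        os.makedirs(os.path.join(top, sub))
    for rel in ("dirA/a.txt", "dirB/b.log", "uselessStuff/moreJunk/junk.txt"):
        open(os.path.join(top, *rel.split("/")), "w").close()

    found = sorted(
        os.path.relpath(p, top)
        for p in iter_matching_files(top, {"uselessStuff"}, ".txt")
    )
    print(found)
```

Only the .txt file under dirA should be reported: dirB contains no match, and uselessStuff is dropped from dirs before os.walk gets a chance to enter it. Note that the exclusion matches bare directory names, so a directory called uselessStuff would be skipped at any depth.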

Answer 2

... an alternative form of @unutbu's excellent answer that reads a little more directly, given that the intent is to exclude directories, at the cost of O(n**2) vs O(n) time.

(Making a copy of the dirs list with list(dirs) is required for correct execution.)

# exclude = set([...])
for root, dirs, files in os.walk(top, topdown=True):
    # Iterate over a copy (list(dirs)) because dirs is mutated inside the loop.
    for d in list(dirs):
        if d in exclude:
            dirs.remove(d)
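
Why the list(dirs) copy matters can be seen without touching the filesystem. The following is a small illustration on a plain list; the names in it are made up for the example.

```python
names = ["a", "junk1", "junk2", "b"]
exclude = {"junk1", "junk2"}

# Removing while iterating the same list: after "junk1" is removed, the
# remaining items shift left and the iterator steps over "junk2".
unsafe = list(names)
for d in unsafe:
    if d in exclude:
        unsafe.remove(d)

# Iterating over a copy leaves the iteration itself undisturbed.
safe = list(names)
for d in list(safe):
    if d in exclude:
        safe.remove(d)

print(unsafe)  # ['a', 'junk2', 'b']
print(safe)    # ['a', 'b']
```

Applied to os.walk, the unsafe variant would leave some excluded directories in dirs, and the walk would still descend into them.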

Answer 1: score 271, accepted, 2013-11-08 13:10:41

Answer 2: score 8, 2016-05-17 05:16:44
