Regular expressions and os.walk

Question

I'm pretty new to Python but learning fast. I'm trying to use regex with os.walk to ignore directories that I don't want processed. I understand that you must modify dirs in place rather than create a new list, and I have tried it both ways. I don't get any errors, but it still traverses all the directories. Excluding full directory names works fine. I am trying to remove all directories with 'EXP' or '-' or '3.2' in the name. Here is an example I want to ignore: 3.2.2.150-20150424.195805_EXP_manuMain_outOfMemFix

This is what I have:

def runtest(filepath_udu: object) -> object:   
    k = 1
    for root, dirs, files in os.walk(filepath_udu, topdown=True):
        dirs[:] = [item for item in dirs 
                   if item not in ('1node','local','remote')]
        dirs[:] = [dir for dir in dirs 
                   if re.search(r'\bEXP\b', dir) not in dirs \
                   or re.search(r'\b3.2\b', dir) not in dirs \
                   or re.search(r'\w+(?:- \w+)+', dir) not in dirs]
        for file in files:
            do something...

What am I doing wrong that my second dirs[:] is being ignored? Thanks

Answer 1

It's not being ignored; it's just that your condition is always true, so you aren't filtering anything out.

re.search returns a match object if something is found, or None if not. Either way, that's not going to be an element of dirs, because dirs is just a list of strings. So all of your tests are always true.

Instead of checking that the search result is not in dirs, just check that it's not truthy. (A match object is always truthy, and None is always falsy.)
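A small check makes the difference between the two tests visible. This is only an illustrative sketch: the directory names are made up, and a bare `EXP` pattern stands in for the real ones.

```python
import re

# Hypothetical directory names, standing in for the `dirs` list from os.walk.
dirs = ['build_EXP', 'release', '3.2-nightly']

# The original test: a match object or None is never an element of a list of strings.
original_test = [re.search(r'EXP', name) not in dirs for name in dirs]

# The corrected test: rely on the truthiness of the search result instead.
truthy_test = [bool(re.search(r'EXP', name)) for name in dirs]

print(original_test)  # every entry is True, so nothing would be filtered
print(truthy_test)    # True only for names that contain 'EXP'
```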

Also, after you fix that, I'm pretty sure you wanted to keep the values where all of the tests fail, but you're using or instead of and, which means you're keeping the values where any of the tests fail.

So:

dirs[:] = [dir for dir in dirs
           if not re.search(r'\bEXP\b', dir)
           and not re.search(r'\b3\.2\b', dir)  # escaped dot: match a literal '.'
           and not re.search(r'\w+(?:-\w+)+', dir)]

Or, if it's easier to understand the other way round: instead of keeping all values where all the tests fail, keep all values where none of the tests is true:

dirs[:] = [dir for dir in dirs if not (
           re.search(r'\bEXP\b', dir) or
           re.search(r'\b3\.2\b', dir) or  # escaped dot: match a literal '.'
           re.search(r'\w+(?:-\w+)+', dir))]
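For reference, the filtering above can be put into a self-contained sketch. One thing worth checking against your own names: `\b` treats `_` as a word character, so `\bEXP\b` does not match the `_EXP_` in the example directory name from the question (that directory is still excluded by the `3.2` and hyphen tests). The version below assumes that a plain substring match on 'EXP', '-' or '3.2' is what is wanted, which may be broader than intended. The names `EXCLUDE_PATTERN`, `EXCLUDE_NAMES`, `keep_dir` and `walk_filtered` are hypothetical.

```python
import os
import re

# Assumption: any name containing 'EXP', '-' or '3.2' should be skipped.
EXCLUDE_PATTERN = re.compile(r'EXP|-|3\.2')
EXCLUDE_NAMES = {'1node', 'local', 'remote'}

def keep_dir(name):
    """Return True if os.walk should descend into a directory with this name."""
    return name not in EXCLUDE_NAMES and not EXCLUDE_PATTERN.search(name)

def walk_filtered(top):
    """Yield file paths under `top`, skipping excluded directories."""
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment mutates the list os.walk holds, which prunes the traversal.
        dirs[:] = [d for d in dirs if keep_dir(d)]
        for name in files:
            yield os.path.join(root, name)
```

Calling `list(walk_filtered(some_path))` would then return only files that live outside the excluded directories. Note that the filter applies to subdirectory names only, not to the starting directory itself.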

Answer 2

Instead of using os.walk, you can avoid the overhead of list manipulation by recursively traversing the subdirectories yourself with os.scandir, skipping those that match your exclusion criteria:

def runtest(filepath_udu: object) -> object:
    for entry in os.scandir(filepath_udu):
        if entry.is_dir():
            # Recurse only into directories that pass every exclusion test;
            # excluded directories are skipped rather than treated as files.
            if (entry.name not in ('1node', 'local', 'remote')
                    and not re.search(r'\bEXP\b', entry.name)
                    and not re.search(r'\b3\.2\b', entry.name)
                    and not re.search(r'\w+(?:-\w+)+', entry.name)):
                runtest(entry.path)
        else:
            do something ...
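A runnable variant of the os.scandir approach might look like the sketch below. It is written as a generator so that the file handling stays with the caller. As assumptions, it uses a plain substring match on 'EXP', '-' or '3.2', and the names `EXCLUDE_PATTERN`, `EXCLUDE_NAMES` and `iter_files` are hypothetical.

```python
import os
import re

# Assumption: any name containing 'EXP', '-' or '3.2' should be skipped.
EXCLUDE_PATTERN = re.compile(r'EXP|-|3\.2')
EXCLUDE_NAMES = {'1node', 'local', 'remote'}

def iter_files(top):
    """Recursively yield paths under `top`, skipping excluded directories."""
    # The context manager closes the scandir iterator promptly (Python 3.6+).
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                if entry.name in EXCLUDE_NAMES or EXCLUDE_PATTERN.search(entry.name):
                    continue  # excluded directory: do not descend
                yield from iter_files(entry.path)
            else:
                # Regular files, plus symlinks (which are not followed).
                yield entry.path
```

Because symlinks are not followed, a link to a directory is yielded like a file instead of being traversed; pass `follow_symlinks=True` if that is not what you want.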

2 answers

Answer 1: score 0, accepted, 2018-08-03 02:43:34

Answer 2: score -1, 2018-08-03 02:57:23
