
Python os.walk: Remove Directories

I'm trying to exclude directories from an os.walk() traversal (I don't need the files in those directories).

My code:

def findit(root, exclude_files=[], exclude_dirs=[]):
    exclude_files = (fnmatch.translate(i) for i in exclude_files)
    exclude_files = '('+')|('.join(exclude_files)+')'
    exclude_files = re.compile(exclude_files)
    exclude_dirs = (os.path.normpath(i) for i in exclude_dirs)
    exclude_dirs = (os.path.normcase(i) for i in exclude_dirs)
    exclude_dirs = set(exclude_dirs)
    return (os.path.join(r,f)
           for r,_,f in os.walk(root)
           if os.path.normpath(os.path.normcase(r)) not in exclude_dirs
           for f in f
           if not exclude_files.match(os.path.normcase(f)))

Filtering the files works, but when I try to filter out c:/windows, files from the Windows directories still show up. Am I missing something?

   filelist = list(findit('c:/',exclude_files = ['*.dll', '*.dat', '*.log', '*.exe'], exclude_dirs = ['c:/windows', 'c:/program files', 'c:/else']))

When filtering out directories, you are not preventing os.walk() from descending into their subdirectories.

You'll need to clear the dirs list for this to happen:

import fnmatch
import os
import re


def findit(root, exclude_files=[], exclude_dirs=[]):
    exclude_files = (fnmatch.translate(i) for i in exclude_files)
    exclude_files = '('+')|('.join(exclude_files)+')'
    exclude_files = re.compile(exclude_files)
    exclude_dirs = (os.path.normpath(i) for i in exclude_dirs)
    exclude_dirs = (os.path.normcase(i) for i in exclude_dirs)
    exclude_dirs = set(exclude_dirs)
    for current, dirs, files in os.walk(root):
        if os.path.normpath(os.path.normcase(current)) in exclude_dirs:
            # exclude this dir and subdirectories
            dirs[:] = []
            continue
        for f in files:
            if not exclude_files.match(os.path.normcase(f)):
                yield os.path.join(current, f)

The dirs[:] = [] assignment clears the list in place; it removes all directory names from the list. Because this list is shared with os.walk(), which uses it to decide which subdirectories to visit next (in the default top-down mode), clearing it stops os.walk() from visiting those subdirectories.
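To make the in-place point concrete, here is a minimal sketch that contrasts slice assignment with plain rebinding. The helper name walk_visited and the temporary directory layout are assumptions made up for this illustration; they are not part of the original answer.

```python
import os
import tempfile


def walk_visited(root, prune_in_place):
    """Return the names of the directories visited under root (hypothetical helper)."""
    visited = []
    for current, dirs, files in os.walk(root):
        visited.append(os.path.basename(current))
        if os.path.basename(current) == "skip":
            if prune_in_place:
                dirs[:] = []   # mutates the very list os.walk() holds
            else:
                dirs = []      # only rebinds the local name; os.walk() is unaffected
    return sorted(visited)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: tmp/keep and tmp/skip/inner
    os.makedirs(os.path.join(tmp, "keep"))
    os.makedirs(os.path.join(tmp, "skip", "inner"))
    pruned = walk_visited(tmp, prune_in_place=True)
    rebound = walk_visited(tmp, prune_in_place=False)

print("inner" in pruned)   # False: the subtree below "skip" was never entered
print("inner" in rebound)  # True: rebinding did not stop the descent
```

In both runs "skip" itself is still visited; only the descent below it differs.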

Reading the reply above made me wonder. It seemed to me that the os.walk call was missing and that the root parameter was not used as needed. Also, the case where either of the optional arguments is an empty list should work. Here is a slight variation with fewer namespace look-ups that also supports wildcard exclusion of directories at each directory level:

import os
import re
import fnmatch
import os.path


def findit(root, exclude_files=[], exclude_dirs=[], exclude_dirs_wc=[]):
    """Generate all files found under root excluding some.

    Excluded files are given as a list of Unix shell-style wildcards
    that exclude matches in each directory.  Excluded directories are
    assumed to be paths starting at root; no wildcards.  Directory
    wildcards at each level can be supplied.

    """
    # Less namespace look-up.
    join = os.path.join
    normpath = os.path.normpath; normcase = os.path.normcase
    #
    def make_exclude_regex_from(lst):
        if len(lst):
            lst = (fnmatch.translate(i) for i in lst)
            lst = "({})".format(")|(".join(lst))
            lst = re.compile(lst)
        return lst
    #
    exclude_files = make_exclude_regex_from(exclude_files)
    exclude_dirs_wc = make_exclude_regex_from(exclude_dirs_wc)
    if len(exclude_dirs):
        exclude_dirs = (normpath(i) for i in exclude_dirs)
        exclude_dirs = (normcase(i) for i in exclude_dirs)
        exclude_dirs = set(exclude_dirs)
    for current, dirs, files in os.walk(root):
        current_dir = normpath(normcase(current))
        if exclude_dirs and current_dir in exclude_dirs:
            # Prune set of dirs to exclude.
            exclude_dirs.discard(current_dir)
            # Disregard sub-directories.
            dirs[:] = []  # In place: os.walk() reads this same list.
            continue
        if exclude_dirs_wc:
            for dd in dirs[:]:
                if exclude_dirs_wc.match(normcase(dd)):
                    dirs.remove(dd)  # IN PLACE
        if exclude_files:
            for ff in files[:]:
                if exclude_files.match(normcase(ff)):
                    files.remove(ff)  # IN PLACE; also a loop var.
        for f in files:
            yield join(current,f)
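The empty-list case mentioned in this reply can be checked in isolation. The sketch below shows, as far as I can tell, why joining an empty pattern list the way the earlier snippets do ends up excluding every file, alongside one possible guard. build_exclude_regex is a hypothetical helper name, not something taken from either answer.

```python
import fnmatch
import re


def build_exclude_regex(patterns):
    """Combine shell-style wildcards into one regex; None when there are no patterns.

    Hypothetical helper, not part of the original answers.
    """
    translated = [fnmatch.translate(p) for p in patterns]
    if not translated:
        return None
    return re.compile("|".join("(?:{})".format(t) for t in translated))


# Joining an empty list the way the earlier snippets do produces the regex '()'.
naive = re.compile("(" + ")|(".join([]) + ")")
print(bool(naive.match("notes.txt")))   # True: every file would be excluded

regex = build_exclude_regex(["*.dll", "*.log"])
print(bool(regex.match("setup.log")))   # True
print(bool(regex.match("notes.txt")))   # False
print(build_exclude_regex([]))          # None
```

A caller would then treat None as "nothing to exclude", much like the truthiness checks in the function above.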

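You can use the continue keyword to skip an iteration while traversing with os.walk(pathName). Provided the pattern is not anchored to the end of the path, re.search() also matches the paths of nested subdirectories, so their files are skipped too, although os.walk() still traverses them: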

import os
import re

for dirpath, dirnames, filenames in os.walk(pathName):
    # pattern is a regular expression (or plain string) identifying the folder to skip
    dirpath_pat = re.search(pattern, dirpath)
    if dirpath_pat and dirpath_pat.group(0):
        continue
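If the goal is to avoid walking the skipped folders at all, the pattern check can be combined with the in-place pruning from the first answer. This is only a sketch under assumptions: walk_files_pruned, the directory layout, and the choice to match the pattern against each directory name (rather than the full path) are all made up for illustration.

```python
import os
import re
import tempfile


def walk_files_pruned(root, skip_pattern):
    """Yield file paths under root, never descending into matching directories.

    skip_pattern is a regex tested against each directory name
    (hypothetical helper; the names are illustrative).
    """
    skip = re.compile(skip_pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Slice assignment edits the list os.walk() uses for the next descent.
        dirnames[:] = [d for d in dirnames if not skip.search(d)]
        for name in filenames:
            yield os.path.join(dirpath, name)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: tmp/src/a.py and tmp/cache/deep/b.tmp
    os.makedirs(os.path.join(tmp, "src"))
    os.makedirs(os.path.join(tmp, "cache", "deep"))
    for rel in (("src", "a.py"), ("cache", "deep", "b.tmp")):
        with open(os.path.join(tmp, *rel), "w") as fh:
            fh.write("")
    found = sorted(os.path.basename(p) for p in walk_files_pruned(tmp, r"^cache$"))

print(found)  # ['a.py']
```

Filtering dirnames before the descent means the excluded subtree is never listed, which matters on large trees such as c:/windows.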
