
What is the Python way to walk a directory tree?

I feel that assigning files and folders and doing the += [item] part is a bit hackish. Any suggestions? I'm using Python 3.2.

from os import *
from os.path import *

def dir_contents(path):
    contents = listdir(path)
    files = []
    folders = []
    for i, item in enumerate(contents):
        if isfile(contents[i]):
            files += [item]
        elif isdir(contents[i]):
            folders += [item]
    return files, folders

Take a look at the os.walk function, which returns the path along with the directories and files it contains. That should considerably shorten your solution.
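As a rough sketch of that suggestion (not the answerer's own code), the first tuple produced by os.walk already separates a directory's subdirectories from its files, so the question's dir_contents could plausibly shrink to a few lines. Keeping the function name and return order from the question is an assumption made for illustration.

```python
import os

def dir_contents(path):
    # os.walk yields (dirpath, dirnames, filenames) tuples, top-down;
    # the first tuple describes `path` itself.
    # The default covers a missing or unreadable directory, for which
    # os.walk yields nothing.
    _, folders, files = next(os.walk(path), (path, [], []))
    return files, folders
```

Only the first tuple is consumed here, so the rest of the tree is never visited.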

os.walk and os.scandir are great options; however, I've been using pathlib more and more, and with pathlib you can use the .glob() method:

from pathlib import Path

root_directory = Path(".")
for path_object in root_directory.glob('**/*'):
    if path_object.is_file():
        print(f"hi, I'm a file: {path_object}")
    elif path_object.is_dir():
        print(f"hi, I'm a dir: {path_object}")


For anyone looking for a solution using pathlib (Python >= 3.4):

from pathlib import Path

def walk(path): 
    for p in Path(path).iterdir(): 
        if p.is_dir(): 
            yield from walk(p)
            continue
        yield p.resolve()

# recursively traverse all files from current directory
for p in walk(Path('.')): 
    print(p)

# the function returns a generator so if you need a list you need to build one
all_files = list(walk(Path('.'))) 

However, as mentioned above, this does not preserve the top-down ordering given by os.walk.
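If the top-down ordering matters, one possible variation is to report each directory before descending into it, mirroring the (dirpath, dirnames, filenames) shape of os.walk. This is only a sketch: the name walk_top_down is hypothetical, entries are sorted just to make the order predictable, and symlinked directories are listed but deliberately not followed.

```python
from pathlib import Path

def walk_top_down(root):
    # Hypothetical helper: yields (directory, subdirs, files) for `root`
    # first, then for each subdirectory in turn.
    root = Path(root)
    dirs, files = [], []
    for p in sorted(root.iterdir()):
        (dirs if p.is_dir() else files).append(p)
    yield root, dirs, files
    for d in dirs:
        # Do not descend into symlinked directories, to avoid cycles.
        if not d.is_symlink():
            yield from walk_top_down(d)

for directory, subdirs, files in walk_top_down('.'):
    print(directory, len(subdirs), len(files))
```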

If you want to recursively iterate through all the files, including all files in the subfolders, I believe this is the best way.

import os

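def get_files(root):
    # os.walk yields (directory, subdirectories, filenames) for every directory
    for fd, subfds, fns in os.walk(root):
        for fn in fns:
            yield os.path.join(fd, fn)

# now this will print the path of every file under the starting directory
# ('.' is a placeholder; pass whichever directory you want to search)
for fn in get_files('.'):
    print(fn)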

Since Python 3.4 there is a new module, pathlib. So to get all dirs and files one can do:

from pathlib import Path

path = '.'  # placeholder: the directory to list

dirs = [str(item) for item in Path(path).iterdir() if item.is_dir()]
files = [str(item) for item in Path(path).iterdir() if item.is_file()]

Indeed, using

items += [item]

is bad for many reasons...

  1. The append method was made exactly for that (appending one element to the end of a list).

  2. You are creating a temporary one-element list just to throw it away. While raw speed should not be your first concern when using Python (otherwise you're using the wrong language), wasting speed for no reason still doesn't seem the right thing to do.

  3. You are relying on a small asymmetry of the Python language: for list objects, writing a += b is not the same as writing a = a + b, because the former modifies the object in place while the latter allocates a new list, and this can have different semantics if the object a is also reachable in other ways. In your specific code this doesn't seem to be the case, but it could become a problem later when someone else (or yourself in a few years, which is the same thing) has to modify the code. Python even has an extend method, with a less subtle syntax, that is made specifically for the case in which you want to modify a list object in place by adding the elements of another list at the end.
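The difference described in point 3 can be seen with a second name bound to the same list. The variable names below are made up for the illustration.

```python
a = [1, 2]
alias = a          # a second name for the same list object
a += [3]           # modifies the list in place
print(alias)       # [1, 2, 3]: the alias sees the change

b = [1, 2]
alias_b = b
b = b + [3]        # builds a new list and rebinds b
print(alias_b)     # [1, 2]: the original list is untouched

c = [1]
c.append(2)        # add a single element
c.extend([3, 4])   # add every element of another list, in place
print(c)           # [1, 2, 3, 4]
```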

Also, as others have noted, it seems that your code is trying to do what os.walk already does...

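from os import listdir
from os.path import isfile, join

def dir_contents(path):
    files, folders = [], []
    for p in listdir(path):
        # listdir returns bare names, so join them with path before testing
        if isfile(join(path, p)):
            files.append(p)
        else:
            folders.append(p)
    return files, folders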

Since Python 3.4 there exists the generator method Path.rglob. So, to process all paths under some/starting/path just do something such as

from pathlib import Path

path = Path('some/starting/path') 
for subpath in path.rglob('*'):
    # do something with subpath
    print(subpath)

To get all subpaths in a list, do list(path.rglob('*')). To get just the files with the sql extension, do list(path.rglob('*.sql')).

Instead of the built-in os.walk and os.path.walk, I use something derived from this piece of code I found suggested elsewhere, which I had originally linked to but have replaced with inlined source:

import os
import stat

class DirectoryStatWalker:
    # a forward iterator that traverses a directory tree, and
    # returns the filename and additional file information

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                st = os.stat(fullname)
                mode = st[stat.ST_MODE]
                if stat.S_ISDIR(mode) and not stat.S_ISLNK(mode):
                    self.stack.append(fullname)
                return fullname, st

if __name__ == '__main__':
    for file, st in DirectoryStatWalker("/usr/include"):
        print(file, st[stat.ST_SIZE])

It walks the directories recursively and is quite efficient and easy to read.
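For comparison, a similar stack-based walk can be sketched as a generator on top of os.scandir, which another answer mentions. This is an alternative formulation rather than the code above: the name walk_with_stat is hypothetical, the with form of os.scandir assumes Python 3.6 or later, and symlinks are reported but never followed.

```python
import os

def walk_with_stat(directory):
    # Hypothetical helper: yields (path, stat_result) for every entry
    # below `directory`, using an explicit stack instead of recursion.
    stack = [directory]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Queue real subdirectories only; symlinks are not followed.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                yield entry.path, entry.stat(follow_symlinks=False)

if __name__ == '__main__':
    for path, st in walk_with_stat('.'):
        print(path, st.st_size)
```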

While googling for the same info, I found this question.

I am posting here the smallest, clearest code, which I found at http://www.pythoncentral.io/how-to-traverse-a-directory-tree-in-python-guide-to-os-walk/ (rather than just posting the URL, in case of link rot).

The page has some useful info and also points to a few other relevant pages.

# Import the os module, for the os.walk function
import os

# Set the directory you want to start from
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
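Because os.walk visits directories top-down by default, the list of subdirectories it hands back can be edited in place to keep it from descending into some of them. A minimal sketch follows; the function name walk_skipping and the default skip_names are assumptions chosen for the example.

```python
import os

def walk_skipping(root_dir, skip_names=(".git", "__pycache__")):
    # Hypothetical helper: yields file paths under root_dir, skipping
    # any directory whose name is listed in skip_names.
    for dir_name, subdir_list, file_list in os.walk(root_dir):
        # Slice assignment edits the very list os.walk recurses into;
        # rebinding the name (subdir_list = ...) would have no effect.
        subdir_list[:] = [d for d in subdir_list if d not in skip_names]
        for fname in file_list:
            yield os.path.join(dir_name, fname)

for path in walk_skipping('.'):
    print(path)
```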

I've not tested this extensively yet, but I believe this will expand the os.walk generator, join dirnames to all the file paths, and flatten the resulting list, to give a straight-up list of concrete files in your search path.

import itertools
import os

def find(input_path):
    return itertools.chain(
        *list(
            list(os.path.join(dirname, fname) for fname in files)
            for dirname, _, files in os.walk(input_path)
        )
    )
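The find function above builds every intermediate list before chaining them. If a lazy result is preferable, itertools.chain.from_iterable accepts the generator directly, so paths are produced as the tree is walked. This is a sketch of that variation, not a tested replacement.

```python
import itertools
import os

def find(input_path):
    # chain.from_iterable consumes the outer generator lazily, so no
    # intermediate lists are built while os.walk traverses the tree.
    return itertools.chain.from_iterable(
        (os.path.join(dirname, fname) for fname in files)
        for dirname, _, files in os.walk(input_path)
    )

for path in find('.'):
    print(path)
```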

Try using the append method.

Another solution for walking a directory tree, using the pathlib module:

from pathlib import Path

for directory in Path('.').glob('**'):
    for item in directory.iterdir():
        print(item)

The pattern ** matches the current directory and all subdirectories, recursively, and the method iterdir then iterates over each directory's contents. This is useful when you need more control when traversing the directory tree.

