How do I list all files of a directory?

How can I list all files of a directory in Python and add them to a list?

os.listdir() returns everything that's in a directory: both files and directories.

If you want just files, you could either filter this down using os.path:

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

or you could use os.walk(), which yields a (dirpath, dirnames, filenames) tuple for each directory it visits, splitting the contents into files and dirs for you. If you only want the top directory, you can break the first time it yields:

from os import walk

f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break

or, shorter:

from os import walk

filenames = next(walk(mypath), (None, None, []))[2]  # [] if no file

I prefer using the glob module, as it does pattern matching and expansion.

import glob
print(glob.glob("/home/adam/*"))

It does pattern matching intuitively:

import glob
# All files ending with .txt directly inside /home/adam
print(glob.glob("/home/adam/*.txt"))
# All files ending with .txt exactly one folder below /home/adam
print(glob.glob("/home/adam/*/*.txt"))

It will return a list with the queried files:

['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]
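
As a further sketch (not part of the answer above), glob can also descend into every subfolder when the pattern contains ** and recursive=True is passed (Python 3.5+). The helper name find_txt_files is an assumption made up for this illustration.

```python
import glob
import os


def find_txt_files(top):
    """Return all .txt files under `top`, at any depth (hypothetical helper)."""
    # '**' matches zero or more directory levels when recursive=True
    pattern = os.path.join(top, "**", "*.txt")
    # keep regular files only; a directory could also be named 'something.txt'
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))
```

Calling find_txt_files("/home/adam") would then cover both of the fixed-depth patterns shown earlier, plus any deeper folders.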

os.listdir() - list in the current directory

With listdir in the os module you get the files and the folders in the current dir:

 import os
 arr = os.listdir()
 print(arr)
 
 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Looking in a directory

arr = os.listdir('c:\\files')

glob from glob

With glob you can specify a type of file to list, like this:

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)

glob in a list comprehension

mylist = [f for f in glob.glob("*.txt")]

Get the full path of only files in the current directory

import os

cwd = os.getcwd()
# Keep only regular files, each joined onto the current directory
onlyfiles = [os.path.join(cwd, f) for f in os.listdir(cwd)
             if os.path.isfile(os.path.join(cwd, f))]
print(onlyfiles)

['G:\\getfilesname\\getfilesname.py', 'G:\\getfilesname\\example.txt']

Getting the full path name with os.path.abspath

You get the full path in return:

 import os
 files_path = [os.path.abspath(x) for x in os.listdir()]
 print(files_path)
 
 ['F:\\documenti\\applications.txt', 'F:\\documenti\\collections.txt']
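
One caveat worth illustrating: os.path.abspath resolves a bare name against the current working directory, so the abspath snippet is only correct when listing the current directory. For any other directory, the directory has to be joined onto each name first. A minimal sketch (the function name full_paths is made up for this example):

```python
import os


def full_paths(directory):
    """Return absolute paths of the entries in `directory` (hypothetical helper)."""
    base = os.path.abspath(directory)
    # join each bare name onto the directory it came from,
    # instead of letting abspath() resolve it against os.getcwd()
    return [os.path.join(base, name) for name in os.listdir(base)]
```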

Walk: going through subdirectories

os.walk yields the root, the list of directories and the list of files for each directory it visits, which is why they are unpacked into r, d, f in the for loop. It then looks for other files and directories in the subfolders of the root, and so on until there are no subfolders left.

import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            print(os.path.join(r, file))

os.listdir(): get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to pass '.' or os.getcwd() as the argument to os.listdir.

 import os
 arr = os.listdir('.')
 print(arr)
 
 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

To go up in the directory tree

# Method 1: the parent of the current directory
x = os.listdir('..')

# Method 2: the root of the filesystem (of the current drive on Windows)
x = os.listdir('/')

Get files: os.listdir() in a particular directory (Python 2 and 3)

 import os
 arr = os.listdir('F:\\python')
 print(arr)
 
 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Get files of a particular subdirectory with os.listdir()

import os

x = os.listdir("./content")

os.walk('.') - current directory

 import os
 arr = next(os.walk('.'))[2]
 print(arr)
 
 >>> ['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']

next(os.walk('.')) and os.path.join('dir', 'file')

 import os
 arr = []
 # next(os.walk(...)) returns a single (root, dirs, files) tuple for the top directory
 r, d, f = next(os.walk("F:\\_python"))
 for file in f:
     arr.append(os.path.join(r, file))

 for f in arr:
     print(f)

>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt

next(os.walk('F:\\')) - get the full path - list comprehension

 [os.path.join(r, file) for r, d, f in [next(os.walk("F:\\_python"))] for file in f]
 
 >>> ['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']

os.walk - get full path - all files in sub dirs

x = [os.path.join(r,file) for r,d,f in os.walk("F:\\_python") for file in f]
print(x)

>>> ['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']

os.listdir() - get only txt files

 arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
 print(arr_txt)
 
 >>> ['work.txt', '3ebooks.txt']

Using glob to get the full path of the files

If I need the absolute path of the files:

import os
from glob import glob

# os.path.abspath is the standard-library way to get an absolute path
x = [os.path.abspath(f) for f in glob("F:\\*.txt")]
for f in x:
    print(f)

>>> F:\acquistionline.txt
>>> F:\acquisti_2018.txt
>>> F:\bootstrap_jquery_ecc.txt

Using os.path.isfile to avoid directories in the list

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ['a simple game.py', 'data.txt', 'decorator.py']

Using pathlib from Python 3.4

import pathlib

flist = []
for p in pathlib.Path('.').iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

 >>> error.PNG
 >>> exemaker.bat
 >>> guiprova.mp3
 >>> setup.py
 >>> speak_gui2.py
 >>> thumb.PNG

With list comprehension:

flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]

Alternatively, use pathlib.Path() instead of pathlib.Path(".")

Use glob method in pathlib.Path()

import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py

Get all and only files with os.walk

import os
x = [i[2] for i in os.walk('.')]
y=[]
for t in x:
    for f in t:
        y.append(f)
print(y)

>>> ['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']

Get only files with next and walk in a directory

 import os
 x = next(os.walk('F://python'))[2]
 print(x)
 
 >>> ['calculator.bat','calculator.py']

Get only directories with next and walk in a directory

 import os
 next(os.walk('F://python'))[1] # for the current dir use ('.')
 
 >>> ['python3','others']

Get all the subdir names with walk

for r,d,f in os.walk("F:\\_python"):
    for dirs in d:
        print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints

os.scandir() from Python 3.5 and greater

import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ['calculator.bat','calculator.py']

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG

Examples:

Ex. 1: How many files are there in the subdirectories?

In this example, we count the files contained in a directory and all of its subdirectories.

import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + " files"

print(count("F:\\python"))

>>> F:\python : 12057 files

Ex. 2: How to copy all files from a directory to another?

A script to tidy up your computer by finding all files of a type (default: pptx) and copying them into a new folder.

import os
import shutil

destination = "F:\\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype='pptx', counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = os.path.join(pack[0], f)
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print('-' * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")

for dir in os.listdir():
    "searches for folders that starts with `_`"
    if dir[0] == '_':
        # copyfile(dir, filetype='pdf')
        copyfile(dir, filetype='txt')


>>> _compiti18\Compito Contabilità 1\conti.txt
>>> _compiti18\Compito Contabilità 1\modula4.txt
>>> _compiti18\Compito Contabilità 1\moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files

Ex. 3: How to write all the file names to a txt file

In case you want to create a txt file with all the file names:

import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)

Example: txt with all the files of a hard drive

"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding='utf-8') as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\\" + file)
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")

All the files of C:\ in one text file

This is a shorter version of the previous code. Change the starting folder if you need to start from another location. On my computer this code generates a text file of about 50 MB, with a little under 500,000 lines, each holding the complete path of a file.

import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            # os.path.join inserts the separator that plain r + file leaves out
            filewrite.write(f"{os.path.join(r, file)}\n")

How to write a file with the paths of all files of a given type in a folder

With this function you can create a txt file named after the type of file you are looking for (e.g. pngfile.txt), containing the full paths of all files of that type. It can be useful sometimes, I think.

import os

def searchfiles(extension='.ttf', folder='H:\\'):
    "Create a txt file with all the file of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    # os.path.join inserts the separator that plain r + file leaves out
                    filewrite.write(f"{os.path.join(r, file)}\n")

# looking for png files on the hard disk H:\
searchfiles('.png', 'H:\\')

>>> H:\4bs_18\Dolphins5.png
>>> H:\4bs_18\Dolphins6.png
>>> H:\4bs_18\Dolphins7.png
>>> H:\5_18\marketing html\assets\imageslogo2.png
>>> H:\7z001.png
>>> H:\7z002.png

(New) Find all files and open them with tkinter GUI

In 2019 I wanted to add a little app that searches for all files in a dir and lets you open them by double-clicking the file name in the list.

import tkinter as tk
import os

def searchfiles(extension='.txt', folder='H:\\'):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, r + "\\" + file)

def open_file():
    os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles('.png', 'H:\\'))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()

import os
os.listdir("somedirectory")

will return a list of all files and directories in "somedirectory".

A one-line solution to get only a list of files (no subdirectories):

filenames = next(os.walk(path))[2]

or full pathnames (absolute if path itself is absolute):

paths = [os.path.join(path, fn) for fn in next(os.walk(path))[2]]

Getting Full File Paths From a Directory and All Its Subdirectories

import os

def get_filepaths(directory):
    """
    Return the full paths of all files in the tree rooted at directory.

    os.walk generates the file names by walking the tree either
    top-down (the default) or bottom-up. For each directory in the
    tree, including directory itself, it yields a 3-tuple
    (dirpath, dirnames, filenames).
    """
    file_paths = []  # List which will store all of the full filepaths.

    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.

    return file_paths  # Self-explanatory.

# Run the above function and store its results in a variable.   
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")

  • The path I provided in the above function contained 3 files: two of them in the root directory, and another in a subfolder called "SUBFOLDER". You can now do things like:
  • print(full_file_paths), which will print the list:

    • ['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']

If you'd like, you can open and read the contents, or focus only on files with the extension ".dat" like in the code below:

for f in full_file_paths:
    if f.endswith(".dat"):
        print(f)

/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat

Since version 3.4 there are built-in iterators for this which can be a lot more efficient than os.listdir():

pathlib: New in version 3.4.

>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]

According to PEP 428, the aim of the pathlib library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.

os.scandir(): New in version 3.5.

>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]

Note that os.walk() uses os.scandir() instead of os.listdir() from version 3.5, and its speed increased by 2-20 times according to PEP 471.

Let me also recommend reading ShadowRanger's comment below.

Preliminary notes

  • Although there's a clear differentiation between the file and directory terms in the question text, some may argue that directories are actually special files
  • The statement "all files of a directory" can be interpreted in two ways:
    1. All direct (or level 1) descendants only
    2. All descendants in the whole directory tree (including the ones in sub-directories)
  • When the question was asked, I imagine that Python 2 was the LTS version; however, the code samples will be run with Python 3(.5). (I'll keep them as Python 2 compliant as possible; also, any code belonging to Python that I'm going to post is from v3.5.4, unless otherwise specified.) That has consequences related to another keyword in the question, "add them into a list":

    • In pre-Python 2.2 versions, sequences (iterables) were mostly represented by lists (tuples, sets, ...)
    • In Python 2.2, the concept of generator ([Python.Wiki]: Generators), courtesy of [Python 3]: The yield statement, was introduced. As time passed, generator counterparts started to appear for functions that returned/worked with lists
    • In Python 3, generator is the default behavior
    • Not sure if returning a list is still mandatory (or a generator would do as well), but passing a generator to the list constructor will create a list out of it (and also consume it). The example below illustrates the differences on [Python 3]: map(function, iterable, ...)
     >>> import sys
     >>> sys.version
     '2.7.10 (default, Mar 8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
     >>> m = map(lambda x: x, [1, 2, 3])  # Just a dummy lambda function
     >>> m, type(m)
     ([1, 2, 3], <type 'list'>)
     >>> len(m)
     3


     >>> import sys
     >>> sys.version
     '3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
     >>> m = map(lambda x: x, [1, 2, 3])
     >>> m, type(m)
     (<map object at 0x000001B4257342B0>, <class 'map'>)
     >>> len(m)
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     TypeError: object of type 'map' has no len()
     >>> lm0 = list(m)  # Build a list from the generator
     >>> lm0, type(lm0)
     ([1, 2, 3], <class 'list'>)
     >>>
     >>> lm1 = list(m)  # Build a list from the same generator
     >>> lm1, type(lm1)  # Empty list now - generator already consumed
     ([], <class 'list'>)
  • The examples will be based on a directory called root_dir with the following structure (this example is for Win, but I'm using the same tree on Lnx as well):

     E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
     Folder PATH listing for volume Work
     Volume serial number is 00000029 3655:6FED
     E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
     ¦   file0
     ¦   file1
     ¦
     +---dir0
     ¦   +---dir00
     ¦   ¦   ¦   file000
     ¦   ¦   ¦
     ¦   ¦   +---dir000
     ¦   ¦           file0000
     ¦   ¦
     ¦   +---dir01
     ¦   ¦       file010
     ¦   ¦       file011
     ¦   ¦
     ¦   +---dir02
     ¦       +---dir020
     ¦           +---dir0200
     +---dir1
     ¦       file10
     ¦       file11
     ¦       file12
     ¦
     +---dir2
     ¦   ¦   file20
     ¦   ¦
     ¦   +---dir20
     ¦           file200
     ¦
     +---dir3
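
To follow along, the root_dir tree listed above can be recreated with a short script. This is only a convenience sketch: the function name make_example_tree and the two module-level lists are assumptions, and all files are created empty.

```python
import os

# Files (relative to root_dir) and the directories that stay empty in the example tree
FILES = [
    "file0", "file1",
    "dir0/dir00/file000",
    "dir0/dir00/dir000/file0000",
    "dir0/dir01/file010", "dir0/dir01/file011",
    "dir1/file10", "dir1/file11", "dir1/file12",
    "dir2/file20", "dir2/dir20/file200",
]
EMPTY_DIRS = ["dir0/dir02/dir020/dir0200", "dir3"]


def make_example_tree(base="."):
    """Create root_dir under `base` and return its path (hypothetical helper)."""
    root = os.path.join(base, "root_dir")
    for rel in EMPTY_DIRS:
        os.makedirs(os.path.join(root, *rel.split("/")), exist_ok=True)
    for rel in FILES:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()  # empty placeholder file
    return root
```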


Solutions

Programmatic approaches:

  1. [Python 3]: os.listdir(path='.')

    Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' ...


     >>> import os
     >>> root_dir = "root_dir"  # Path relative to current dir (os.getcwd())
     >>>
     >>> os.listdir(root_dir)  # List all the items in root_dir
     ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
     >>>
     >>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))]  # Filter items and only keep files (strip out directories)
     ['file0', 'file1']

    A more elaborate example (code_os_listdir.py):

     import os
     from pprint import pformat


     def _get_dir_content(path, include_folders, recursive):
         entries = os.listdir(path)
         for entry in entries:
             entry_with_path = os.path.join(path, entry)
             if os.path.isdir(entry_with_path):
                 if include_folders:
                     yield entry_with_path
                 if recursive:
                     for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                         yield sub_entry
             else:
                 yield entry_with_path


     def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
         path_len = len(path) + len(os.path.sep)
         for item in _get_dir_content(path, include_folders, recursive):
             yield item if prepend_folder_name else item[path_len:]


     def _get_dir_content_old(path, include_folders, recursive):
         entries = os.listdir(path)
         ret = list()
         for entry in entries:
             entry_with_path = os.path.join(path, entry)
             if os.path.isdir(entry_with_path):
                 if include_folders:
                     ret.append(entry_with_path)
                 if recursive:
                     ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
             else:
                 ret.append(entry_with_path)
         return ret


     def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
         path_len = len(path) + len(os.path.sep)
         return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]


     def main():
         root_dir = "root_dir"
         ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
         lret0 = list(ret0)
         print(ret0, len(lret0), pformat(lret0))
         ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
         print(len(ret1), pformat(ret1))


     if __name__ == "__main__":
         main()

    Notes:

    • There are two implementations:
      • One that uses generators (of course here it seems useless, since I immediately convert the result to a list)
      • The classic one (function names ending in _old)
    • Recursion is used (to get into subdirectories)
    • For each implementation there are two functions:
      • One that starts with an underscore (_): "private" (should not be called directly), which does all the work
      • The public one (wrapper over the previous): it just strips off the initial path (if required) from the returned entries. It's an ugly implementation, but it's the only idea that I could come up with at this point
    • In terms of performance, generators are generally a little bit faster (considering both creation and iteration times), but I didn't test them in recursive functions, and I am also iterating inside the function over inner generators; I don't know how performance-friendly that is
    • Play with the arguments to get different results


    Output:

     (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" "code_os_listdir.py"
     <generator object get_dir_content at 0x000001BDDBB3DF10> 22 ['root_dir\\dir0',
      'root_dir\\dir0\\dir00',
      'root_dir\\dir0\\dir00\\dir000',
      'root_dir\\dir0\\dir00\\dir000\\file0000',
      'root_dir\\dir0\\dir00\\file000',
      'root_dir\\dir0\\dir01',
      'root_dir\\dir0\\dir01\\file010',
      'root_dir\\dir0\\dir01\\file011',
      'root_dir\\dir0\\dir02',
      'root_dir\\dir0\\dir02\\dir020',
      'root_dir\\dir0\\dir02\\dir020\\dir0200',
      'root_dir\\dir1',
      'root_dir\\dir1\\file10',
      'root_dir\\dir1\\file11',
      'root_dir\\dir1\\file12',
      'root_dir\\dir2',
      'root_dir\\dir2\\dir20',
      'root_dir\\dir2\\dir20\\file200',
      'root_dir\\dir2\\file20',
      'root_dir\\dir3',
      'root_dir\\file0',
      'root_dir\\file1']
     11 ['dir0\\dir00\\dir000\\file0000',
      'dir0\\dir00\\file000',
      'dir0\\dir01\\file010',
      'dir0\\dir01\\file011',
      'dir1\\file10',
      'dir1\\file11',
      'dir1\\file12',
      'dir2\\dir20\\file200',
      'dir2\\file20',
      'file0',
      'file1']
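
    On Python 3.3+ only, the recursive generator can be written more compactly with yield from. The following is a sketch of the same idea, not the code that produced the output above: the name iter_dir_content is hypothetical, entries are sorted for predictable order (the original does not sort), and it gives up the Python 2 compatibility this answer aims for.

```python
import os


def iter_dir_content(path, include_folders=True, recursive=True):
    """Lazily yield paths under `path` (hypothetical variant of _get_dir_content)."""
    for entry in sorted(os.listdir(path)):
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                yield entry_with_path
            if recursive:
                # delegate to the sub-generator instead of looping over it
                yield from iter_dir_content(entry_with_path, include_folders, recursive)
        else:
            yield entry_with_path
```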


  2. [Python 3]: os.scandir(path='.') (Python 3.5+, backport: [PyPI]: scandir)

    Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries '.' and '..' are not included.

    Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.


     >>> import os
     >>> root_dir = os.path.join(".", "root_dir")  # Explicitly prepending current directory
     >>> root_dir
     '.\\root_dir'
     >>>
     >>> scandir_iterator = os.scandir(root_dir)
     >>> scandir_iterator
     <nt.ScandirIterator object at 0x00000268CF4BC140>
     >>> [item.path for item in scandir_iterator]
     ['.\\root_dir\\dir0', '.\\root_dir\\dir1', '.\\root_dir\\dir2', '.\\root_dir\\dir3', '.\\root_dir\\file0', '.\\root_dir\\file1']
     >>>
     >>> [item.path for item in scandir_iterator]  # Will yield an empty list as it was consumed by previous iteration (automatically performed by the list comprehension)
     []
     >>>
     >>> scandir_iterator = os.scandir(root_dir)  # Reinitialize the generator
     >>> for item in scandir_iterator :
     ...     if os.path.isfile(item.path):
     ...         print(item.name)
     ...
     file0
     file1

    Notes:

    • It's similar to os.listdir
    • But it's also more flexible (and offers more functionality), more Pythonic (and in some cases, faster)
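
    To make the performance remark concrete, here is a small sketch that collects file names and sizes in a single pass, using the information carried by each os.DirEntry rather than calling os.path functions on every path. The function name file_sizes is an assumption for illustration, and the with form requires Python 3.6+.

```python
import os


def file_sizes(directory="."):
    """Map file name -> size in bytes for regular files directly in `directory`."""
    sizes = {}
    # the context manager closes the iterator promptly (supported since Python 3.6)
    with os.scandir(directory) as entries:
        for entry in entries:
            # follow_symlinks=False: look at the entry itself, not a link target
            if entry.is_file(follow_symlinks=False):
                sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes
```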


  3. [Python 3]: os.walk(top, topdown=True, onerror=None, followlinks=False)

    Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).


     >>> import os
     >>> root_dir = os.path.join(os.getcwd(), "root_dir")  # Specify the full path
     >>> root_dir
     'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
     >>>
     >>> walk_generator = os.walk(root_dir)
     >>> root_dir_entry = next(walk_generator)  # First entry corresponds to the root dir (passed as an argument)
     >>> root_dir_entry
     ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
     >>>
     >>> root_dir_entry[1] + root_dir_entry[2]  # Display dirs and files (direct descendants) in a single list
     ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
     >>>
     >>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]]  # Display all the entries in the previous list by their full path
     ['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
     >>>
     >>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
     ...     print(entry)
     ...
     ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
     ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
     ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
     ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
     ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
     ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
     ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
     ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
     ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
     ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
     ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])

    Notes:

    • Behind the scenes, it uses os.scandir (os.listdir on older versions)
    • It does the heavy lifting by recursing into subfolders
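
    One detail of os.walk that is easy to miss: with the default topdown=True, the caller may edit dirnames in place to keep the walk out of selected subtrees. A minimal sketch, assuming we want to skip directories whose names appear in a made-up skip tuple (the function name walk_files is also hypothetical):

```python
import os


def walk_files(top, skip=(".git", "__pycache__")):
    """Yield file paths under `top`, not descending into directories named in `skip`."""
    for dirpath, dirnames, filenames in os.walk(top):  # topdown=True by default
        # slice assignment mutates the very list os.walk will recurse into
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

    Rebinding the name (dirnames = [...]) would have no effect; the list object itself has to be modified.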


  4. [Python 3]: glob.glob(pathname, *, recursive=False) ([Python 3]: glob.iglob(pathname, *, recursive=False))

    Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).
    ...
    Changed in version 3.5: Support for recursive globs using “**”.


     >>> import glob, os
     >>> wildcard_pattern = "*"
     >>> root_dir = os.path.join("root_dir", wildcard_pattern)  # Match every file/dir name
     >>> root_dir
     'root_dir\\*'
     >>>
     >>> glob_list = glob.glob(root_dir)
     >>> glob_list
     ['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
     >>>
     >>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list]  # Strip the dir name and the path separator from the beginning
     ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
     >>>
     >>> for entry in glob.iglob(root_dir + "*", recursive=True):
     ...     print(entry)
     ...
     root_dir\
     root_dir\dir0
     root_dir\dir0\dir00
     root_dir\dir0\dir00\dir000
     root_dir\dir0\dir00\dir000\file0000
     root_dir\dir0\dir00\file000
     root_dir\dir0\dir01
     root_dir\dir0\dir01\file010
     root_dir\dir0\dir01\file011
     root_dir\dir0\dir02
     root_dir\dir0\dir02\dir020
     root_dir\dir0\dir02\dir020\dir0200
     root_dir\dir1
     root_dir\dir1\file10
     root_dir\dir1\file11
     root_dir\dir1\file12
     root_dir\dir2
     root_dir\dir2\dir20
     root_dir\dir2\dir20\file200
     root_dir\dir2\file20
     root_dir\dir3
     root_dir\file0
     root_dir\file1

    Notes:

    • Uses os.listdir
    • For large trees (especially if recursive is on), iglob is preferred
    • Allows advanced filtering based on name (due to the wildcard)
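
    As a minimal sketch of the notes above, the snippet below uses glob.iglob with recursive=True to walk a tree lazily and keeps only regular files. The helper name iter_files and the sample tree built in a temporary directory are assumptions made for illustration; they are not part of the original answer.

```python
import glob
import os
import tempfile


def iter_files(root_dir):
    """Lazily yield every regular file below root_dir (hypothetical helper)."""
    # "**" matches any number of nested directories when recursive=True (Python 3.5+).
    pattern = os.path.join(glob.escape(root_dir), "**", "*")
    for entry in glob.iglob(pattern, recursive=True):
        if os.path.isfile(entry):
            yield entry


# Build a tiny throwaway tree so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "dir0", "dir00"))
    sample_files = (
        "file0",
        os.path.join("dir0", "file00"),
        os.path.join("dir0", "dir00", "file000"),
    )
    for rel in sample_files:
        open(os.path.join(tmp, rel), "w").close()
    found = sorted(os.path.relpath(p, tmp) for p in iter_files(tmp))
    print(found)
```

    Note that glob patterns skip names starting with a dot by default, so hidden files would not be listed by this sketch.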


  1. [Python 3]: class pathlib.Path(*pathsegments) (Python 3.4+, backport: [PyPI]: pathlib2)

     >>> import os, pathlib
     >>> root_dir = "root_dir"
     >>> root_dir_instance = pathlib.Path(root_dir)
     >>> root_dir_instance
     WindowsPath('root_dir')
     >>> root_dir_instance.name
     'root_dir'
     >>> root_dir_instance.is_dir()
     True
     >>>
     >>> [item.name for item in root_dir_instance.glob("*")]  # Wildcard searching for all direct descendants
     ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
     >>>
     >>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()]  # Display paths (including parent) for files only
     ['root_dir\\file0', 'root_dir\\file1']

    Notes:

    • This is one way of achieving our goal
    • It's the OOP style of handling paths
    • Offers lots of functionalities
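
    A short sketch of the same idea with more of the pathlib API: iterdir() for direct children and rglob() for the whole tree. The function names direct_files and all_files, and the temporary sample tree, are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def direct_files(root):
    """Names of regular files directly inside root (not recursive)."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_file())


def all_files(root):
    """Paths of regular files anywhere below root, relative to root."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


# Throwaway tree: root/file0 and root/dir0/file00.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "dir0").mkdir()
    (base / "file0").touch()
    (base / "dir0" / "file00").touch()
    top_level = direct_files(base)
    everything = all_files(base)
    print(top_level)
    print(everything)
```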


  1. [Python 2]: dircache.listdir(path) (Python 2 only)


    def listdir(path):
        """List directory contents, using cache."""
        try:
            cached_mtime, list = cache[path]
            del cache[path]
        except KeyError:
            cached_mtime, list = -1, []
        mtime = os.stat(path).st_mtime
        if mtime != cached_mtime:
            list = os.listdir(path)
            list.sort()
        cache[path] = mtime, list
        return list


  1. [man7]: OPENDIR(3) / [man7]: READDIR(3) / [man7]: CLOSEDIR(3) via [Python 3]: ctypes - A foreign function library for Python (POSIX specific)

    ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.

    code_ctypes.py:

     #!/usr/bin/env python3

     import sys
     from ctypes import Structure, \
         c_ulonglong, c_longlong, c_ushort, c_ubyte, c_char, c_int, \
         CDLL, POINTER, \
         create_string_buffer, get_errno, set_errno, cast


     DT_DIR = 4
     DT_REG = 8

     char256 = c_char * 256


     class LinuxDirent64(Structure):
         _fields_ = [
             ("d_ino", c_ulonglong),
             ("d_off", c_longlong),
             ("d_reclen", c_ushort),
             ("d_type", c_ubyte),
             ("d_name", char256),
         ]

     LinuxDirent64Ptr = POINTER(LinuxDirent64)

     libc_dll = this_process = CDLL(None, use_errno=True)
     # ALWAYS set argtypes and restype for functions, otherwise it's UB!!!
     opendir = libc_dll.opendir
     readdir = libc_dll.readdir
     closedir = libc_dll.closedir


     def get_dir_content(path):
         ret = [path, list(), list()]
         dir_stream = opendir(create_string_buffer(path.encode()))
         if (dir_stream == 0):
             print("opendir returned NULL (errno: {:d})".format(get_errno()))
             return ret
         set_errno(0)
         dirent_addr = readdir(dir_stream)
         while dirent_addr:
             dirent_ptr = cast(dirent_addr, LinuxDirent64Ptr)
             dirent = dirent_ptr.contents
             name = dirent.d_name.decode()
             if dirent.d_type & DT_DIR:
                 if name not in (".", ".."):
                     ret[1].append(name)
             elif dirent.d_type & DT_REG:
                 ret[2].append(name)
             dirent_addr = readdir(dir_stream)
         if get_errno():
             print("readdir returned NULL (errno: {:d})".format(get_errno()))
         closedir(dir_stream)
         return ret


     def main():
         print("{:s} on {:s}\n".format(sys.version, sys.platform))
         root_dir = "root_dir"
         entries = get_dir_content(root_dir)
         print(entries)


     if __name__ == "__main__":
         main()

    Notes:

    • It loads the three functions from libc (loaded in the current process) and calls them (for more details check [SO]: How do I check whether a file exists without exceptions? (@CristiFati's answer) - last notes from item #4.). That would place this approach very close to the Python / C edge
    • LinuxDirent64 is the ctypes representation of struct dirent64 from [man7]: dirent.h(0P) (so are the DT_ constants) from my machine: Ubtu 16 x64 (4.10.0-40-generic and libc6-dev:amd64). On other flavors/versions, the struct definition might differ, and if so, the ctypes alias should be updated, otherwise it will yield Undefined Behavior
    • It returns data in the os.walk format. I didn't bother to make it recursive, but starting from the existing code, that would be a fairly trivial task
    • Everything is doable on Win as well; only the data (libraries, functions, structs, constants, ...) differ


    Output:

     [cfati@cfati-ubtu16x64-0:~/Work/Dev/StackOverflow/q003207219]> ./code_ctypes.py
     3.5.2 (default, Nov 12 2018, 13:43:14)
     [GCC 5.4.0 20160609] on linux

     ['root_dir', ['dir2', 'dir1', 'dir3', 'dir0'], ['file1', 'file0']]
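
    The comment in code_ctypes.py about always setting argtypes and restype can be illustrated with a small sketch (POSIX only). It declares the prototypes for opendir and closedir before calling them, so the DIR pointer is not truncated to a C int on 64-bit systems. The helper name can_open_dir is hypothetical, and the sketch deliberately leaves out readdir because its struct layout is platform dependent.

```python
import ctypes
import os
import tempfile

# POSIX only: CDLL(None) exposes the symbols already loaded in this process (libc included).
libc = ctypes.CDLL(None, use_errno=True)

# DIR *opendir(const char *name);
libc.opendir.argtypes = [ctypes.c_char_p]
libc.opendir.restype = ctypes.c_void_p  # keeps the full pointer value
# int closedir(DIR *dirp);
libc.closedir.argtypes = [ctypes.c_void_p]
libc.closedir.restype = ctypes.c_int


def can_open_dir(path):
    """Return True if opendir() succeeds on path (hypothetical helper)."""
    dir_stream = libc.opendir(os.fsencode(path))
    if not dir_stream:  # a NULL pointer comes back as None with a c_void_p restype
        return False
    libc.closedir(dir_stream)
    return True


with tempfile.TemporaryDirectory() as tmp:
    opened = can_open_dir(tmp)
    missing = can_open_dir(os.path.join(tmp, "does_not_exist"))
    print(opened, missing)
```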


  1. [ActiveState.Docs]: win32file.FindFilesW (Win specific)

    Retrieves a list of matching filenames, using the Windows Unicode API. An interface to the API FindFirstFileW/FindNextFileW/Find close functions.


     >>> import os, win32file, win32con
     >>> root_dir = "root_dir"
     >>> wildcard = "*"
     >>> root_dir_wildcard = os.path.join(root_dir, wildcard)
     >>> entry_list = win32file.FindFilesW(root_dir_wildcard)
     >>> len(entry_list)  # Don't display the whole content as it's too long
     8
     >>> [entry[-2] for entry in entry_list]  # Only display the entry names
     ['.', '..', 'dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
     >>>
     >>> [entry[-2] for entry in entry_list if entry[0] & win32con.FILE_ATTRIBUTE_DIRECTORY and entry[-2] not in (".", "..")]  # Filter entries and only display dir names (except self and parent)
     ['dir0', 'dir1', 'dir2', 'dir3']
     >>>
     >>> [os.path.join(root_dir, entry[-2]) for entry in entry_list if entry[0] & (win32con.FILE_ATTRIBUTE_NORMAL | win32con.FILE_ATTRIBUTE_ARCHIVE)]  # Only display file "full" names
     ['root_dir\\file0', 'root_dir\\file1']

    Notes:


  1. Install some (other) third-party package that does the trick
    • Most likely, it will rely on one (or more) of the above (maybe with slight customizations)


Notes:

  • Code is meant to be portable (except places that target a specific area, which are marked) or cross:

    • platform (Nix, Win)
    • Python version (2, 3)
  • Multiple path styles (absolute, relative) were used across the above variants, to illustrate the fact that the "tools" used are flexible in this direction

  • os.listdir and os.scandir use opendir / readdir / closedir ([MS.Docs]: FindFirstFileW function / [MS.Docs]: FindNextFileW function / [MS.Docs]: FindClose function) (via [GitHub]: python/cpython - (master) cpython/Modules/posixmodule.c)

  • win32file.FindFilesW uses those (Win specific) functions as well (via [GitHub]: mhammond/pywin32 - (master) pywin32/win32/src/win32file.i)

  • _get_dir_content (from point #1.) can be implemented using any of these approaches (some will require more work and some less)

    • Some advanced filtering (instead of just file vs. dir) could be done: e.g. the include_folders argument could be replaced by another one (e.g. filter_func) which would be a function that takes a path as an argument: filter_func=lambda x: True (this doesn't strip out anything) and inside _get_dir_content something like: if not filter_func(entry_with_path): continue (if the function fails for one entry, it will be skipped), but the more complex the code becomes, the longer it will take to execute
  • Nota bene! Since recursion is used, I must mention that I did some tests on my laptop (Win 10 x64), totally unrelated to this problem, and when the recursion level was reaching values somewhere in the (990 .. 1000) range (recursionlimit - 1000 (default)), I got StackOverflow :). If the directory tree exceeds that limit (I am not an FS expert, so I don't know if that is even possible), that could be a problem.
    I must also mention that I didn't try to increase recursionlimit because I have no experience in the area (how much can I increase it before having to also increase the stack at OS level), but in theory there will always be the possibility of failure if the dir depth is larger than the highest possible recursionlimit (on that machine)

  • The code samples are for demonstrative purposes only. That means that I didn't take error handling into account (I don't think there's any try / except / else / finally block), so the code is not robust (the reason is to keep it as simple and short as possible). For production, error handling should be added as well
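
The filter_func idea and the recursion-limit concern from the notes above can be combined in one sketch: an explicit stack replaces recursion, so the tree depth is not bounded by the interpreter's recursion limit. The name iter_dir_content and its parameters are hypothetical; this is not the _get_dir_content from the answer, only an illustration under those assumptions (os.scandir as a context manager needs Python 3.6+).

```python
import os
import tempfile


def iter_dir_content(path, filter_func=lambda entry_path: True):
    """Yield paths below `path` that pass filter_func, without recursion."""
    stack = [path]  # directories still to be visited
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Descend into real directories only (symlinks are not followed).
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                if filter_func(entry.path):
                    yield entry.path


# Throwaway tree with one .txt file at the top and one .py file in a subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "dir0", "dir00"))
    for rel in ("file0.txt", os.path.join("dir0", "file00.py")):
        open(os.path.join(tmp, rel), "w").close()
    txt_only = sorted(
        os.path.relpath(p, tmp)
        for p in iter_dir_content(tmp, filter_func=lambda p: p.endswith(".txt"))
    )
    print(txt_only)
```

The filter is applied to directories as well as files, so a stricter filter_func (for example one that also calls os.path.isfile) may be needed in practice.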

Other approaches:

  1. Use Python only as a wrapper

    • Everything is done using another technology
    • That technology is invoked from Python
    • The most famous flavor that I know is what I call the system administrator approach:

      • Use Python (or any programming language for that matter) in order to execute shell commands (and parse their outputs)
      • Some consider this a neat hack
      • I consider it more like a lame workaround (gainarie), as the action per se is performed from shell (cmd in this case), and thus doesn't have anything to do with Python.
      • Filtering (grep / findstr) or output formatting could be done on both sides, but I'm not going to insist on it. Also, I deliberately used os.system instead of subprocess.Popen.
       (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
       dir0
       dir1
       dir2
       dir3
       file0
       file1

    In general this approach is to be avoided, since if some command output format slightly differs between OS versions/flavors, the parsing code should be adapted as well (not to mention differences between locales).

I really liked adamk's answer, suggesting that you use glob(), from the module of the same name. This allows you to have pattern matching with *s.

But as other people pointed out in the comments, glob() can get tripped up over inconsistent slash directions. To help with that, I suggest you use the join() and expanduser() functions in the os.path module, and perhaps the getcwd() function in the os module, as well.

As examples:

from glob import glob

# Return everything under C:\Users\admin that contains a folder called wlp.
glob('C:\\Users\\admin\\*\\wlp')

The above is terrible: the path has been hardcoded and will only ever work on Windows, because both the drive name and the \\s are hardcoded into the path.

from glob    import glob
from os.path import join

# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))

The above works better, but it relies on the folder name Users, which is often found on Windows and not so often found on other OSs. It also relies on the user having a specific name, admin.

from glob    import glob
from os.path import expanduser, join

# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))

This works perfectly across all platforms.

Another great example that works perfectly across platforms and does something a bit different:

from glob    import glob
from os      import getcwd
from os.path import join

# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))

Hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.
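
For comparison, here is a sketch of the same portable lookup using pathlib, where Path.home() would play the role of expanduser('~'). The helper find_named_subfolders and the sample directories are assumptions for illustration; wlp is simply the folder name used in the examples above.

```python
import tempfile
from pathlib import Path


def find_named_subfolders(base, name):
    """Return directories called `name` exactly one level below `base`."""
    return sorted(p for p in Path(base).glob("*/" + name) if p.is_dir())


# Demonstrated against a throwaway directory; pass Path.home() to search the user directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "projA" / "wlp").mkdir(parents=True)
    (Path(tmp) / "projB" / "other").mkdir(parents=True)
    matches = [p.relative_to(tmp).as_posix() for p in find_named_subfolders(tmp, "wlp")]
    print(matches)
```

Path objects take care of the separator, so no slash direction is ever written by hand.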

import os

def list_files(path):
    # returns a list of names (with extension, without full path) of all files 
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files 

If you are looking for a Python implementation of find, this is a recipe I use rather frequently:

from findtools.find_files import (find_files, Match)

# Recursively find all *.sh files in /usr/bin
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)

for found_file in found_files:
    print(found_file)

So I made a PyPI package out of it and there is also a GitHub repository. I hope that someone finds it potentially useful for this code.

For greater results, you can use the listdir() method of the os module along with a generator (a generator is a powerful iterator that keeps its state, remember?). The following code works fine with both versions: Python 2 and Python 3.

Here's the code:

import os

def files(path):  
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file

for file in files("."):  
    print (file)

The listdir() method returns the list of entries for the given directory. The method os.path.isfile() returns True if the given entry is a file. The yield statement suspends the function while keeping its current state, and hands back only the name of an entry detected as a file. All of the above allows us to loop over the generator function.
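
To make the "keeps its state" point concrete, here is a small sketch that consumes the generator in two steps. The files() function is repeated so the snippet runs on its own; the temporary directory and the variable names are assumptions for illustration. Note that os.listdir still reads all names up front, so only the isfile checks are deferred.

```python
import itertools
import os
import tempfile


def files(path):
    """Yield the names of regular files directly inside path."""
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            yield name


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subdir"))  # directories are skipped
    for name in ("a.txt", "b.txt", "c.txt"):
        open(os.path.join(tmp, name), "w").close()
    gen = files(tmp)
    first_two = list(itertools.islice(gen, 2))  # consumes only two items
    remaining = list(gen)  # the generator resumes where it stopped
    print(len(first_two), len(remaining))
```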

Returns a list of absolute file paths, without recursing into subdirectories:

L = [os.path.join(os.getcwd(),f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(),f))]

A wise teacher told me once that:

When there are several established ways to do something, none of them is good for all cases.

I will thus add a solution for a subset of the problem: quite often, we only want to check whether a file matches a start string and an end string, without going into subdirectories. We would thus like a function that returns a list of filenames, like:

filenames = dir_filter('foo/baz', radical='radical', extension='.txt')

If you care to first declare two functions, this can be done:

import os

def file_filter(filename, radical='', extension=''):
    "Check if a filename matches a radical and extension"
    if not filename:
        return False
    filename = filename.strip()
    return(filename.startswith(radical) and filename.endswith(extension))

def dir_filter(dirname='', radical='', extension=''):
    "Filter filenames in directory according to radical and extension"
    if not dirname:
        dirname = '.'
    return [filename for filename in os.listdir(dirname)
                if file_filter(filename, radical, extension)]

This solution could be easily generalized with regular expressions (and you might want to add a pattern argument, if you do not want your patterns to always stick to the start or end of the filename).
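
One possible shape of that generalization, as a hedged sketch: a single pattern argument holding a regular expression replaces radical and extension. The function name dir_filter_regex and the sample filenames are assumptions for illustration.

```python
import os
import re
import tempfile


def dir_filter_regex(dirname=".", pattern=".*"):
    """Return names in dirname whose full name matches the regular expression."""
    regex = re.compile(pattern)
    return sorted(name for name in os.listdir(dirname) if regex.fullmatch(name))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("radical_1.txt", "radical_2.csv", "other_radical.txt"):
        open(os.path.join(tmp, name), "w").close()
    # Same effect as radical='radical', extension='.txt' in the functions above.
    start_end = dir_filter_regex(tmp, r"radical.*\.txt")
    # The pattern may also sit in the middle of the name.
    anywhere = dir_filter_regex(tmp, r".*radical.*\.txt")
    print(start_end, anywhere)
```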

import os
import os.path


def get_files(target_dir):
    item_list = os.listdir(target_dir)

    file_list = list()
    for item in item_list:
        item_dir = os.path.join(target_dir,item)
        if os.path.isdir(item_dir):
            file_list += get_files(item_dir)
        else:
            file_list.append(item_dir)
    return file_list

Here I use a recursive structure.

Using generators

import os
def get_files(search_path):
     for (dirpath, _, filenames) in os.walk(search_path):
         for filename in filenames:
             yield os.path.join(dirpath, filename)
list_files = get_files('.')
for filename in list_files:
    print(filename)

Another very readable variant for Python 3.4+ is using pathlib.Path.glob:

from pathlib import Path
folder = '/foo'
[f for f in Path(folder).glob('*') if f.is_file()]

It is simple to make this more specific, e.g. only look for Python source files which are not symbolic links, also in all subdirectories:

[f for f in Path(folder).glob('**/*.py') if not f.is_symlink()]

For Python 2:

pip install rglob

Then do

import rglob
file_list = rglob.rglob("/home/base/dir/", "*")
print file_list

Here's my general-purpose function for this. It returns a list of file paths rather than filenames since I found that to be more useful. It has a few optional arguments that make it versatile. For instance, I often use it with arguments like pattern='*.txt' or subfolders=True.

import os
import fnmatch

def list_paths(folder='.', pattern='*', case_sensitive=False, subfolders=False):
    """Return a list of the file paths matching the pattern in the specified 
    folder, optionally including files inside subfolders.
    """
    match = fnmatch.fnmatchcase if case_sensitive else fnmatch.fnmatch
    walked = os.walk(folder) if subfolders else [next(os.walk(folder))]
    return [os.path.join(root, f)
            for root, dirnames, filenames in walked
            for f in filenames if match(f, pattern)]

dircache is "Deprecated since version 2.6: The dircache module has been removed in Python 3.0."

import dircache
list = dircache.listdir(pathname)
i = 0
check = len(list[0])
temp = []
count = len(list)
while count != 0:
  if len(list[i]) != check:
     temp.append(list[i-1])
     check = len(list[i])
  else:
    i = i + 1
    count = count - 1

print temp

I will provide a sample one-liner where the source path and file type can be provided as input. The code returns a list of filenames with the csv extension. Use . in case all files need to be returned. This will also recursively scan the subdirectories.

import os
from glob import glob

[y for x in os.walk(sourcePath) for y in glob(os.path.join(x[0], '*.csv'))]

Modify file extensions and source path as needed.

To get all files from a specified folder (including subdirectories as well):

import glob
import os

print([entry for entry in glob.iglob("{}/**".format("DIRECTORY_PATH"), recursive=True) if os.path.isfile(entry)])
