
Find all files in a directory with extension .txt in Python

How can I find all the files in a directory that have the extension .txt in Python?

You can use glob:

import glob, os
os.chdir("/mydir")
for file in glob.glob("*.txt"):
    print(file)

or simply os.listdir:

import os
for file in os.listdir("/mydir"):
    if file.endswith(".txt"):
        print(os.path.join("/mydir", file))

or, if you want to traverse the directory tree, use os.walk:

import os
for root, dirs, files in os.walk("/mydir"):
    for file in files:
        if file.endswith(".txt"):
            print(os.path.join(root, file))

Use glob.

>>> import glob
>>> glob.glob('./*.txt')
['./outline.txt', './pip-log.txt', './test.txt', './testingvim.txt']

Something like this should do the job:

import os

for root, dirs, files in os.walk(directory):  # directory: path of the folder to search
    for file in files:
        if file.endswith('.txt'):
            print(file)

Something like this will work:

>>> import os
>>> path = '/usr/share/cups/charmaps'
>>> text_files = [f for f in os.listdir(path) if f.endswith('.txt')]
>>> text_files
['euc-cn.txt', 'euc-jp.txt', 'euc-kr.txt', 'euc-tw.txt', ... 'windows-950.txt']

You can simply use pathlib's glob [1]:

import pathlib

list(pathlib.Path('your_directory').glob('*.txt'))

or in a loop:

for txt_file in pathlib.Path('your_directory').glob('*.txt'):
    # do something with "txt_file"
    print(txt_file)

If you want it to be recursive, you can use .glob('**/*.txt').
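As a rough sketch of the recursive variant: Path.rglob('*.txt') behaves like .glob('**/*.txt'). The helper name find_txt_recursive and the sample file names below are made up for illustration, and the search runs on a throwaway temporary directory.

```python
import pathlib
import tempfile


def find_txt_recursive(directory):
    """Return sorted .txt file paths under directory, including subdirectories."""
    base = pathlib.Path(directory)
    # rglob('*.txt') is the same as glob('**/*.txt')
    return sorted(p for p in base.rglob('*.txt') if p.is_file())


# Build a small temporary tree (hypothetical names) and search it.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / 'sub').mkdir()
    (root / 'a.txt').write_text('top level')
    (root / 'sub' / 'b.txt').write_text('nested')
    (root / 'sub' / 'c.log').write_text('ignored')
    found = [p.relative_to(root).as_posix() for p in find_txt_recursive(root)]

print(found)  # ['a.txt', 'sub/b.txt']
```

The is_file() check is optional; it only guards against directories whose names happen to end in .txt.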


[1] The pathlib module was added to the standard library in Python 3.4. You can install back-ports of that module on older Python versions (for example with conda or pip): pathlib and pathlib2.

import os

path = 'mypath/path' 
files = os.listdir(path)

files_txt = [i for i in files if i.endswith('.txt')]

I like os.walk():

import os

directory = "/mydir"  # folder to search
for root, dirs, files in os.walk(directory):
    for f in files:
        if os.path.splitext(f)[1] == '.txt':
            fullpath = os.path.join(root, f)
            print(fullpath)

Or with generators:

import os

directory = "/mydir"  # folder to search
fileiter = (os.path.join(root, f)
    for root, _, files in os.walk(directory)
    for f in files)
txtfileiter = (f for f in fileiter if os.path.splitext(f)[1] == '.txt')
for txt in txtfileiter:
    print(txt)

Here are more versions of the same that produce slightly different results:

glob.iglob()

import glob
for f in glob.iglob("/mydir/*/*.txt"):  # generator, searches immediate subdirectories
    print(f)

glob.glob1()

import glob
print(glob.glob1("/mydir", "*.tx?"))  # literal_directory, basename_pattern (undocumented helper)

fnmatch.filter()

import fnmatch, os
print(fnmatch.filter(os.listdir("/mydir"), "*.tx?"))  # includes dot-files
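
One of the "slightly different results" is the handling of dot-files: glob skips names that start with a dot unless the pattern itself starts with one, while fnmatch gives the leading dot no special treatment. A small sketch on a temporary directory (the file names are invented for the demonstration):

```python
import fnmatch
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one regular file, one dot-file and one non-matching file.
    for name in ('notes.txt', '.hidden.txt', 'image.png'):
        open(os.path.join(tmp, name), 'w').close()

    # glob ignores the dot-file because '*' does not match a leading dot
    glob_names = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, '*.txt'))
    )
    # fnmatch.filter works on plain strings and matches the dot-file too
    fnmatch_names = sorted(fnmatch.filter(os.listdir(tmp), '*.txt'))

print(glob_names)     # ['notes.txt']
print(fnmatch_names)  # ['.hidden.txt', 'notes.txt']
```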

Python v3.5+

Fast method using os.scandir in a recursive function. Searches for all files with a specified extension in a folder and its sub-folders. It is fast, even when finding tens of thousands of files.

I have also included a function to convert the output to a pandas DataFrame.

import os
import re
import pandas as pd
import numpy as np


def findFilesInFolderYield(path,  extension, containsTxt='', subFolders = True, excludeText = ''):
    """  Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:               Base directory to find files
    extension:          File extension to find.  e.g. 'txt'.  Regular expression. Or  'ls\d' to match ls1, ls2, ls3 etc
    containsTxt:        List of Strings, only finds file if it contains this text.  Ignore if '' (or blank)
    subFolders:         Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText:        Text string.  Ignore if ''. Will exclude if text string is in path.
    """
    if type(containsTxt) == str: # if a string and not in a list
        containsTxt = [containsTxt]
    
    myregexobj = re.compile(r'\.' + extension + '$')    # Makes sure the file extension is at the end and is preceded by a .
    
    try:   # Trapping a OSError or FileNotFoundError:  File permissions problem I believe
        for entry in os.scandir(path):
            if entry.is_file() and myregexobj.search(entry.path): # 
    
                bools = [True for txt in containsTxt if txt in entry.path and (excludeText == '' or excludeText not in entry.path)]
    
                if len(bools)== len(containsTxt):
                    yield entry.stat().st_size, entry.stat().st_atime_ns, entry.stat().st_mtime_ns, entry.stat().st_ctime_ns, entry.path
    
            elif entry.is_dir() and subFolders:   # if its a directory, then repeat process as a nested function
                yield from findFilesInFolderYield(entry.path,  extension, containsTxt, subFolders, excludeText)
    except FileNotFoundError as fnf:   # must come before OSError, its parent class
        print(path +' not found ', fnf)
    except OSError as ose:
        print('Cannot access ' + path +'. Probably a permissions error ', ose)

def findFilesInFolderYieldandGetDf(path,  extension, containsTxt, subFolders = True, excludeText = ''):
    """  Converts returned data from findFilesInFolderYield and creates and Pandas Dataframe.
    Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:               Base directory to find files
    extension:          File extension to find.  e.g. 'txt'.  Regular expression. Or  'ls\d' to match ls1, ls2, ls3 etc
    containsTxt:        List of Strings, only finds file if it contains this text.  Ignore if '' (or blank)
    subFolders:         Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText:        Text string.  Ignore if ''. Will exclude if text string is in path.
    """
    
    fileSizes, accessTimes, modificationTimes, creationTimes , paths  = zip(*findFilesInFolderYield(path,  extension, containsTxt, subFolders, excludeText))
    df = pd.DataFrame({
            'FLS_File_Size':fileSizes,
            'FLS_File_Access_Date':accessTimes,
            'FLS_File_Modification_Date':np.array(modificationTimes).astype('timedelta64[ns]'),
            'FLS_File_Creation_Date':creationTimes,
            'FLS_File_PathName':paths,
                  })
    
    df['FLS_File_Modification_Date'] = pd.to_datetime(df['FLS_File_Modification_Date'],infer_datetime_format=True)
    df['FLS_File_Creation_Date'] = pd.to_datetime(df['FLS_File_Creation_Date'],infer_datetime_format=True)
    df['FLS_File_Access_Date'] = pd.to_datetime(df['FLS_File_Access_Date'],infer_datetime_format=True)

    return df

ext =   'txt'  # regular expression 
containsTxt=[]
path = r'C:\myFolder'
df = findFilesInFolderYieldandGetDf(path,  ext, containsTxt, subFolders = True)

path.py is another alternative: https://github.com/jaraco/path.py

from path import Path  # older releases of path.py exposed this class as `path`
p = Path('/path/to/the/directory')
for f in p.files('*.txt'):
    print(f)

Try this; it will find all your files recursively:

import glob, os
os.chdir("H:\\wallpaper")  # use whatever directory you want

# use a double backslash (\\), not a single one (\)

for file in glob.glob("**/*.txt", recursive = True):
    print(file)

To get all '.txt' file names inside the 'dataPath' folder as a list in a Pythonic way:

from os import listdir
from os.path import isfile, join
path = "/dataPath/"
onlyTxtFiles = [f for f in listdir(path) if isfile(join(path, f)) and  f.endswith(".txt")]
print(onlyTxtFiles)

Python has all the tools to do this:

import os

the_dir = 'the_dir_that_want_to_search_in'
all_txt_files = list(filter(lambda x: x.endswith('.txt'), os.listdir(the_dir)))  # list() because filter is lazy in Python 3

I did a test (Python 3.6.4, W7x64) to see which solution is the fastest for one folder, with no subdirectories, at getting a list of complete file paths for files with a specific extension.

To make it short, for this task os.listdir() is the fastest: it is 1.7x as fast as the next best, os.walk() (with a break!), 2.7x as fast as pathlib, 3.2x as fast as os.scandir() and 3.3x as fast as glob.
Please keep in mind that those results will change when you need recursive results. If you copy/paste one of the methods below, please add a .lower(); otherwise .EXT would not be found when searching for .ext.
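
A minimal sketch of the .lower() advice, applied to the os.listdir() approach. The helper name list_files_with_ext and the sample file names are invented for illustration; this is not part of the benchmark.

```python
import os
import tempfile


def list_files_with_ext(directory, ext):
    """Return sorted full paths of entries in directory ending with ext, ignoring case."""
    ext = ext.lower()
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(ext)  # compare in lower case so .SQLITE also matches
    )


with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.sqlite', 'B.SQLITE', 'c.txt'):
        open(os.path.join(tmp, name), 'w').close()
    matches = [os.path.basename(p) for p in list_files_with_ext(tmp, '.sqlite')]

print(matches)  # ['B.SQLITE', 'a.sqlite']
```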

import os
import pathlib
import timeit
import glob

def a():
    path = pathlib.Path().cwd()
    list_sqlite_files = [str(f) for f in path.glob("*.sqlite")]

def b(): 
    path = os.getcwd()
    list_sqlite_files = [f.path for f in os.scandir(path) if os.path.splitext(f)[1] == ".sqlite"]

def c():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith(".sqlite")]

def d():
    path = os.getcwd()
    os.chdir(path)
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob("*.sqlite")]

def e():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob1(str(path), "*.sqlite")]

def f():
    path = os.getcwd()
    list_sqlite_files = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".sqlite"):
                list_sqlite_files.append( os.path.join(root, file) )
        break



print(timeit.timeit(a, number=1000))
print(timeit.timeit(b, number=1000))
print(timeit.timeit(c, number=1000))
print(timeit.timeit(d, number=1000))
print(timeit.timeit(e, number=1000))
print(timeit.timeit(f, number=1000))

Results:

# Python 3.6.4
0.431
0.515
0.161
0.548
0.537
0.274

import os
import sys

# expects two arguments: the directory and the extension to match
if len(sys.argv) != 3:
    print('no params')
    sys.exit(1)

directory = sys.argv[1]
mask = sys.argv[2]

files = os.listdir(directory)

res = list(filter(lambda x: x.endswith(mask), files))

print(res)

To get an array of ".txt" file names from a folder called "data" in the same directory, I usually use this simple line of code:

import os
fileNames = [fileName for fileName in os.listdir("data") if fileName.endswith(".txt")]

This code makes my life simpler.

import os
directory = "/mydir"  # folder to search
fnames = ([file for root, dirs, files in os.walk(directory)
    for file in files
    if file.endswith('.txt') #or file.endswith('.png') or file.endswith('.pdf')
    ])
for fname in fnames: print(fname)

Use fnmatch: https://docs.python.org/2/library/fnmatch.html

import fnmatch
import os

for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)

I suggest using fnmatch together with the upper() method. This way you can match any of the following:

  1. Name.txt
  2. Name.TXT
  3. Name.Txt

import fnmatch
import os

for file in os.listdir("/Users/Johnny/Desktop/MyTXTfolder"):
    if fnmatch.fnmatch(file.upper(), '*.TXT'):
        print(file)

A copy-pastable solution similar to ghostdog's:

def get_all_filepaths(root_path, ext):
    """
    Search all files which have a given extension within root_path.

    This ignores the case of the extension and searches subdirectories, too.

    Parameters
    ----------
    root_path : str
    ext : str

    Returns
    -------
    list of str

    Examples
    --------
    >>> get_all_filepaths('/run', '.lock')
    ['/run/unattended-upgrades.lock',
     '/run/mlocate.daily.lock',
     '/run/xtables.lock',
     '/run/mysqld/mysqld.sock.lock',
     '/run/postgresql/.s.PGSQL.5432.lock',
     '/run/network/.ifstate.lock',
     '/run/lock/asound.state.lock']
    """
    import os
    all_files = []
    for root, dirs, files in os.walk(root_path):
        for filename in files:
            if filename.lower().endswith(ext):
                all_files.append(os.path.join(root, filename))
    return all_files

You can also use yield to create a generator and thus avoid assembling the complete list:

def get_all_filepaths(root_path, ext):
    import os
    for root, dirs, files in os.walk(root_path):
        for filename in files:
            if filename.lower().endswith(ext):
                yield os.path.join(root, filename)

Here's one with extend():

import glob
import os

path = "/mydir"  # directory to search
types = ('*.jpg', '*.png')
images_list = []
for files in types:
    images_list.extend(glob.glob(os.path.join(path, files)))

Functional solution with sub-directories:

from fnmatch import filter
from functools import partial
from itertools import chain
from os import path, walk

print(*chain(*(map(partial(path.join, root), filter(filenames, "*.txt")) for root, _, filenames in walk("mydir"))))

In case the folder contains a lot of files or memory is a constraint, consider using generators:

import os

def yield_files_with_extensions(folder_path, file_extension):
   for _, _, files in os.walk(folder_path):
       for file in files:
           if file.endswith(file_extension):
               yield file

Option A: Iterate

for f in yield_files_with_extensions('.', '.txt'): 
    print(f)

Option B: Get all

files = [f for f in yield_files_with_extensions('.', '.txt')]

Use Python's os module to find files with a specific extension.

A simple example:

import os

# This is the path where you want to search
path = r'd:'  

# this is extension you want to detect
extension = '.txt'   # this can be : .jpg  .png  .xls  .log .....

for root, dirs_list, files_list in os.walk(path):
    for file_name in files_list:
        if os.path.splitext(file_name)[-1] == extension:
            file_name_path = os.path.join(root, file_name)
            print(file_name)
            print(file_name_path)   # This is the full path of the matching file

Many users have replied with os.walk answers, which return all files in the directory but also descend into every subdirectory and return the files found there. The following restricts the search to the top-level directory:

import os


def files_in_dir(path, extension=''):
    """
       Generator: yields all of the files in <path> ending with
       <extension>

       path       Absolute or relative path to inspect.
       extension  [optional] Only yield files whose names end with this.

       Yields filenames.
    """


    for _, dirs, files in os.walk(path):
        dirs[:] = []  # do not recurse directories.
        yield from [f for f in files if f.endswith(extension)]

# Example: print all the .py files in './python'
for filename in files_in_dir('./python', '.py'):
    print("-", filename)

Or for a one-off where you don't need a generator:

path, ext = "./python", ".py"
for _, _, dirfiles in os.walk(path):
    matches = (f for f in dirfiles if f.endswith(ext))
    break

for filename in matches:
    print("-", filename)

If you are going to use matches for something else, you may want to make it a list rather than a generator expression:

    matches = [f for f in dirfiles if f.endswith(ext)]
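
The reason is that a generator expression can be consumed only once, whereas a list can be iterated repeatedly. A small sketch, using a hard-coded list of hypothetical file names in place of a real directory listing:

```python
names = ['a.py', 'b.txt', 'c.py']  # hypothetical directory listing
ext = '.py'

# Generator expression: values are produced lazily and only once.
gen_matches = (f for f in names if f.endswith(ext))
first_pass = list(gen_matches)   # consumes the generator
second_pass = list(gen_matches)  # already exhausted, so this is empty

# List comprehension: the results are stored and can be reused.
list_matches = [f for f in names if f.endswith(ext)]
reused = list(list_matches)

print(first_pass)   # ['a.py', 'c.py']
print(second_pass)  # []
print(reused)       # ['a.py', 'c.py']
```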

A simple method using a for loop:

import os

ext = ["t", "x", "t"]  # last three characters of ".txt"

p = os.listdir('E:')  # path

for n in range(len(p)):
   name = p[n]
   myfile = list(name[-3:])  # last three characters of the name
   if myfile == ext:
      print(name)
   else:
      print("nops")

This can be generalised further.
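
One possible generalisation, sketched under the assumption that matching should ignore case and accept several extensions at once. The function name files_with_extensions and the sample file names are hypothetical; it relies on str.endswith accepting a tuple of suffixes.

```python
import os
import tempfile


def files_with_extensions(directory, extensions):
    """Return sorted file names in directory ending with any of extensions."""
    # str.endswith accepts a tuple, so all extensions are checked in one call
    wanted = tuple(e.lower() for e in extensions)
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(wanted)
        and os.path.isfile(os.path.join(directory, name))
    )


with tempfile.TemporaryDirectory() as tmp:
    for name in ('report.txt', 'NOTES.TXT', 'tool.exe', 'data.csv'):
        open(os.path.join(tmp, name), 'w').close()
    txt_only = files_with_extensions(tmp, ['.txt'])
    txt_or_exe = files_with_extensions(tmp, ['.txt', '.exe'])

print(txt_only)    # ['NOTES.TXT', 'report.txt']
print(txt_or_exe)  # ['NOTES.TXT', 'report.txt', 'tool.exe']
```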
