
How can I loop through files and identify files by string in filename

In Python: I'm trying to loop through the files in a directory, find the files that have a certain string in their file name, and then open and edit those files. Everything seems to work except selecting specific files in the directory based on that string:

import os
import re
import datetime as dt

OldValue = input('Enter the value to be replaced: ')
NewValue = input('Enter the replacement value: ')
location = input('Enter path to directory: ')
directory = os.listdir(location)
os.chdir(location)

for root, dirs, files in os.walk('.'):
    for fname in files:
        # The match object is computed but never used, so no file gets selected
        re.match('PMPM', fname)

# Rewrite every entry of the directory, whatever its name
regex = re.compile(OldValue)
for file in directory:
    with open(file, 'r') as open_file:
        read_file = open_file.read()
    read_file = regex.sub(NewValue, read_file)
    with open(file, 'w') as write_file:
        write_file.write(read_file)

# Report files modified within the last 30 minutes
now = dt.datetime.now()
ago = now - dt.timedelta(minutes=30)
for root, dirs, files in os.walk('.'):
    for fname in files:
        path = os.path.join(root, fname)
        st = os.stat(path)
        mtime = dt.datetime.fromtimestamp(st.st_mtime)
        if mtime > ago:
            print('%s modified %s' % (path, mtime))

If all you want is a list of the filenames in a given directory that contain a given substring, then something like this should work:

import os

directory = '.'   # Replace with path to your directory: absolute or relative
pattern = 'foo'   # Replace with your target substring
matching_files = [f for f in os.listdir(directory) if pattern in f]

That's all you need for the simplest case. You can then iterate over the matching_files list.
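Keep in mind that os.listdir() returns bare names rather than paths. Unless the directory you listed is the current working directory, you have to join each name with that directory before you can open it. Here is a minimal sketch that assumes text files. The helper read_matching() and the demo file names are hypothetical:

```python
import os
import tempfile

def read_matching(directory, pattern):
    """Return {filename: contents} for files in `directory` whose name contains `pattern`."""
    contents = {}
    for name in os.listdir(directory):
        # listdir() gives bare names, so build the full path before opening
        path = os.path.join(directory, name)
        # Skip subdirectories whose names happen to contain the pattern
        if pattern in name and os.path.isfile(path):
            with open(path, 'r') as fh:
                contents[name] = fh.read()
    return contents

# Demo on a throwaway directory (file names are made up for illustration)
with tempfile.TemporaryDirectory() as tmp:
    for name in ('foo_1.txt', 'bar.txt', 'foo_2.txt'):
        with open(os.path.join(tmp, name), 'w') as fh:
            fh.write(name.upper())
    found = read_matching(tmp, 'foo')
    print(sorted(found))
```

The os.path.isfile() check matters because open() would fail on a subdirectory whose name contains the substring.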

If you want to walk down a directory tree with os.walk(), then you have to search the third item of each tuple yielded by the generator.

os.walk() recurses down a tree and yields a tuple for each directory it visits, starting with the top directory itself. Each tuple has three items: the directory's path, the list of subdirectories directly below it, and the list of filenames (directory entries for anything OTHER than a subdirectory) at that node.

However, there's a catch: you need to prefix each match with the directory path from the same tuple. In other words, every match in the third item of a tuple yielded by os.walk() (the list of filenames) must be joined with the first item of that same tuple (the directory path). Only then do you get a full (absolute or relative) path to the matching file.

One way to get a feel for how this works is to load up your Python interpreter (preferably IPython from the Jupyter project). Create a generator with walker = os.walk(top), where top is any valid directory to use as the starting point. Then call this = next(walker) and look at this[0] and this[2] before calling next(walker) again.
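You can also script the same exploration. This small sketch builds a throwaway tree with tempfile (the file names are made up), pulls the first two tuples from the generator, and joins each filename with the directory path from its own tuple:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical tree: top/a.txt and top/sub/b.txt
    os.mkdir(os.path.join(top, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        open(os.path.join(top, rel), 'w').close()

    walker = os.walk(top)
    first = next(walker)    # (top, ['sub'], ['a.txt']): the starting directory
    second = next(walker)   # (top/sub, [], ['b.txt']): then its subdirectory
    # Prefix each filename with the directory path from the same tuple
    full_paths = [os.path.join(d, name)
                  for d, _, names in (first, second) for name in names]
    print(first[2], second[2])
```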

Let's start with code that returns a list using simple substring matching (as in my previous example, but spread over multiple lines for clarity):

import os

results = list()
directory = '.'
walker = os.walk(directory)
delimiter = os.path.sep
pattern = '.txt'
for p, _, f in walker:
    # Test each filename x (not the whole list f) for the substring
    matches = ['%s%s%s' % (p, delimiter, x) for x in f if pattern in x]
    results.extend(matches)

Here I'm using tuple unpacking in the for loop to get the path and file-list components of each tuple yielded by the os.walk() generator. At each node in the tree, a list comprehension extracts the matches and prefixes each one with the path. It uses os.path.sep so that the code stays portable across operating systems.
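Below is a variant of the same loop that uses os.path.join() instead of '%s%s%s' formatting. join() inserts os.path.sep only where it is needed. As a result, a starting directory given with a trailing separator (e.g. 'somedir/') does not produce doubled separators. This is a sketch: the function name find_substring and the demo tree are hypothetical.

```python
import os
import tempfile

def find_substring(directory, pattern):
    """Return paths of files under `directory` whose name contains `pattern`."""
    results = []
    for dirpath, _, filenames in os.walk(directory):
        # os.path.join() adds os.path.sep between the parts only when needed
        results.extend(os.path.join(dirpath, name)
                       for name in filenames if pattern in name)
    return results

with tempfile.TemporaryDirectory() as top:
    # Hypothetical tree: notes.txt, data.csv, sub/more.txt
    os.mkdir(os.path.join(top, 'sub'))
    for rel in ('notes.txt', 'data.csv', os.path.join('sub', 'more.txt')):
        open(os.path.join(top, rel), 'w').close()
    found = sorted(os.path.relpath(p, top) for p in find_substring(top, '.txt'))
    # Same search with a trailing separator on the starting directory
    with_slash = find_substring(top + os.sep, '.txt')
    print(found)
```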

Also notice that _ is just an ordinary variable name in Python, but it is conventionally used to "throw away" a value. In other words, using _ as a variable tells readers and maintainers that your code has no use for that value later.

It would be better to write this as a generator function that yields results as it goes, rather than performing the full traversal up front (which can cost time and memory). With our own generator wrapped around os.walk(), we can more easily process each match subject to other conditions: find the first, find the first N, wrap it in further filtering, and so on.

Also, I'm using simple substring matching here: Python's in operator, which calls the __contains__() special method. We can use regular expressions for this instead ... though I recommend being wary of re.match(), which only matches a pattern at the beginning of the string it is applied to.
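To make the re.match() pitfall concrete, compare the three tests on the same filename. This is also why re.match('PMPM', fname) in the question would miss names that merely contain PMPM. The file name below is hypothetical:

```python
import re

name = 'report_PMPM_2020.txt'   # hypothetical file name

in_result = 'PMPM' in name                  # substring test via str.__contains__
match_result = re.match('PMPM', name)       # anchored at the start of the string
search_result = re.search('PMPM', name)     # scans the whole string

print(in_result)                 # True
print(match_result)              # None
print(search_result.group())     # PMPM
print(search_result.start())     # 7
```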

So here it is:

import os
import re
def matchwalk(regex, directory):
    '''Yield path/filenames matching some regular expression
    '''
    sep = os.path.sep
    pattern = re.compile(regex)
    for p,_,f in os.walk(directory):
        for each in f:
            if pattern.search(each):
                yield '{0}{1}{2}'.format(p,sep,each)

This is similar to the previous code example. The differences: I've wrapped it in a function, and I'm using yield, so the function is a generator (just like os.walk()). I'm also using regular expressions. I prefer re.compile() for legibility. There might be some marginal performance benefit as well, but probably not much, because the re module usually keeps its own cache of compiled patterns, much as Python interns many strings. Finally, I'm using the newer-style str.format() string formatting (though I personally prefer the old % syntax; this is just for edification).
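Because matchwalk() is a generator, the "first match" and "first N matches" cases mentioned earlier come straight from the standard next() and itertools.islice(). In the sketch below, the function is repeated so the snippet runs on its own, and the demo directory and file names are made up:

```python
import itertools
import os
import re
import tempfile

def matchwalk(regex, directory):
    '''Yield path/filenames matching some regular expression (as above)'''
    sep = os.path.sep
    pattern = re.compile(regex)
    for p, _, f in os.walk(directory):
        for each in f:
            if pattern.search(each):
                yield '{0}{1}{2}'.format(p, sep, each)

with tempfile.TemporaryDirectory() as top:
    # Hypothetical files: PMPM_0.txt ... PMPM_4.txt plus one non-match
    for i in range(5):
        open(os.path.join(top, 'PMPM_%d.txt' % i), 'w').close()
    open(os.path.join(top, 'other.txt'), 'w').close()

    # First match only; the generator is not advanced past it
    first = next(matchwalk(r'^PMPM', top), None)
    # First N matches (here N = 3)
    first_three = list(itertools.islice(matchwalk(r'^PMPM', top), 3))
    # Extra filtering layered on top: keep only names ending in 1.txt or 3.txt
    picked = [p for p in matchwalk(r'^PMPM', top) if re.search(r'[13]\.txt$', p)]
```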

You might also want to have a look at the standard glob module, which does Unix-style pathname pattern expansion.
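glob matches shell-style wildcards rather than regular expressions. Since Python 3.5 it can also search subdirectories: with recursive=True, a '**' path component matches zero or more directory levels. Here is a sketch on a made-up tree:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical tree: PMPM_a.txt, other.txt, sub/PMPM_b.txt
    os.mkdir(os.path.join(top, 'sub'))
    for rel in ('PMPM_a.txt', 'other.txt', os.path.join('sub', 'PMPM_b.txt')):
        open(os.path.join(top, rel), 'w').close()

    base = glob.escape(top)   # protect any *, ? or [ in the directory name itself
    flat = glob.glob(os.path.join(base, 'PMPM*'))                        # top level only
    deep = glob.glob(os.path.join(base, '**', 'PMPM*'), recursive=True)  # whole tree
    flat_names = sorted(os.path.relpath(p, top) for p in flat)
    deep_names = sorted(os.path.relpath(p, top) for p in deep)
    print(flat_names, deep_names)
```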


Running through all files whose names start with, say, 'PMPM', in a specific directory, say '~/path/to/mydir', is as simple as:

import os
import glob

pattern = os.path.join(
    os.path.expanduser('~'),
    'path/to/mydir',
    'PMPM*' # mind the * here!
)

for matching_file in glob.glob(pattern):
    with open(matching_file, 'r') as f:
        # do something with the file object
        pass

Or, more briefly:

from glob import glob

# Absolute path: note the leading '/' (the home directory is spelled out here)
for mf in glob('/home/someuser/path/to/mydir/PMPM*'):
    with open(mf, 'r') as f:
        pass  # do something with f
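Here is one way to combine glob with the search-and-replace from the question. This is a sketch, not the asker's code. The function name replace_in_matching_files is hypothetical. old_regex is treated as a regular expression, as in the question; wrap it in re.escape() if users type literal text. Each file is rewritten only if the substitution actually changed it:

```python
import glob
import os
import re
import tempfile

def replace_in_matching_files(directory, name_pattern, old_regex, new_value):
    """Apply re.sub(old_regex, new_value, ...) to files in `directory` matching
    the glob pattern `name_pattern` (e.g. 'PMPM*'); return the changed paths."""
    regex = re.compile(old_regex)
    changed = []
    for path in glob.glob(os.path.join(glob.escape(directory), name_pattern)):
        if not os.path.isfile(path):
            continue  # skip directories that happen to match
        with open(path, 'r') as fh:
            text = fh.read()
        new_text = regex.sub(new_value, text)
        if new_text != text:
            with open(path, 'w') as fh:
                fh.write(new_text)
            changed.append(path)
    return changed

with tempfile.TemporaryDirectory() as top:
    # Hypothetical files: only PMPM_a.txt should be edited
    for name in ('PMPM_a.txt', 'other.txt'):
        with open(os.path.join(top, name), 'w') as fh:
            fh.write('old value')
    changed = [os.path.basename(p)
               for p in replace_in_matching_files(top, 'PMPM*', 'old', 'new')]
    with open(os.path.join(top, 'PMPM_a.txt')) as fh:
        pmpm_text = fh.read()
    with open(os.path.join(top, 'other.txt')) as fh:
        other_text = fh.read()
```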
