Using regular expressions to read a file from a directory
I have a directory consisting of many files. In each iteration of my for loop, I want to read a file whose name has the form "stc_" + str(k) + "anything here" + "_alpha.mat", where k changes in each iteration. How can I use regular expressions to read files like this?
There is only one file beginning with "stc_" + str(k), but "anything here" changes from file to file.
I know one option is to rewrite all files, but I want to learn how to use regular expressions for this purpose.
You can do it with filter on os.listdir:
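import os
import re

def glob_re(pattern, strings):
    # Keep only the strings that the pattern matches from the start.
    return filter(re.compile(pattern).match, strings)

# filter() returns a lazy iterator in Python 3; list() materializes it.
filenames = list(glob_re(r'stc_\d.*_alpha\.mat', os.listdir()))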
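To pick out the file for one particular k inside the loop, the pattern can be built from k on each iteration. The following is a minimal sketch rather than the answer's own code: the helper name `find_for_k` and the sample file names are made up for illustration, and the `(?!\d)` lookahead is an added assumption (it presumes "anything here" never starts with a digit) so that k = 1 does not also pick up stc_10.

```python
import re

def find_for_k(k, names):
    """Return the names that look like stc_<k><anything>_alpha.mat."""
    # (?!\d) rejects a following digit, so k=1 does not match "stc_10...".
    # Assumption: the "anything here" part never begins with a digit.
    pattern = re.compile(r'stc_%d(?!\d).*_alpha\.mat$' % k)
    return [name for name in names if pattern.match(name)]

# Hypothetical directory listing; in practice this would be os.listdir().
sample = ['stc_1_run_alpha.mat', 'stc_10_x_alpha.mat',
          'stc_2abc_alpha.mat', 'notes.txt']

for k in (1, 2, 10):
    print(k, find_for_k(k, sample))
```

Passing the list of names in as an argument keeps the directory listing outside the loop, so it only has to be read once.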
You have not revealed the domain of k, but based on the comments, it seems to be a number.
If there is only one file for each k, you can simply loop over those.
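import glob

# kmin, kmax and process() are placeholders for your own range and handler.
for knum in range(kmin, kmax + 1):
    for file in glob.glob("stc_%i*_alpha.mat" % knum):
        # Only expect one match (but note "stc_1*" also matches "stc_10...")
        process(file)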
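One thing to watch with the wildcard approach is that a bare `*` right after the number lets k = 1 match stc_10, stc_12, and so on. A hedged sketch of one way around this is shown here using `fnmatch.filter`, which applies the same shell-style wildcard syntax that `glob` uses but works on a plain list of names. The sample names are invented, and the `[!0-9]` class assumes that "anything here" is non-empty and starts with a non-digit character.

```python
import fnmatch

# Hypothetical file names standing in for a real directory listing.
names = ['stc_1_run_alpha.mat', 'stc_10_x_alpha.mat', 'stc_12_y_alpha.mat']

# A bare * after the number also accepts longer numbers with the same prefix.
loose = fnmatch.filter(names, 'stc_%i*_alpha.mat' % 1)

# [!0-9] requires the character right after the number to be a non-digit.
strict = fnmatch.filter(names, 'stc_%i[!0-9]*_alpha.mat' % 1)

print(loose)
print(strict)
```

The same `[!0-9]` pattern can be passed to `glob.glob` directly when matching against the file system.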
If you are really hellbent on using a regular expression for this, the regex for the numbers 7 through 24 is simply (?:7|8|9|10|11|...|23|24) (it could be simplified to (?:[7-9]|1[0-9]|2[0-4]) but here, it's probably not worth the effort).
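A quick way to convince yourself that the simplified form covers exactly 7 through 24 is to test it against a range of integers. This is only an illustrative check; the variable names are arbitrary, and `fullmatch` is used so that a number like 70 cannot slip through via its leading 7.

```python
import re

# The simplified alternation from the text.
in_range = re.compile(r'(?:[7-9]|1[0-9]|2[0-4])')

# fullmatch requires the whole string to match, not just a prefix.
matched = [n for n in range(0, 100) if in_range.fullmatch(str(n))]

print(matched == list(range(7, 25)))
```

When the group is embedded in a longer file-name pattern, something after it (a literal separator or a `(?!\d)` lookahead) has to play the role that `fullmatch` plays here.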
Note that os.listdir returns entries in arbitrary order, not sorted; if you require a specific sort order, you can use os.scandir and supply your own sort function.
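import os
import re

my_files = []
for entry in os.scandir(directory):
    # os.scandir yields DirEntry objects; match against the name string.
    m = re.match(r'stc_(\d+).*_alpha\.mat', entry.name)
    if m:
        # Maybe you only care about a particular range for k?
        kcurr = int(m.group(1))
        if kcurr < 7 or kcurr > 24:
            continue
        # Store a (sort key, file name) tuple.
        my_files.append((kcurr, entry.name))
# Sort by k, then keep only the file names.
my_files = [x[1] for x in sorted(my_files)]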
Here, we use the regex grouping parentheses to extract the sort key, pair it with the file name in a tuple, then discard the sort keys after sorting, keeping only the sorted list of matching files. (See also Schwartzian transform.)
The if clause which skips values lower than 7 or bigger than 24 demonstrates how to only cover specific numbers; if you don't need that, obviously take it out.
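For comparison, the same numeric ordering can be sketched with a key function passed to `sorted`, which avoids building and unpacking the tuples by hand. This is an alternative illustration, not the method used above; `numeric_key` and the sample names are hypothetical, and the key function assumes every name it receives already matches the pattern.

```python
import re

PATTERN = re.compile(r'stc_(\d+).*_alpha\.mat')

def numeric_key(name):
    """Sort key: the integer k captured from the file name.

    Assumes the name matches PATTERN; a non-matching name would raise.
    """
    return int(PATTERN.match(name).group(1))

# Hypothetical names; plain string sorting puts stc_10 before stc_7.
names = ['stc_10_b_alpha.mat', 'stc_7_a_alpha.mat', 'stc_24_c_alpha.mat']

by_string = sorted(names)
by_number = sorted(names, key=numeric_key)

print(by_string)
print(by_number)
```

The key function is called once per element, so the regex is not re-run during comparisons.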
Hitting the disk is on the order of 1,000 times slower than processing data in memory, so you generally want to avoid repeatedly accessing the disk. Listing the directory once and filtering the names in memory is therefore preferable to querying the file system once per k.
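One way to act on that advice is to scan the directory a single time and build a lookup from k to file name, so that the loop over k never touches the file system for the search itself. The sketch below is an assumption-laden illustration: `index_by_k` is a made-up helper, it relies on the question's statement that each k has exactly one file (a later duplicate would overwrite an earlier one), and it scans the current directory purely as an example.

```python
import os
import re

def index_by_k(directory):
    """Scan the directory once and map each k to its file name."""
    pattern = re.compile(r'stc_(\d+).*_alpha\.mat$')
    index = {}
    # The with-block closes the scandir iterator promptly.
    with os.scandir(directory) as entries:
        for entry in entries:
            m = pattern.match(entry.name)
            if m:
                # Assumes one file per k; duplicates overwrite silently.
                index[int(m.group(1))] = entry.name
    return index

# Example usage on the current directory (hypothetical range 7..24).
files_by_k = index_by_k('.')
for k in range(7, 25):
    name = files_by_k.get(k)
    if name is not None:
        print(k, name)
```

Each lookup inside the loop is then a dictionary access rather than a fresh directory listing.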