
Using regular expressions to read a file from a directory

I have a directory containing many files. In each iteration of my for loop, I want to read the file whose name has the form

"stc_" + str(k) + "anything here" + "_alpha.mat"

This k changes in each iteration. How can I use regular expressions to read files like this?

There is only one file beginning with "stc_" + str(k), but the "anything here" part changes from file to file.

I know one option is to rewrite all files, but I want to learn how to use regular expressions for this purpose.

You can do it with filter on os.listdir:

import os
import re

def glob_re(pattern, strings):
    return filter(re.compile(pattern).match, strings)

filenames = glob_re(r'stc_\d.*_alpha\.mat', os.listdir())
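Since k changes on every iteration, the pattern can also be built per k. The following is a minimal sketch, assuming k is a non-negative integer and that the "anything here" part never begins with a digit (otherwise k=1 could not be told apart from k=10). The helper name find_for_k and the sample file names are made up for illustration.

```python
import re

def find_for_k(k, names):
    """Return the names that look like stc_<k><anything>_alpha.mat."""
    # (?!\d) stops k=1 from also matching stc_10..., stc_11..., etc.
    # This assumes the free-form part never begins with a digit.
    pattern = re.compile(rf"stc_{k}(?!\d).*_alpha\.mat")
    return [name for name in names if pattern.fullmatch(name)]

# Hypothetical directory listing; in practice pass os.listdir(some_dir).
names = ["stc_1_left_alpha.mat", "stc_10_right_alpha.mat",
         "stc_2_left_beta.mat", "notes.txt"]

for k in (1, 10):
    print(k, find_for_k(k, names))
```

Using fullmatch rather than match also rejects names with trailing characters after "_alpha.mat".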

You have not revealed the domain of k, but based on comments, it seems to be a number.

If there is only one file for each k, you can simply loop over those.

import glob

# kmin, kmax and process are placeholders for your own range and handler.
for knum in range(kmin, kmax + 1):
    for file in glob.glob("stc_%i*_alpha.mat" % knum):
        # Only expect one match
        process(file)
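One caveat with the glob pattern: * also matches digits, so the pattern for k=1 matches the files for k=10 through 19 as well. The check below uses fnmatch.filter, which applies the same shell-style matching to a list of names, on made-up file names so that it runs without touching the disk.

```python
import fnmatch

# Hypothetical file names, for illustration only.
names = ["stc_1_left_alpha.mat", "stc_12_right_alpha.mat"]

# '*' matches any run of characters, digits included.
matches = fnmatch.filter(names, "stc_%i*_alpha.mat" % 1)
print(matches)  # both names match the k=1 pattern
```

If the character that follows k is known (an underscore, say, which is an assumption here), including it in the pattern, as in "stc_%i_*_alpha.mat", avoids the overlap.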

If you are really hellbent on using a regular expression for this, the regex for the numbers 7 through 24 is simply (?:7|8|9|10|11|...|23|24) (it could be simplified to (?:[7-9]|1[0-9]|2[0-4]) but here, it's probably not worth the effort).
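A quick way to convince yourself that the compact form covers exactly 7 through 24 is to test it against a range of integers:

```python
import re

# Compact pattern for the integers 7 through 24.
in_range = re.compile(r"(?:[7-9]|1[0-9]|2[0-4])")

# fullmatch makes sure the whole string is the number, not just a prefix.
accepted = [n for n in range(0, 100) if in_range.fullmatch(str(n))]
print(accepted == list(range(7, 25)))  # True
```

When the group is embedded in a longer file-name pattern, whatever follows it should not be able to match a digit; otherwise 7 would also match the start of 70.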

os.listdir returns the file names in arbitrary order, not sorted; if you require a particular sort order, you have to sort them yourself, for example by collecting the matches from os.scandir and supplying your own sort key.

import os
import re

my_files = []
for entry in os.scandir(directory):  # directory: path of the folder to scan
    # os.scandir yields DirEntry objects; match against the name string
    m = re.match(r'stc_(\d+).*_alpha\.mat', entry.name)
    if m:
        # Maybe you only care about a particular range for k?
        kcurr = int(m.group(1))
        if kcurr < 7 or kcurr > 24:
            continue
        my_files.append((kcurr, entry.name))
my_files = [x[1] for x in sorted(my_files)]

Here, we use the regex grouping parentheses to extract the sort key, store it in a tuple together with the file name, then discard the sort keys after sorting, keeping only the sorted list of matching files. (See also Schwartzian transform.)

The if clause which skips values lower than 7 or bigger than 24 demonstrates how to cover only specific numbers; if you don't need that, simply take it out.
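The same ordering can be had without building and discarding tuples, by passing a key function to sorted. This is a sketch on made-up names; the helper name k_of is hypothetical.

```python
import re

PATTERN = re.compile(r"stc_(\d+).*_alpha\.mat")

def k_of(name):
    """Return the integer k embedded in the file name."""
    return int(PATTERN.match(name).group(1))

# Hypothetical names; in practice these come from os.listdir or os.scandir.
names = ["stc_10_b_alpha.mat", "stc_9_a_alpha.mat", "stc_30_c_alpha.mat",
         "stc_7_d_alpha.mat", "readme.txt"]

# Keep only matching names whose k lies in 7..24, then sort numerically.
selected = sorted(
    (n for n in names if PATTERN.match(n) and 7 <= k_of(n) <= 24),
    key=k_of,
)
print(selected)
```

Sorting on the integer k puts stc_7 and stc_9 ahead of stc_10, which a plain alphabetical sort of the names would not.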

Hitting the disk is on the order of 1,000 times slower than processing data in memory, so you generally want to avoid repeatedly accessing the disk.
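One way to act on this is to list the directory a single time, index the matching names by k, and then do dictionary lookups inside the loop. The sketch below creates a throwaway directory with made-up file names so it can run anywhere; index_by_k is a hypothetical helper, not part of any library.

```python
import os
import re
import tempfile

PATTERN = re.compile(r"stc_(\d+).*_alpha\.mat")

def index_by_k(directory):
    """Scan the directory once and map each k to its file name."""
    index = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            m = PATTERN.fullmatch(entry.name)
            if m:
                index[int(m.group(1))] = entry.name
    return index

with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty, made-up files to scan.
    for name in ("stc_7_x_alpha.mat", "stc_8_y_alpha.mat", "other.txt"):
        open(os.path.join(tmp, name), "w").close()

    index = index_by_k(tmp)       # one directory scan
    for k in range(7, 9):
        print(k, index.get(k))    # in-memory lookup, no further disk access
```

This relies on the question's statement that there is only one file per k; with duplicates, later entries would overwrite earlier ones in the dictionary.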
