Matching a multiline regular expression against a file object

Question

How can I extract the groups of this regular expression from a file object (data.txt)?

import numpy as np
import re
import os
ifile = open("data.txt",'r')

# Regex pattern
pattern = re.compile(r"""
                ^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
                \r{2}                       # Two carriage return
                \D+                         # 1 or more non-digits
                storeU=(\d+\.\d+)
                \s
                uIx=(\d+)
                \s
                storeI=(-?\d+.\d+)
                \s
                iIx=(\d+)
                \s
                avgCI=(-?\d+.\d+)
                """, re.VERBOSE | re.MULTILINE)

time = [];

for line in ifile:
    match = re.search(pattern, line)
    if match:
        time.append(match.group(1))

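The problem with the last part of the code is that I iterate line by line, which obviously won't work with a multiline regex. I tried using pattern.finditer(ifile) like this: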

for match in pattern.finditer(ifile):
    print match

...just to see whether it works, but the finditer method requires a string or buffer.

I also tried this approach, but couldn't get it to work:

matches = [m.groups() for m in pattern.finditer(ifile)]

Any ideas?

After comments from Mike and Tuomas, I was told to use .read(), something like this:

ifile = open("data.txt",'r').read()

This works fine, but is this the correct way to search through the file? I can't get this to work:

for i in pattern.finditer(ifile):
    match = re.search(pattern, i)
    if match:
        time.append(match.group(1))

Solution

# Open file as file object and read to string
ifile = open("data.txt",'r')

# Read file object to string
text = ifile.read()

# Close file object
ifile.close()

# Regex pattern
pattern_meas = re.compile(r"""
                ^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
                \n{2}                       # Two newlines
                \D+                         # 1 or more non-digits
                storeU=(\d+\.\d+)           # Decimal-number
                \s
                uIx=(\d+)                   # Fetch uIx-variable
                \s
                storeI=(-?\d+\.\d+)         # Fetch storeI-variable
                \s
                iIx=(\d+)                   # Fetch iIx-variable
                \s
                avgCI=(-?\d+\.\d+)          # Fetch avgCI-variable
                """, re.VERBOSE | re.MULTILINE)

file_times = open("output_times.txt","w")
for match in pattern_meas.finditer(text):
    output = "%s,\t%s,\t\t%s,\t%s,\t\t%s,\t%s\n" % (match.group(1), match.group(2), match.group(3), match.group(4), match.group(5), match.group(6))
    file_times.write(output)
file_times.close()

Perhaps it could be written more compactly and more pythonically, though...
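One possible way to tighten the solution above, offered as a sketch rather than the definitive version: use with blocks so the files are closed automatically, and build each output line from match.groups(). The helper names format_matches and convert are hypothetical, the separator is simplified to a comma plus a single tab, and the decimal points in the pattern are escaped on the assumption that the values always contain a literal dot.

```python
import re

# Same fields as the pattern in the solution above.
pattern_meas = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
    \n{2}                       # Two newlines
    \D+                         # 1 or more non-digits
    storeU=(\d+\.\d+)
    \s
    uIx=(\d+)
    \s
    storeI=(-?\d+\.\d+)
    \s
    iIx=(\d+)
    \s
    avgCI=(-?\d+\.\d+)
    """, re.VERBOSE | re.MULTILINE)


def format_matches(text):
    """Return one output line per measurement block found in text."""
    # m.groups() holds all six captured fields in order.
    return [",\t".join(m.groups()) + "\n" for m in pattern_meas.finditer(text)]


def convert(in_path, out_path):
    """Read in_path as one string and write the formatted matches to out_path."""
    with open(in_path) as ifile:
        text = ifile.read()
    with open(out_path, "w") as ofile:
        ofile.writelines(format_matches(text))
```

With the file names from the question, this would be called as convert("data.txt", "output_times.txt").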

Answer 1

You can use ifile.read() to read the data from the file object into a string.
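A small sketch of the difference, under stated assumptions: io.StringIO stands in for the opened data.txt, the file contents are made up, and the pattern is cut down to the Time line only to keep the example short.

```python
import io
import re

# Simplified pattern for illustration: only the Time line is matched.
time_pattern = re.compile(r"^Time:(\d{2}:\d{2}:\d{2})$", re.MULTILINE)

# Stand-in for open("data.txt"); the contents are invented for this sketch.
ifile = io.StringIO("Time:12:34:56\n\nfoo\nTime:12:35:01\n\nbar\n")

error = None
try:
    # Passing the file object itself fails: the regex needs a string.
    list(time_pattern.finditer(ifile))
except TypeError as exc:
    error = exc

# Reading the file object gives the whole content as a single string,
# so the pattern can match across line boundaries.
text = ifile.read()
times = [m.group(1) for m in time_pattern.finditer(text)]
print(times)
```

Reading the whole file at once is fine for files that fit comfortably in memory; very large files would need a different approach.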

Answer 2

Why not read the whole file into a buffer with

buffer = open("data.txt").read()

and then search that?

Answer 3

times = [match.group(1) for match in pattern.finditer(ifile.read())]

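finditer yields MatchObjects. If the regex doesn't match anything, times will be an empty list.

You can also modify your regex to use non-capturing groups for storeU, storeI, iIx and avgCI; then pattern.findall will contain only the matched times.

Note: naming a variable time may shadow the standard library module. times would be a better choice.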
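A sketch of the non-capturing variant, assuming the same file layout as in the question. The name pattern_times and the sample text are made up for illustration. Here uIx is made non-capturing as well, since findall returns plain strings only when the pattern has exactly one capturing group.

```python
import re

pattern_times = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # The only capturing group
    \n{2}                       # Two newlines
    \D+                         # 1 or more non-digits
    storeU=(?:\d+\.\d+)         # (?:...) groups without capturing
    \s
    uIx=(?:\d+)
    \s
    storeI=(?:-?\d+\.\d+)
    \s
    iIx=(?:\d+)
    \s
    avgCI=(?:-?\d+\.\d+)
    """, re.VERBOSE | re.MULTILINE)

# Invented sample data with two measurement blocks.
sample = (
    "Time:12:34:56\n\n"
    "Measurement: storeU=1.5 uIx=3 storeI=-0.25 iIx=4 avgCI=-0.5\n"
    "Time:12:35:01\n\n"
    "Measurement: storeU=2.0 uIx=7 storeI=0.75 iIx=8 avgCI=0.1\n"
)

# With a single capturing group, findall returns a list of strings.
times = pattern_times.findall(sample)
print(times)
```

The rest of the pattern still has to match for a time to be returned, so blocks with a malformed measurement line are skipped.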


3 answers

Answer 1
5, accepted, 2010-03-12 15:18:46

Answer 2
1, 2010-03-12 15:20:07

Answer 3
1, 2010-03-12 15:49:51
