
Python: extracting a (regex) pattern from a file without going through it line by line (multiline search)

I can extract a particular pattern by reading the mystring.txt file line by line and checking each line with re.search(r'pattern', line_txt).

Here is mystring.txt:

Client: //home/SCM/dev/applications/build_system/test_suite_linux/unit_testing
Stream: //MainStream/testing_branch
Options:    dir, norm accel, ddl
SubmitOptions:  vis, dir, cas, cat

Using Python, I can get the stream name //MainStream/testing_branch like this:

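import re
with open("mystring.txt", "r") as f:
    for line in f:
        if re.search(r'^Stream:', line):
            # Take everything after the first ':' and drop surrounding
            # whitespace (works whether the separator is a space or a tab)
            stream_name = line.split(':', 1)[1].strip()
            print(stream_name)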

Instead of looping over the file line by line, how can I extract the same information using only the re module?

You can read the whole file in one go and use re.findall with the re.M (multiline) flag. Beware: if the file is very large, loading it all into memory is not a good idea.

import re
with open("input_file") as f:
    content = f.read()
print(re.findall(r"^Stream: (.*)", content, re.M))
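If the file really is too large to read into memory comfortably, one alternative (not part of the original answer) is to memory-map it and run the regex over the mapping, since the re module can search any bytes-like buffer. The sketch below assumes a UTF-8 text file and uses a hypothetical helper name, `find_stream_names`; the pattern has to be a bytes pattern because an mmap object exposes bytes, not str.

```python
import mmap
import os
import re

# Hypothetical helper: the name and the UTF-8 assumption are illustrative.
# [ \t]* keeps the match on one line; \r? drops a Windows line ending.
STREAM_RE = re.compile(rb"^Stream:[ \t]*(.*?)\r?$", re.MULTILINE)


def find_stream_names(path):
    """Return every value that follows 'Stream:' at the start of a line."""
    with open(path, "rb") as f:
        # mmap cannot map a zero-length file, so handle that case up front.
        if os.fstat(f.fileno()).st_size == 0:
            return []
        # Length 0 maps the whole file; the OS pages it in on demand
        # instead of copying it all into one Python string.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # A bytes pattern is required because mmap exposes bytes.
            return [m.group(1).decode("utf-8") for m in STREAM_RE.finditer(mm)]
```

For the sample mystring.txt above, `find_stream_names("mystring.txt")` should return `['//MainStream/testing_branch']`.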

Yes, you can use re.MULTILINE with re.search(...):

>>> import re
>>> with open("mystring.txt") as f:
...     content = f.read()
...
>>> re.search(r'^Stream:\s([^\n]+)', content, re.MULTILINE).group(1)
'//MainStream/testing_branch'
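The re.MULTILINE flag is what makes this work: by default `^` matches only at the start of the whole string, while with the flag it also matches right after each newline. The snippet below inlines the sample file as a string (an assumption made only so it runs on its own) to show the difference.

```python
import re

content = (
    "Client: //home/SCM/dev/applications/build_system/test_suite_linux/unit_testing\n"
    "Stream: //MainStream/testing_branch\n"
    "Options:    dir, norm accel, ddl\n"
    "SubmitOptions:  vis, dir, cas, cat\n"
)

pattern = r"^Stream:\s([^\n]+)"

# Without re.MULTILINE, ^ only matches at position 0, and the string
# starts with "Client:", so there is no match at all.
print(re.search(pattern, content))  # None

# With re.MULTILINE, ^ also matches at the start of every line.
print(re.search(pattern, content, re.MULTILINE).group(1))  # //MainStream/testing_branch
```

One caveat: `\s` also matches a newline, so if a `Stream:` line had an empty value, the capture would run on into the next line. Using `[ \t]` instead of `\s` avoids that.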

Here is another solution:

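import re

with open("mystring.txt") as f:
    content = f.read()

# Each match is the whole "Stream: ..." line; . does not match a newline,
# so this also works when the last line has no trailing newline
got = re.findall(r"Stream: .+", content)

got = got[0].strip()

print(got.split(": ")[1])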
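If you need more than one field, a single re.findall call with two capture groups can return every `Key: value` pair at once, and `dict()` turns the resulting list of tuples into a lookup table. This is a sketch that goes slightly beyond the question: `parse_fields` is a hypothetical helper, and it assumes each field sits on one line and each key consists of word characters.

```python
import re

# Hypothetical helper: assumes one "Key: value" pair per line and keys made
# of word characters (letters, digits, underscore).
FIELD_RE = re.compile(r"^(\w+):[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def parse_fields(content):
    """Return a dict mapping each key to its value, found in one regex pass."""
    # findall returns (key, value) tuples because the pattern has two groups;
    # [ \t]* (rather than \s*) keeps every match on a single line.
    return dict(FIELD_RE.findall(content))


sample = (
    "Client: //home/SCM/dev/applications/build_system/test_suite_linux/unit_testing\n"
    "\n"
    "Stream: //MainStream/testing_branch\n"
    "\n"
    "Options:    dir, norm accel, ddl\n"
    "SubmitOptions:  vis, dir, cas, cat\n"
)
fields = parse_fields(sample)
print(fields["Stream"])   # //MainStream/testing_branch
print(fields["Options"])  # dir, norm accel, ddl
```

If a key appears more than once, `dict()` keeps the last value, so use the raw findall list when duplicates matter.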
