Extract lines containing a string from a file using Python

Question

Team,

I want to extract some lines from a file using a string (one that starts with tg_). The regex below gives me output for single-line entries, but I have two problems:

I am not sure how to extract an entry when it continues over two or more lines, each ending with \, as shown below.
I don't know how to remove the special characters with my existing regex below.

*****from a file*******

tg_cr_counters dghbvcvgfv

tg_kk_bb a group1 bye bye bye hi hi hi 1 \ <<<<
patch mac hdfh f dgf asadasf \
dgfgmnhnjgfg

tg_cr_counters gthghtrhgh }} ] <<<<<

tg_cr_counters fkgnfkmngvd

import re

file = open("C:\\Users\\input.tcl", "r")
f1 = file.readlines()

output = open("extract.txt", "a+")

match_list = [ ]   

for item in f1:

    match_list = re.findall(r'[t][g][_]+\w+.*', item)
    if(len(match_list)>0):
        output.write(match_list[0]+"\r\n")
        print(match_list)

Answer 1

You can use a regex with the re.MULTILINE and re.DOTALL flags.

This way . will also match \n, and you can look for anything that starts with tg_ (no need to put each character in []) and ends with a double newline \n\n or the end of the text (\Z):

import re

fn = "t.txt"

# Write the sample data. The raw string (r"""...""") keeps the trailing
# backslashes; in a normal string, a backslash directly before a newline
# would silently join the two lines.
with open(fn, "w") as f:
    f.write(r"""*****from a file*******

tg_cr_counters dghbvcvgfv

tg_kk_bb a group1 bye bye bye hi hi hi 1 \ <<<<
patch mac hdfh f dgf asadasf \
dgfgmnhnjgfg

tg_cr_counters gthghtrhgh }} ] <<<<<

tg_cr_counters fkgnfkmngvd
""")

# "w" starts with a fresh output file on every run ("a+" would keep appending).
with open("extract.txt", "w") as o, open(fn) as f:
    # re.M lets ^ match at the start of each line; re.S lets . match newlines.
    for m in re.findall(r'^tg_.*?(?:\n\n|\Z)', f.read(), flags=re.M | re.S):
        # Plain "\n" is enough: text mode translates it to the platform line ending.
        o.write("-" * 40 + "\n")
        o.write(m)
        o.write("-" * 40 + "\n")

with open("extract.txt") as f:
    print(f.read())

Output (each match is enclosed between two lines of 40 dashes):

----------------------------------------
tg_cr_counters dghbvcvgfv

----------------------------------------
----------------------------------------
tg_kk_bb a group1 bye bye bye hi hi hi 1 \ <<<<
patch mac hdfh f dgf asadasf \
dgfgmnhnjgfg

----------------------------------------
----------------------------------------
tg_cr_counters gthghtrhgh }} ] <<<<<

----------------------------------------
----------------------------------------
tg_cr_counters fkgnfkmngvd
----------------------------------------

The re.findall() result looks like this:

['tg_cr_counters dghbvcvgfv\n\n', 
 'tg_kk_bb a group1 bye bye bye hi hi hi 1 \\ <<<<\npatch mac hdfh f dgf asadasf \\\ndgfgmnhnjgfg\n\n', 
 'tg_cr_counters gthghtrhgh }} ] <<<<<\n\n', 
 'tg_cr_counters fkgnfkmngvd\n']
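
The question also asks how to remove the special characters. A minimal sketch of one way to post-process each match, assuming "special characters" means the backslash line continuations plus the <, >, {, }, [ and ] markers seen in the sample. The helper name clean_match and that character set are assumptions for illustration, not part of the original post:

```python
import re

def clean_match(block):
    """Join backslash-continued lines and strip an assumed set of marker characters."""
    # Drop the marker characters (assumed set: < > { } [ ]).
    text = re.sub(r'[<>{}\[\]]', '', block)
    # Turn "backslash, optional spaces, newline" into a single space,
    # so a continued entry becomes one logical line.
    text = re.sub(r'\s*\\[ \t]*\n', ' ', text)
    # Collapse runs of spaces/tabs and trim the surrounding whitespace.
    return re.sub(r'[ \t]+', ' ', text).strip()

multi = ('tg_kk_bb a group1 bye bye bye hi hi hi 1 \\ <<<<\n'
         'patch mac hdfh f dgf asadasf \\\ndgfgmnhnjgfg\n\n')
single = 'tg_cr_counters gthghtrhgh }} ] <<<<<\n\n'

print(clean_match(multi))
print(clean_match(single))
```

Applied to the findall() results, this could be used as o.write(clean_match(m) + "\n") in place of o.write(m). Adjust the character class if other characters should be kept or removed.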

To enable multiline searches you need to read in more than one line at a time. The code above reads the whole file with f.read(); if your file is humongous, this can lead to memory problems.
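
If memory is a concern, the same blocks can be collected while iterating over the file one line at a time. The following is a rough sketch rather than a drop-in replacement for the regex: iter_tg_blocks is a hypothetical helper, it treats whitespace-only lines as block terminators too, and the yielded blocks do not include the trailing blank line.

```python
import io

def iter_tg_blocks(lines):
    """Yield each block that starts with 'tg_' and runs up to the next blank line."""
    block = []
    for line in lines:
        if block:
            if line.strip():
                # Still inside a block: keep collecting lines.
                block.append(line)
            else:
                # A blank line ends the current block.
                yield "".join(block)
                block = []
        elif line.startswith("tg_"):
            # A line starting with 'tg_' opens a new block.
            block.append(line)
    if block:
        # The file ended without a blank line after the last block.
        yield "".join(block)

# Sample data from the question; io.StringIO stands in for an open file here.
sample = r"""*****from a file*******

tg_cr_counters dghbvcvgfv

tg_kk_bb a group1 bye bye bye hi hi hi 1 \ <<<<
patch mac hdfh f dgf asadasf \
dgfgmnhnjgfg

tg_cr_counters gthghtrhgh }} ] <<<<<

tg_cr_counters fkgnfkmngvd
"""

blocks = list(iter_tg_blocks(io.StringIO(sample)))
for b in blocks:
    print(repr(b))
```

With a real file, pass the open file object instead, for example: with open(fn) as f: for block in iter_tg_blocks(f): ... Only the lines of the current block are held in memory at any time.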

Answer 1: 2018-11-28 17:26:00
