Team,
I want to extract the lines that start with a given string (tg_) from a file, and the regex below gives me the expected output for entries that fit on one line. My question is:
I am not sure how to extract an entry when it runs over two or more lines, where each continued line ends with \\,
like below.
I also don't know how to remove the special characters with the existing regexp below.
*****from a file*******
tg_cr_counters dghbvcvgfv
tg_kk_bb a group1 bye bye bye hi hi hi 1 \\ <<<<
patch mac hdfh f dgf asadasf \\
dgfgmnhnjgfg
tg_cr_counters gthghtrhgh }} ] <<<<<
tg_cr_counters fkgnfkmngvd
import re

file = open("C:\\Users\\input.tcl", "r")
f1 = file.readlines()
output = open("extract.txt", "a+")
match_list = []
for item in f1:
    match_list = re.findall(r'[t][g][_]+\w+.*', item)
    if len(match_list) > 0:
        output.write(match_list[0] + "\r\n")
        print(match_list)
You can use a regex with the flags re.MULTILINE and re.DOTALL. With re.DOTALL a . also matches \n, and with re.MULTILINE ^ matches at the start of every line. You can then look for anything that starts with tg_ (no need to put each character in []) and ends with a blank line \n\n or the end of the text \Z:
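import re

fn = "t.txt"

# Create a demo input file. Entries are separated by blank lines.
# Note: in this non-raw string a backslash at the very end of a line joins it
# with the next line, so "asadasf \" and "dgfgmnhnjgfg" end up on one line.
with open(fn, "w") as f:
    f.write("""*****from a file*******
tg_cr_counters dghbvcvgfv

tg_kk_bb a group1 bye bye bye hi hi hi 1 \ <<<<
patch mac hdfh f dgf asadasf \
dgfgmnhnjgfg

tg_cr_counters gthghtrhgh }} ] <<<<<

tg_cr_counters fkgnfkmngvd
""")

# ^tg_ anchors at the start of a line (re.M); .*? spans newlines (re.S) and
# stops at the first blank line (\n\n) or at the end of the text (\Z).
with open("extract.txt", "a+") as o, open(fn) as f:
    for m in re.findall(r'^tg_.*?(?:\n\n|\Z)', f.read(), flags=re.M | re.S):
        o.write("-" * 40 + "\n")
        o.write(m)
        o.write("-" * 40 + "\n")

with open("extract.txt") as f:
    print(f.read())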
Output (each match is enclosed between two lines of dashes):
----------------------------------------
tg_cr_counters dghbvcvgfv

----------------------------------------
----------------------------------------
tg_kk_bb a group1 bye bye bye hi hi hi 1 \ <<<<
patch mac hdfh f dgf asadasf dgfgmnhnjgfg

----------------------------------------
----------------------------------------
tg_cr_counters gthghtrhgh }} ] <<<<<

----------------------------------------
----------------------------------------
tg_cr_counters fkgnfkmngvd
----------------------------------------
The re.findall() result looks like:
['tg_cr_counters dghbvcvgfv\n\n',
'tg_kk_bb a group1 bye bye bye hi hi hi 1 \\ <<<<\npatch mac hdfh f dgf asadasf dgfgmnhnjgfg\n\n',
'tg_cr_counters gthghtrhgh }} ] <<<<<\n\n',
'tg_cr_counters fkgnfkmngvd\n']
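Each match still carries its trailing newlines, the backslash, and markers such as <<<<. As a minimal sketch of the clean-up the question asks about, the helper below collapses one match into a single tidy line. The name clean_match and the set of characters it removes are assumptions made for illustration; adjust the character class to whatever counts as "special" in your file.

```python
import re

def clean_match(text):
    """Collapse one (possibly multi-line) match into a single line."""
    # Replace continuation backslashes with a space.
    text = text.replace("\\", " ")
    # Hypothetical set of unwanted characters: < > { } [ ]
    text = re.sub(r"[<>{}\[\]]", " ", text)
    # Squeeze every run of whitespace (including newlines) into one space.
    return " ".join(text.split())

# Two of the matches shown above, used as sample input.
matches = [
    'tg_kk_bb a group1 bye bye bye hi hi hi 1 \\ <<<<\n'
    'patch mac hdfh f dgf asadasf dgfgmnhnjgfg\n\n',
    'tg_cr_counters gthghtrhgh }} ] <<<<<\n\n',
]
cleaned = [clean_match(m) for m in matches]
print(cleaned)
```

Doing the clean-up after matching keeps the regex itself simple: the pattern only decides where an entry starts and ends, and the helper decides what to keep.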
To enable multiline searches you need to read more than one line at a time. If your file is humongous, reading it all at once with f.read() can lead to memory problems.
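If memory is a concern, one alternative is to skip the multiline regex and walk the file line by line, joining continuation lines as you go. The sketch below assumes that a continued line ends with a backslash as its last non-blank character (so annotation markers such as <<<< after the backslash are not present in the real file) and that an entry starts with tg_ at the beginning of a line. The function name iter_tg_entries and the in-memory sample are hypothetical.

```python
import io

def iter_tg_entries(lines, prefix="tg_"):
    """Yield each entry starting with `prefix`, joining continuation lines."""
    entry = []  # parts of the entry currently being collected
    for raw in lines:
        line = raw.strip()
        if not entry and not line.startswith(prefix):
            continue  # not inside an entry and not the start of one
        continued = line.endswith("\\")
        if continued:
            # Drop the trailing backslash and the spaces before it.
            line = line.rstrip("\\").rstrip()
        entry.append(line)
        if not continued:
            yield " ".join(entry)
            entry = []
    if entry:  # input ended in the middle of a continued entry
        yield " ".join(entry)

# In-memory stand-in for the real file; open("input.tcl") works the same way.
sample = io.StringIO(
    "*****from a file*******\n"
    "tg_cr_counters dghbvcvgfv\n"
    "tg_kk_bb a group1 bye bye bye hi hi hi 1 \\\n"
    "patch mac hdfh f dgf asadasf \\\n"
    "dgfgmnhnjgfg\n"
    "tg_cr_counters fkgnfkmngvd\n"
)
entries = list(iter_tg_entries(sample))
for e in entries:
    print(e)
```

Because the generator holds only the current entry in memory, it works on files of any size, and it does not rely on blank lines separating the entries.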