I have a text file which contains blocks in the following format:
PAGE(leave) 'Data1'
line 1
line 2
line 2
...
...
...
PAGE(enter) 'Data1'
I need to get all the lines between the two keywords and save them to a text file. This is what I have come up with so far, but I have an issue with the single quotes: the regular expression treats them as the quotes delimiting the expression rather than as part of the keyword.
My code so far:
log_file = open('messages','r')
data = log_file.read()
block = re.compile(ur'PAGE\(leave\) \'Data1\'[\S ]+\s((?:(?![^\n]+PAGE\(enter\) \'Data1\').)*)', re.IGNORECASE | re.DOTALL)
data_in_home_block=re.findall(block, data)
file = 0
make_directory("home_to_home_data",1)
for line in data_in_home_block:
    file = file + 1
    with open("home_to_home_" + str(file) , "a") as data_in_home_to_home:
        data_in_home_to_home.write(str(line))
It would be great if someone could show me how to implement this.
As pointed out by @JoanCharmant, it is not necessary to use regex for this task, because the records are delimited by fixed strings.
Something like this should be enough:
with open('messages') as log_file:
    messages = log_file.read()
# str.split and str.rpartition take literal strings, so no regex escaping is needed
blocks = [block.rpartition("PAGE(enter) 'Data1'")[0]
          for block in messages.split("PAGE(leave) 'Data1'")
          if block and not block.isspace()]
for count, block in enumerate(blocks, 1):
    with open('home_to_home_%d' % count, 'a') as stream:
        stream.write(block)
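The splitting idea can be tried on an in-memory string before touching any files. The following is only a sketch of a slight variant: it assumes the markers appear exactly as written in the question, the sample text and the names LEAVE and ENTER are made up for illustration, and it uses partition instead of rpartition so that a chunk without a closing marker can be skipped.

```python
LEAVE = "PAGE(leave) 'Data1'"
ENTER = "PAGE(enter) 'Data1'"

# Hypothetical log contents: two records plus unrelated text around them.
messages = (
    "header\n"
    "PAGE(leave) 'Data1'\nline 1\nline 2\nPAGE(enter) 'Data1'\n"
    "noise\n"
    "PAGE(leave) 'Data1'\nline 3\nPAGE(enter) 'Data1'\n"
)

blocks = []
# Skip the first chunk: it is whatever precedes the first opening marker.
for chunk in messages.split(LEAVE)[1:]:
    body, sep, _rest = chunk.partition(ENTER)
    if sep:  # keep only chunks that really contain the closing marker
        blocks.append(body.strip("\n"))

print(blocks)
```

Text outside the markers ("header" and "noise" here) is dropped, and each entry of blocks holds only the lines between one pair of markers.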
If it's the single quotes that worry you, you can delimit the regular expression string with double quotes instead:
'hello "howdy"' # Correct
"hello 'howdy'" # Correct
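Applied to the marker from the question, both quoting styles can be compared directly. This is a small sketch, assuming the marker text is exactly PAGE(leave) 'Data1'; the variable names are only illustrative.

```python
import re

line = "PAGE(leave) 'Data1'"  # marker text as written in the question

# Single-quoted raw string: each inner quote needs a backslash, and the
# backslash stays in the pattern (the regex engine reads \' as a plain ').
single = r'PAGE\(leave\) \'Data1\''

# Double-quoted raw string: the single quotes need no escaping at all.
double = r"PAGE\(leave\) 'Data1'"

matches_single = re.match(single, line) is not None
matches_double = re.match(double, line) is not None
print(matches_single, matches_double)
```

Both patterns match the same text, so the choice is mainly about readability; the double-quoted form avoids the extra backslashes.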
Now, there is more to it. The r prefix only stops Python itself from processing backslashes; the regular expression engine still interprets them. So inside a raw string, \( matches a literal parenthesis and \\ matches a literal backslash (see What does the "r" in pythons re.compile(r' pattern flags') mean?). Without the r, you would simply need even more backslashes.
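The two levels of backslash handling (Python string literals first, then the regex engine) can be checked with a few lines. This is just an illustrative sketch; the sample strings are made up.

```python
import re

# Without r, Python itself consumes one level of backslashes.
plain = "\\("   # two characters: backslash, parenthesis
raw = r"\\("    # three characters: backslash, backslash, parenthesis
print(len(plain), len(raw))

# r"\(" is the regex for a literal "(".
found_paren = re.search(r"\(", "PAGE(leave)") is not None

# r"\\\(" is the regex for a literal backslash followed by "(".
found_backslash_paren = re.search(r"\\\(", "PAGE\\(leave\\)") is not None
print(found_paren, found_backslash_paren)
```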
I've created a test file with two "sections":
PAGE\(leave\) 'Data1'
line 1
line 2
line 3
PAGE\(enter\) 'Data1'
PAGE\(leave\) 'Data1'
line 4
line 5
line 6
PAGE\(enter\) 'Data1'
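The code below will do what you want (I think). The markers in my test file contain literal backslashes, so the pattern uses \\\( and \\\); for markers written as in the question, \( and \) would be enough: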
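import re

with open('test.txt', 'r') as log_file:
    data = log_file.read()

# Every piece of the pattern is a raw string, so \\ matches a literal
# backslash and \( matches a literal parenthesis.
block = re.compile(
    r"(PAGE\\\(leave\\\) 'Data1'\n)"
    r"(.*?)"
    r"(PAGE\\\(enter\\\) 'Data1')",
    re.IGNORECASE | re.DOTALL | re.MULTILINE
)
# Group 2 (index 1 of each findall tuple) holds the lines between the markers
data_in_home_block = [result[1] for result in block.findall(data)]
for data_block in data_in_home_block:
    print("Found data_block: %s" % (data_block,))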
Outputs:
Found data_block: line 1
line 2
line 3
Found data_block: line 4
line 5
line 6
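Since the question also asks to save each block to its own file, the matched blocks could be written out as follows. This is only a sketch under a few assumptions: the markers are written without backslashes, as in the question; the sample data is made up; and a temporary directory stands in for the question's output directory.

```python
import os
import re
import tempfile

# Hypothetical sample data using the marker format from the question.
data = (
    "PAGE(leave) 'Data1'\nline 1\nline 2\nPAGE(enter) 'Data1'\n"
    "PAGE(leave) 'Data1'\nline 3\nPAGE(enter) 'Data1'\n"
)

# Non-greedy (.*?) stops at the first closing marker after each opening one.
pattern = re.compile(
    r"PAGE\(leave\) 'Data1'\n(.*?)PAGE\(enter\) 'Data1'",
    re.DOTALL,
)

out_dir = tempfile.mkdtemp()  # stand-in for the question's output directory
written = []
for count, body in enumerate(pattern.findall(data), 1):
    path = os.path.join(out_dir, "home_to_home_%d" % count)
    with open(path, "w") as stream:
        stream.write(body)
    written.append(path)

print(written)
```

Opening each file with "w" rather than "a" means a second run overwrites the earlier output instead of appending to it; use "a" if appending is what you want.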