简体   繁体   中英

Finding data in-between two strings in python

I have a text file which contain some format like :

PAGE(leave) 'Data1'
line 1
line 2 
line 2
...
...
...
PAGE(enter) 'Data1'

I need to get all the lines in between the two keywords and save it a text file. I have come across the following so far. But I have an issue with single quotes as regular expression thinks it as the quote in the expression rather than the keyword.

My codes so far:

log_file = open('messages','r')
    data = log_file.read()
    block = re.compile(ur'PAGE\(leave\) \'Data1\'[\S ]+\s((?:(?![^\n]+PAGE\(enter\) \'Data1\').)*)', re.IGNORECASE | re.DOTALL)
    data_in_home_block=re.findall(block, data)
    file = 0
    make_directory("home_to_home_data",1)
    for line in data_in_home_block:
        file = file + 1
        with open("home_to_home_" + str(file) , "a") as data_in_home_to_home:
            data_in_home_to_home.write(str(line))

It would be great if someone could guide me how to implement it..

As pointed out by @JoanCharmant, it is not necessary to use regex for this task, because the records are delimited by fixed strings.

Something like this should be enough:

messages = open('messages').read()

blocks = [block.rpartition(r"PAGE\(enter\) 'Data1'")[0]
          for block in messages.split(r"PAGE\(leave\) 'Data1'")
          if block and not block.isspace()]

for count, block in enumerate(blocks, 1):
    with open('home_to_home_%d' % count, 'a') as stream:
        stream.write(block)

If it's single quotes what worry you, you can start the regular expression string with double quotes...

'hello "howdy"'  # Correct
"hello 'howdy'"  # Correct

Now, there are more issues here... Even when declared as r , you still must escape your regular expression's backslashes in the .compile (see What does the "r" in pythons re.compile(r' pattern flags') mean? ) Is just that without the r , you probably would need a lot more of backslashes.

I've created a test file with two "sections":

PAGE\(leave\) 'Data1'
line 1
line 2 
line 3
PAGE\(enter\) 'Data1'

PAGE\(leave\) 'Data1'
line 4
line 5 
line 6
PAGE\(enter\) 'Data1'

The code below will do what you want (I think)

import re

log_file = open('test.txt', 'r')
data = log_file.read()
log_file.close()
block = re.compile(
    ur"(PAGE\\\(leave\\\) 'Data1'\n)"
    "(.*?)"
    "(PAGE\\\(enter\\\) 'Data1')",
    re.IGNORECASE | re.DOTALL | re.MULTILINE
)
data_in_home_block = [result[1] for result in re.findall(block, data)]
for data_block in data_in_home_block:
    print "Found data_block: %s" % (data_block,)

Outputs:

Found data_block: line 1
line 2 
line 3

Found data_block: line 4
line 5 
line 6

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM