Python next substring search

Question

I am transmitting a message, wrapped in a preamble and a postamble, multiple times. I want to extract the message from between a valid preamble and a valid postamble. My current code is:

print(msgfile[msgfile.find(preamble) + len(preamble):msgfile.find(postamble, msgfile.find(preamble))])

The problem is that if the postamble is corrupt, it will print all data between the first valid preamble and the next valid postamble. An example received text file would be:

garbagePREAMBLEmessagePOSTcMBLEgarbage
garbagePRdAMBLEmessagePOSTAMBLEgarbage
garbagePREAMBLEmessagePOSTAMBLEgarbage

and it will print

messagePOSTcMBLEgarbage
garbagePRdAMBLEmessage
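Here is the one-liner above split into its two `find` calls and run on the sample data. This is a minimal sketch. It assumes the markers are exactly `PREAMBLE` and `POSTAMBLE` and that the three lines arrive as one string. It starts the postamble search just past the preamble instead of at it, which gives the same result here.

```python
# Sample data from the question, joined into one string as if read from a file.
msgfile = ("garbagePREAMBLEmessagePOSTcMBLEgarbage\n"
           "garbagePRdAMBLEmessagePOSTAMBLEgarbage\n"
           "garbagePREAMBLEmessagePOSTAMBLEgarbage\n")
preamble, postamble = "PREAMBLE", "POSTAMBLE"

# Step 1: index just past the first valid preamble (the one on line 1).
start = msgfile.find(preamble) + len(preamble)
# Step 2: first valid postamble at or after that point. Line 1's postamble
# is corrupt, so this lands on the POSTAMBLE in line 2.
end = msgfile.find(postamble, start)
extracted = msgfile[start:end]
print(extracted)
```

Line 1's postamble is corrupt, so the first `POSTAMBLE` after line 1's preamble is the one on line 2. The slice therefore spans two transmissions.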

What I really want is for it to print the message from the third line, since that is the only line with both a valid preamble and a valid postamble. So I guess what I need is a way to find an index starting from the next instance of a substring. Is there an easy way to do this?

Edit: I don't expect my data to be in nice discrete lines. I only formatted it that way so it would be easier to read.
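Searching from the next instance of a substring is what the optional `start` argument of `str.find` (and the `start`/`end` arguments of `str.rfind`) provides. The following is a minimal sketch. It assumes the whole transmission is one string with no line breaks, as the edit says, and it walks through every preamble/postamble pair. `iter_candidates` and `MAX_LEN` are hypothetical names. The markers alone cannot tell a long message apart from two transmissions glued together by a corrupt postamble, so the final filter assumes a known upper bound on message length. That bound is an assumption and is not part of the question.

```python
def iter_candidates(data, preamble, postamble):
    """Yield the text between each preamble and the next postamble in data."""
    pos = 0
    while True:
        # Resume each search from pos instead of the start of the string.
        start = data.find(preamble, pos)
        if start == -1:
            return
        start += len(preamble)
        end = data.find(postamble, start)
        if end == -1:
            return
        # If a later valid preamble appears before this postamble, the earlier
        # message lost its postamble; start from the later preamble instead.
        later = data.rfind(preamble, start, end)
        if later != -1:
            start = later + len(preamble)
        yield data[start:end]
        # Continue just past the postamble that was consumed.
        pos = end + len(postamble)


stream = ("garbagePREAMBLEmessagePOSTcMBLEgarbage"
          "garbagePRdAMBLEmessagePOSTAMBLEgarbage"
          "garbagePREAMBLEmessagePOSTAMBLEgarbage")
candidates = list(iter_candidates(stream, "PREAMBLE", "POSTAMBLE"))
# Hypothetical rule: assume a real message is never longer than MAX_LEN.
MAX_LEN = 20
messages = [c for c in candidates if len(c) <= MAX_LEN]
print(messages)
```

The `rfind` step handles a lost postamble when the next preamble is intact. It cannot help in the sample, where the second preamble is also corrupt. That is why some extra rule, such as the length bound, is needed.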

Answer 1

Process it line by line:

>>> preamble, postamble = "PREAMBLE", "POSTAMBLE"
>>> test = "garbagePREAMBLEmessagePOSTcMBLEgarbage\n"
>>> test += "garbagePRdAMBLEmessagePOSTAMBLEgarbage\n"
>>> test += "garbagePREAMBLEmessagePOSTAMBLEgarbage\n"
>>> for line in test.splitlines():
...     start = line.find(preamble)
...     # only search for the postamble after a valid preamble
...     end = line.find(postamble, start) if start != -1 else -1
...     if end != -1:
...         print(line[start + len(preamble):end])
...
message

Answer 2

import re

lines = ["garbagePREAMBLEmessagePOSTcMBLEgarbage",
        "garbagePRdAMBLEmessagePOSTAMBLEgarbage",
        "garbagePREAMBLEmessagePOSTAMBLEgarbage"]

# you can use a regular expression
my_regex = re.compile("garbagePREAMBLE(.*?)POSTAMBLEgarbage")

# print the text captured between the preamble and the postamble
for line in lines:
    found = my_regex.match(line)
    # only lines with both a valid preamble and a valid postamble match
    if found:
        print(found.group(1))

# you can use string slicing
def validate(pre, post, messages):
    for line in messages:
        # slicing gives wrong results on a line shorter than pre and post combined
        if len(line) < len(pre) + len(post):
            print("error: line is too short")
            continue

        # see if the line starts with pre and ends with post
        if line.startswith(pre) and line.endswith(post):
            # print the message between them
            print(line[len(pre):-len(post)])

validate("garbagePREAMBLE", "POSTAMBLEgarbage", lines)
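Both options above hard-code the literal `garbage` padding into the markers, which only works for this sample. The following is a minimal sketch that builds the pattern from arbitrary marker strings with `re.escape`. It uses `search` so the padding can be anything. `extract` is a hypothetical helper name, and the markers are assumed to be `PREAMBLE` and `POSTAMBLE`.

```python
import re


def extract(lines, preamble, postamble):
    """Return the text between preamble and postamble on each line that has both."""
    # re.escape lets the markers contain regex metacharacters such as '+' or '.'.
    pattern = re.compile(re.escape(preamble) + r"(.*?)" + re.escape(postamble))
    results = []
    for line in lines:
        # search (unlike match) finds the markers anywhere in the line,
        # so the surrounding garbage does not need to be in the pattern.
        found = pattern.search(line)
        if found:
            results.append(found.group(1))
    return results


lines = ["garbagePREAMBLEmessagePOSTcMBLEgarbage",
         "garbagePRdAMBLEmessagePOSTAMBLEgarbage",
         "garbagePREAMBLEmessagePOSTAMBLEgarbage"]
print(extract(lines, "PREAMBLE", "POSTAMBLE"))
```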

Answer 3

Are all messages on single lines? Then you can use a regular expression to identify the lines that have both a valid preamble and a valid postamble:

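import re

# non-greedy (.+?) stops at the first POSTAMBLE after the PREAMBLE
pat = re.compile(r'PREAMBLE(.+?)POSTAMBLE')
# "with" closes the file afterwards; each line is searched only once
with open(yourfilename) as input_file:
    messages = [m.group(1) for m in map(pat.search, input_file) if m]

print(messages)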
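The comprehension above keeps only the first match on each line. The question's edit says the data is not neatly split into lines, so a single line could carry more than one transmission. In that case `re.finditer` returns every non-overlapping match. This is a minimal sketch: `iter_messages` is a hypothetical helper name, `io.StringIO` stands in for the real file opened with `open(yourfilename)`, and the last sample line is made up to show two messages on one line.

```python
import io
import re

# Non-greedy (.+?) so two messages on one line are not merged into one match.
PATTERN = re.compile(r"PREAMBLE(.+?)POSTAMBLE")


def iter_messages(file_obj):
    """Yield every message found in an open text file, one line at a time."""
    for line in file_obj:
        # finditer yields every non-overlapping match on the line, not just the first.
        for match in PATTERN.finditer(line):
            yield match.group(1)


sample = io.StringIO("garbagePREAMBLEmessagePOSTcMBLEgarbage\n"
                     "garbagePRdAMBLEmessagePOSTAMBLEgarbage\n"
                     "garbagePREAMBLEmessagePOSTAMBLEgarbage\n"
                     # hypothetical extra line carrying two messages
                     "PREAMBLEfirstPOSTAMBLExxPREAMBLEsecondPOSTAMBLE\n")
print(list(iter_messages(sample)))
```

A greedy `(.+)` would return `firstPOSTAMBLExxPREAMBLEsecond` for the last line instead of two separate messages.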

3 answers

solution1 0 2013-04-16 22:12:18
solution2 0 2013-04-16 22:15:36
solution3 0 2013-04-16 22:16:15
