Python next substring search

I am transmitting a message with a pre/postamble multiple times. I want to be able to extract the message between a valid preamble and a valid postamble. My current code is

print(msgfile[msgfile.find(preamble) + len(preamble):msgfile.find(postamble, msgfile.find(preamble))])

The problem is that if the postamble is corrupt, it will print all data between the first valid preamble and the next valid postamble. An example received text file would be:

garbagePREAMBLEmessagePOSTcMBLEgarbage
garbagePRdAMBLEmessagePOSTAMBLEgarbage
garbagePREAMBLEmessagePOSTAMBLEgarbage

and it will print

messagePOSTcMBLEgarbage
garbagePRdAMBLEmessage

But what I really want it to print is the message from the third line, since it is the only one with both a valid preamble and a valid postamble. So I guess what I want is to be able to find, and index from, the next instance of a substring. Is there an easy way to do this?

Edit: I don't expect my data to be in nice discrete lines. I just formatted it that way so it would be easier to see.

Process it line by line:

>>> preamble, postamble = "PREAMBLE", "POSTAMBLE"
>>> test = "garbagePREAMBLEmessagePOSTcMBLEgarbage\n"
>>> test += "garbagePRdAMBLEmessagePOSTAMBLEgarbage\n"
>>> test += "garbagePREAMBLEmessagePOSTAMBLEgarbage\n"
>>> for line in test.splitlines():
...     start = line.find(preamble)
...     end = line.find(postamble, start)
...     # only print lines where both markers were found intact
...     if start != -1 and end != -1:
...         print(line[start + len(preamble):end])
...
message

import re

lines = ["garbagePREAMBLEmessagePOSTcMBLEgarbage",
        "garbagePRdAMBLEmessagePOSTAMBLEgarbage",
        "garbagePREAMBLEmessagePOSTAMBLEgarbage"]

# you can use a regex; search() finds the pattern anywhere in the line
my_regex = re.compile(r"PREAMBLE(.*?)POSTAMBLE")

# get the match found between the preamble and postamble and print it
for line in lines:
    found = my_regex.search(line)
    # if there is a match print it
    if found:
        print(found.group(1))

# you can use string slicing
def validate(pre, post, lines):
    for line in lines:
        # method would break on a string smaller than both preambles
        if len(line) < len(pre) + len(post):
            print("error line is too small")
            continue

        # see if the line starts with pre and ends with post
        if line.startswith(pre) and line.endswith(post):
            # print message
            print(line[len(pre):-len(post)])

validate("garbagePREAMBLE", "POSTAMBLEgarbage", lines)
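
Since the question's edit says the data does not arrive in discrete lines, the regex idea can also be run over the whole buffer at once. The catch is that a valid preamble followed much later by a valid postamble looks like one long message, so something has to bound the match. The sketch below assumes messages never exceed a known length; `MAX_LEN` and `extract_messages` are hypothetical names, not part of the question.

```python
import re

PREAMBLE = "PREAMBLE"
POSTAMBLE = "POSTAMBLE"
MAX_LEN = 16  # assumed upper bound on message length (not given in the question)

def extract_messages(data, max_len=MAX_LEN):
    """Return every payload framed by an intact preamble and postamble."""
    pattern = re.compile(
        # lazy, length-limited payload; DOTALL lets it span newlines
        re.escape(PREAMBLE) + "(.{1,%d}?)" % max_len + re.escape(POSTAMBLE),
        re.DOTALL,
    )
    return pattern.findall(data)

# the sample data from the question, joined into one stream
received = (
    "garbagePREAMBLEmessagePOSTcMBLEgarbage"
    "garbagePRdAMBLEmessagePOSTAMBLEgarbage"
    "garbagePREAMBLEmessagePOSTAMBLEgarbage"
)
print(extract_messages(received))  # ['message']
```

The first preamble is skipped because the nearest intact postamble is more than `MAX_LEN` characters away. If no such bound exists for your protocol, this approach cannot tell a long message from a corrupted frame.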

Are all messages on single lines? Then you can use regular expressions to identify lines with a valid preamble and postamble:

import re

pat = re.compile(r'PREAMBLE(.+?)POSTAMBLE')

# yourfilename is a placeholder for the path to the received text file
with open(yourfilename) as input_file:
    matches = (pat.search(line) for line in input_file)
    messages = [m.group(1) for m in matches if m]

print(messages)
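
As for the literal question of finding the next instance of a substring: `str.find` takes an optional start index, so calling it again from just past the previous hit walks through every occurrence. Below is a minimal sketch built on that, which works on one continuous string instead of lines. The helper names and the `max_len` limit are assumptions made for illustration; the limit is there to reject frames whose postamble was corrupted.

```python
def find_all(data, sub):
    """Yield the index of every occurrence of sub in data, left to right."""
    start = data.find(sub)
    while start != -1:
        yield start
        start = data.find(sub, start + 1)

def extract_framed(data, preamble, postamble, max_len):
    """Collect payloads whose postamble follows within max_len characters."""
    messages = []
    for pre_at in find_all(data, preamble):
        body_start = pre_at + len(preamble)
        post_at = data.find(postamble, body_start)
        if post_at == -1:
            break  # no intact postamble remains in the data
        if post_at - body_start <= max_len:
            messages.append(data[body_start:post_at])
    return messages

stream = (
    "garbagePREAMBLEmessagePOSTcMBLEgarbage"
    "garbagePRdAMBLEmessagePOSTAMBLEgarbage"
    "garbagePREAMBLEmessagePOSTAMBLEgarbage"
)
print(extract_framed(stream, "PREAMBLE", "POSTAMBLE", max_len=16))  # ['message']
```

Each preamble is tried in turn, so a frame with a damaged postamble is skipped and the scan continues with the next preamble.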
