
Search files in Python based on header and footer patterns

I would like to parse a file which looks like this:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA


AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
HEADER
body
body
body
FOOTER
BLABLABLABLA
BLABLABLABLA
BLABLABLABLA

I would like to extract the content between HEADER and FOOTER. The number of lines between each HEADER and FOOTER can vary, and so can the content itself. I have written the following code to extract it:

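    import re

    body = ""
    start_flag = False
    # `file` holds the path to the input file.
    with open(file, "r") as fd:
        for line in fd:
            if not start_flag:
                # Skip everything until the HEADER line is seen.
                match = re.search(r'.*HEADER.*', line)
                if not match:
                    continue
                else:
                    # `line` already ends with a newline, so this adds a blank line.
                    body = body + line + "\n"
                    start_flag = True
            else:
                match_end = re.search(r'.*FOOTER.*', line)
                if not match_end:
                    body = body + line + "\n"
                    continue
                else:
                    # FOOTER found: keep it and stop reading.
                    body = body + line + "\n\n"
                    break
    print(body)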

Is this the best way to extract content from a file using Python? What other ways are there to approach this problem?

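from itertools import groupby

# f is the path to the input file.
with open(f, "r") as fin:
    # Group consecutive lines by whether they are a HEADER/FOOTER marker.
    groups = groupby(fin, key=lambda k: k.strip() in ("HEADER", "FOOTER"))
    # Advance past the first marker group (the HEADER line).
    any(k for k, g in groups)
    # The next group holds the lines between HEADER and FOOTER.
    content = list(next(groups)[1])
print(content)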
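
To see why this works, it can help to look at what groupby actually yields. The following is only a sketch on an in-memory list instead of a file; `sample`, `is_marker` and `runs` are illustrative names that are not part of the answer above.

```python
from itertools import groupby

# Illustrative in-memory stand-in for the file from the question.
sample = ["AAAA\n", "AAAA\n", "HEADER\n", "body 1\n", "body 2\n", "FOOTER\n", "BLABLA\n"]

def is_marker(line):
    # True for the marker lines, False for everything else.
    return line.strip() in ("HEADER", "FOOTER")

# groupby emits one (key, group) pair per run of consecutive lines with the same key.
runs = [(key, [line.strip() for line in group])
        for key, group in groupby(sample, key=is_marker)]

for key, lines in runs:
    print(key, lines)
```

The `any(...)` call consumes pairs up to and including the first run whose key is true (the HEADER line), so the following `next(groups)` returns the body run. This relies on at least one line sitting between HEADER and FOOTER: with an empty body, both markers fall into the same run and the next run is whatever follows the FOOTER.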

Here is a way using itertools:

from itertools import takewhile, dropwhile

with open("myfile.txt") as f:
    # Skip lines until HEADER is reached.
    starting_iterator = dropwhile(lambda x: x.strip() != 'HEADER', f)
    # Discard the HEADER line itself.
    next(starting_iterator, None)
    # Yield lines until FOOTER is reached.
    contents = takewhile(lambda x: x.strip() != 'FOOTER', starting_iterator)
    print(list(contents))

Since I got pushback on my comments, I might as well show how I'd do it. There is no need to build lists in memory; that's what iterators are for:

import itertools as it

def contents(source):
    return it.takewhile(lambda x: "FOOTER" != x.strip(),
        it.islice(
            it.dropwhile(lambda x: "HEADER" != x.strip(), source),
        1, None) )

with open("testfile") as f:
    for line in contents(f):
        # Do your stuff here....
        pass  # placeholder so the loop body is valid Python
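
The question says the number of lines between each HEADER and FOOTER can vary, which suggests a file may hold several such blocks, while the snippets above stop after the first one. A possible extension is sketched below. It assumes each marker sits alone on its line and that blocks are not nested; `iter_blocks` is a hypothetical helper name, and the `io.StringIO` sample stands in for a real file.

```python
import io

def iter_blocks(source, header="HEADER", footer="FOOTER"):
    """Yield one list of body lines per HEADER ... FOOTER block in source."""
    block = None  # None means we are currently outside a block
    for line in source:
        stripped = line.strip()
        if block is None:
            # Outside a block: only a HEADER line is interesting.
            if stripped == header:
                block = []
        elif stripped == footer:
            # End of the current block: hand it to the caller.
            yield block
            block = None
        else:
            block.append(line.rstrip("\n"))

# Illustrative sample with two blocks of different lengths.
sample = io.StringIO(u"junk\nHEADER\na\nb\nFOOTER\nnoise\nHEADER\nc\nFOOTER\n")
blocks = list(iter_blocks(sample))
print(blocks)
```

Only one block is held in memory at a time. A HEADER without a matching FOOTER is silently dropped in this sketch, which may or may not be the behaviour you want.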
