
Parsing repeating input using regex in python

I am new to Python and have never used regex, but I am being asked to use it in a project. My input file uses the following style:

tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant 
    <more subitems> 
} 

This structure repeats over and over with different tags and varying amounts of data. I need to convert it to JSON. Using unit tests, I have already figured out how to do this reliably for ONE such block, but I cannot figure out how to reliably parse a file containing thousands of them, one 'tag' at a time.

Basically, I'm trying to find out how I can repeatedly read that first line (itemname) and everything between the following two curly braces from the file, ideally into an iterable form that I can work with. Could anyone offer me some advice?

If you have a string like this:

tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant 
    <more subitems> 
} 

tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant
    <more subitems>
    <more subitems>
}

(and possibly many more tags)

And you simply want a list of every tag block.

You can use this pattern: (tag .+ {\n(?:.+\n)*?})
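
To make the pieces of that pattern easier to read, here is a minimal sketch that spells it out with re.VERBOSE. The names TAG_BLOCK, sample, and blocks are only illustrative, and the sample string is a shortened stand-in for the real input. Note that each body line must contain at least one character, so a block with a blank line inside it would not match.

```python
import re

# The same pattern, written one piece per line.
# In VERBOSE mode whitespace in the pattern is ignored, so each
# literal space is written as "[ ]" and the braces are escaped.
TAG_BLOCK = re.compile(r"""
    (                    # capture the whole block as one string
      tag[ ].+[ ]\{\n    # header line: the word tag, the item name, an opening brace
      (?:.+\n)*?         # as few non-empty body lines as possible
      \}                 # closing brace at the start of a line
    )
""", re.VERBOSE)

# Shortened, made-up sample input with two blocks.
sample = "tag <a> {\n    <x>\n    <y> -> z\n}\n\ntag <b> {\n    <w>\n}\n"

blocks = TAG_BLOCK.findall(sample)
for block in blocks:
    print(block)
    print("---")
```

The lazy quantifier (*?) is what stops each match at the first closing brace that starts a line, instead of running on to the last one in the file.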


Your code would look like this:

import re

s = """tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant 
    <more subitems> 
} 

tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant
    <more subitems>
    <more subitems>
}
"""

# Collect every "tag ... { ... }" block as one string
tags = re.findall(r'(tag .+ {\n(?:.+\n)*?})', s)

# Print each block to check the result
for tag in tags:
    print(tag)

Now you can run your own parsing on each tag.
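
As one possible next step, the sketch below uses re.finditer with named groups so that the item name and the body lines come out separately and can be iterated over one block at a time. The group names (name, body), the helper iter_tags, and the JSON layout with "tag" and "items" keys are assumptions made for illustration, not something the question specifies.

```python
import json
import re

# Hypothetical variant of the pattern above with two named groups:
# "name" is the text between "tag " and " {", "body" is the lines inside.
TAG_RE = re.compile(r"tag (?P<name>.+) \{\n(?P<body>(?:.+\n)*?)\}")


def iter_tags(text):
    """Yield (name, list of stripped body lines) for each block in text."""
    for match in TAG_RE.finditer(text):
        lines = [line.strip() for line in match.group("body").splitlines()]
        yield match.group("name"), lines


# Shortened, made-up sample input with two blocks.
sample = "tag <a> {\n    <x>\n    <y> -> z\n}\n\ntag <b> {\n    <w>\n}\n"

# The "tag"/"items" layout is an assumed JSON shape; adapt it to your own.
records = [{"tag": name, "items": items} for name, items in iter_tags(sample)]
as_json = json.dumps(records, indent=2)
print(as_json)
```

For a real input file, the text could come from reading the whole file into a string first (for example with open(path).read(), where path is a placeholder for your file name); a file with a few thousand blocks of this size should normally fit in memory without trouble.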
