Parsing repeating input using regex in python

Question

I am new to Python, have never used regex, and am being asked to use it in a project. My input file uses the following style:

tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant 
    <more subitems> 
}

This repeats over and over, with different tags and data of varying lengths. I need to convert it to JSON. Using unit tests, I have already worked out how to do this reliably when I have ONE of these blocks, but I cannot figure out how to reliably parse a file containing thousands of them, one 'tag' at a time.

Basically, I'm trying to find out how to repeatedly read the first line (itemname) and everything between the curly braces that follow it, and ideally get the results into an iterable form that I can work with. Could anyone offer some advice?

Answer 1

If you have a string like this:

tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant 
    <more subitems> 
} 

tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant
    <more subitems>
    <more subitems>
}

(and possibly many more tags)

And you simply want a list with one entry per tag block.

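You can use the pattern (tag .+ {\n(?:.+\n)*?}), which matches a header line ending in {, then as few following lines as possible, up to the first line that starts with }.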
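To show what each part of that pattern does, here is a sketch of the same regex written with re.VERBOSE so every piece can carry a comment. TAG_RE, COMPACT_RE and the sample string are illustrative names and data made up for this demonstration; the check at the end only confirms that both forms find the same blocks on that sample.

```python
import re

# The answer's pattern, rewritten with re.VERBOSE so each part can be commented.
# Verbose mode ignores unescaped whitespace, so literal spaces are written as [ ].
TAG_RE = re.compile(
    r"""
    (              # group 1: the whole block, which findall returns
      tag[ ]       # the literal word tag followed by one space
      .+[ ]        # the item name: everything up to the space before the brace
      \{\n         # the opening brace, then the end of the header line
      (?:.+\n)*?   # body lines, as few as possible; each must be non-empty
      \}           # a closing brace at the start of a line ends the block
    )
    """,
    re.VERBOSE,
)

# The compact form used in the answer, for comparison.
COMPACT_RE = re.compile(r"(tag .+ {\n(?:.+\n)*?})")

sample = "tag <a> {\n    <x>\n}\n\ntag <b> {\n    <y>\n    <z>\n}\n"
print(TAG_RE.findall(sample) == COMPACT_RE.findall(sample))  # True
print(TAG_RE.findall(sample))
```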

Your code would look like this:

import re

s = """tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant 
    <more subitems> 
} 

tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
    <~subitem> -> this is all irrelevant
    <more subitems>
    <more subitems>
}
"""

tags = re.findall(r'(tag .+ {\n(?:.+\n)*?})', s)

# Just to test out the tags
for tag in tags:
    print(tag)

Now you can run your own parsing on each tag.
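Since the question asks for something iterable that can be turned into JSON, one possible extension is a generator built on re.finditer with named groups. This is a sketch under assumptions not stated in the question: the word before <itemname> may vary (so it uses \w+ instead of the literal tag), every block closes with a } at the start of a line, and blocks may contain blank lines (the (?:.+\n)*? part of the pattern above cannot match an empty line, so such a block would be silently skipped). BLOCK_RE, iter_blocks, blocks_from_file and the dictionary layout are hypothetical names chosen for illustration.

```python
import json
import re

# Hypothetical generalisation of the answer's pattern:
# - \w+ accepts any keyword in front of <itemname>, not only "tag"
# - DOTALL lets (?P<body>.*?) span several lines, including blank ones
# - MULTILINE makes ^ match at each line start, so ^\} is a brace opening a line
BLOCK_RE = re.compile(
    r"^(?P<tag>\w+)[ \t]+<(?P<name>[^>]*)>[ \t]*\{[ \t]*\n(?P<body>.*?)^\}",
    re.MULTILINE | re.DOTALL,
)


def iter_blocks(text):
    """Yield one dict per block, in the order the blocks appear in text."""
    for match in BLOCK_RE.finditer(text):
        # Keep the non-blank body lines, stripped of indentation and trailing spaces.
        items = [
            line.strip() for line in match.group("body").splitlines() if line.strip()
        ]
        yield {"tag": match.group("tag"), "name": match.group("name"), "items": items}


def blocks_from_file(path):
    """Read a whole input file (path is a placeholder) and iterate over its blocks."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return iter_blocks(text)


sample = """tag <itemname> {
    <subitem>
    <subitem> -> possible relationship
}

other <second> {
    <a>

    <b>
}
"""

blocks = list(iter_blocks(sample))
print(json.dumps(blocks, indent=2))
```

For a real file, json.dumps(list(blocks_from_file("input.txt")), indent=2) would give the JSON for every block, where input.txt stands in for your own file name. Each dict can also be passed to the single-block conversion you already have working.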

Answer 1 (accepted, score 0), 2020-06-18 14:48:48
