I am new to Python and have never used regex, and I am being asked to use it in a project. My input file uses the following style:
tag <itemname> {
<subitem>
<subitem> -> possible relationship
<~subitem> -> this is all irrelevant
<more subitems>
}
This structure repeats over and over with different tags and data of varying length. I need to convert it to JSON. Using unit tests, I have already figured out how to do this reliably for ONE such block, but I cannot figure out how to reliably parse a file containing thousands of them, one 'tag' at a time.
Basically, I'm trying to find out how I can repeatedly read that first line (itemname) and everything between the following two curly braces from the file, and ideally get it into an iterable form that I can work with. Could anyone offer me some advice?
If you have a string like this:
tag <itemname> {
<subitem>
<subitem> -> possible relationship
<~subitem> -> this is all irrelevant
<more subitems>
}
tag <itemname> {
<subitem>
<subitem> -> possible relationship
<~subitem> -> this is all irrelevant
<more subitems>
<more subitems>
}
(and possibly many more tags)
And you simply want a list of all the tag blocks.
You can use this pattern: (tag .+ {\n(?:.+\n)*?})
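To make the pattern easier to read, here is the same expression written with re.VERBOSE and one comment per piece. This is only an annotated sketch: the name BLOCK_PATTERN and the short sample string are made up for illustration, and the braces are escaped here purely for readability.

```python
import re

# Annotated form of the pattern above (BLOCK_PATTERN is a hypothetical name).
# In re.VERBOSE mode unescaped whitespace is ignored, so each literal
# space is written as "\ ".
BLOCK_PATTERN = re.compile(r"""
    (                 # capture the whole block
      tag\ .+\ \{\n   # header line, e.g. "tag <itemname> {"
      (?:.+\n)*?      # body: as few non-empty lines as possible (lazy)
      \}              # closing brace at the start of its own line
    )
""", re.VERBOSE)

# Made-up sample input with two blocks
sample = "tag <a> {\n<x>\n<y> -> z\n}\ntag <b> {\n<w>\n}\n"
blocks = BLOCK_PATTERN.findall(sample)
```

Because the body is matched lazily, each match stops at the first line that begins with a closing brace, so consecutive blocks are not merged into one match.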
Your code would look like this:
import re

s = """tag <itemname> {
<subitem>
<subitem> -> possible relationship
<~subitem> -> this is all irrelevant
<more subitems>
}
tag <itemname> {
<subitem>
<subitem> -> possible relationship
<~subitem> -> this is all irrelevant
<more subitems>
<more subitems>
}
"""
tags = re.findall(r'(tag .+ {\n(?:.+\n)*?})', s)

# Print each block to check the result
for tag in tags:
    print(tag)
Now you can run your own parsing on each tag.
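If you would rather get the item name and the body lines separately, and in an iterable form, one possible variation is to add named groups and use re.finditer inside a generator. This is a minimal sketch, not the only way to do it: the names BLOCK_RE and iter_blocks and the sample string are hypothetical, and it assumes every block has no blank lines and ends with a closing brace at the start of its own line.

```python
import re

# Same idea as the pattern above, with two named groups added
# (BLOCK_RE and iter_blocks are hypothetical names).
BLOCK_RE = re.compile(r"tag (?P<name>.+) \{\n(?P<body>(?:.+\n)*?)\}")

def iter_blocks(text):
    """Yield (itemname, list_of_body_lines) for each block found in text."""
    for match in BLOCK_RE.finditer(text):
        yield match.group("name"), match.group("body").splitlines()

# Made-up sample input with two blocks
sample = "tag <a> {\n<x>\n<y> -> z\n}\ntag <b> {\n<w>\n}\n"
for name, lines in iter_blocks(sample):
    print(name, lines)
```

For a real file you could read the whole content into one string first (for example with open(...).read()) and pass it to iter_blocks; a file with a few thousand blocks should fit in memory comfortably. Note that a blank line inside a block, or an indented closing brace, would not match this pattern, so it would need adjusting for such input.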