Here is an example of the XML content I have to work with:
<states>
<state name="foo">
<and>
<eq><text value="bar" /></eq>
<or>
<eqnull><text value="bar2" /></eqnull>
<eqnull><text value="bar3" /></eqnull>
</or>
</and>
</state>
</states>
This structure is unpredictable; it can change completely from one state to the next. For example, it can look like this:
<states>
<state name="foo">
<and>
<or>
<eq><text value="bar" /></eq>
<eq><text value="bar2" /></eq>
</or>
<eqnull><selectedText value="bar3" number="1" /></eqnull>
</and>
</state>
</states>
Regardless of how unpredictable this structure is, I want to parse it into a Python list of dictionaries that looks like this (for the first XML example):
[{'and': {'eq': {'text': {'value': 'bar'}}}},
{'and': {'or': [{'eqnull': {'text': {'value': 'bar2'}}},
{'eqnull': {'text': {'value': 'bar3'}}},]}}]
I tried using ElementTree and getting the content of the state structure as a dictionary with xmltodict.parse, then recursively stripping that dictionary (key by key) down to my list of dictionaries. This is very hard for me to implement (unfortunately I'm not a Python developer...), and I am wondering if there is an easier way to do this.
I have another solution in mind: iterate through each node in the XML structure, dynamically build dictionaries and, finally, the list of dictionaries. But there is one problem: I do not know when a node, e.g. eq, ends. If there were some way to recognize the closing tag /eq, I think it would be manageable.
Or maybe there is another way in Python that I do not know of...
Here is an example of how you could do it by recursively adding the content of each node:
import re

def findMarkup(xml, mainlist):
    # look for the next tag in the string
    markup = re.search(r'<([^>]*)>', xml)
    if markup:
        markup_content = markup.group(1)
        begin = markup.end()
        name = markup_content.split(' ')[0].rstrip('/')
        # we check if the markup closes itself, e.g. <text value="bar" />
        if markup_content.endswith('/'):
            # no content; the rest of the string starts right after the tag
            end = begin - 2
        else:
            end = xml.find('</{0}>'.format(name))
        if begin + 1 < end:
            # the node has children, its content is theirs
            inner_value = []
            findMarkup(xml[begin:end], inner_value)
        else:
            # the content of the current node is its attributes
            inner_value = getAttr(markup_content)
        # we add the content of the current node
        mainlist.append({name: inner_value})
        # we iterate on the rest of the string for same-level markups
        findMarkup(xml[end + 2:], mainlist)

def getAttr(markup_content):
    # collect every name="value" pair of the tag into a dictionary
    attr_dict = dict()
    for attr in re.finditer(r'(\w+)="([^"]*)"', markup_content):
        attr_dict[attr.group(1)] = attr.group(2)
    return attr_dict
It gave me something like this (looking inside the state content, because state would also be counted as a node):
[{'and': [{'eq': [{'text': {'value': 'bar'}}]}, {'or': [{'eqnull': [{'text': {'value': 'bar2'}}]}, {'eqnull': [{'text': {'value': 'bar3'}}]}]}]}]
It's not exactly what you wanted, but I guess you can still get at the information. Just put the XML content in a string, instantiate an empty list, and call findMarkup(xml_in_string, empty_list) once; the list will be filled.
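As a possible alternative to the regex approach, here is a minimal sketch that assumes the standard library's xml.etree.ElementTree is acceptable. The helper name element_to_dict and the per-state result dictionary are made up for illustration. The parser itself keeps track of where each node ends, so no search for the closing tag is needed.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Return {tag: value}, where value is the list of converted children,
    or the attribute dictionary when the element has no children."""
    children = [element_to_dict(child) for child in elem]
    return {elem.tag: children if children else dict(elem.attrib)}

xml_in_string = """<states>
<state name="foo">
<and>
<eq><text value="bar" /></eq>
<or>
<eqnull><text value="bar2" /></eqnull>
<eqnull><text value="bar3" /></eqnull>
</or>
</and>
</state>
</states>"""

root = ET.fromstring(xml_in_string)
# One entry per <state>, keyed by its name attribute (an illustrative layout).
result = {state.get("name"): [element_to_dict(child) for child in state]
          for state in root.iter("state")}
print(result["foo"])
```

For the first XML example, result["foo"] has the same nested shape as the output shown above.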
Note that I don't really know your end purpose, so a simple copy-paste may not be enough; you may need to refine the part where I create inner_value. Also, this code assumes that the file is perfectly well-formed; you should add exception handling if required.
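For the exception handling mentioned above, one option is to let a real XML parser reject malformed input before any conversion happens. The sketch below assumes xml.etree.ElementTree from the standard library; parse_xml_or_none is a hypothetical helper name, and the two sample strings are made up for the demonstration.

```python
import xml.etree.ElementTree as ET

def parse_xml_or_none(xml_in_string):
    """Return the root element, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_in_string)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the problem
        print("malformed XML at line {0}, column {1}".format(*err.position))
        return None

# A well-formed fragment and one whose inner tag is never closed.
good = '<eqnull><selectedText value="bar3" number="1" /></eqnull>'
bad = '<eqnull><selectedText value="bar3" number="1"></eqnull>'

print(parse_xml_or_none(good) is not None)
print(parse_xml_or_none(bad) is None)
```

Whether to return None, re-raise, or skip the offending state depends on what the caller needs; returning None is just one choice.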