Getting a specific value based on the layout of an XML file is pretty straightforward (see StackOverflow).
But when I don't know the XML elements in advance, I can't recurse over them, because xmltodict nests OrderedDicts inside OrderedDicts. These nested OrderedDicts seem to be typed by Python as 'unicode' rather than as OrderedDicts. Therefore, looping over them like this doesn't work:
def myprint(d):
    for k, v in d.iteritems():
        if isinstance(v, list):
            myprint(v)
        else:
            print "Key :{0}, Value: {1}".format(k, v)
What I basically want is to recurse over the whole XML file and show every key-value pair. When the value of a key is another list of key-value pairs, it should recurse into it.
With this xml-file as input:
<?xml version="1.0" encoding="utf-8"?>
<session id="2934" name="Valves" docVersion="5.0.1">
    <docInfo>
        <field name="Employee" isMandotory="True">Jake Roberts</field>
        <field name="Section" isOpen="True" isMandotory="False">5</field>
        <field name="Location" isOpen="True" isMandotory="False">Munchen</field>
    </docInfo>
</session>
and the code listed above, all the data under session is printed as one single value of the key session.
Example output:
Key :session, Value: OrderedDict([(u'@id', u'2934'), (u'@name', u'Valves'), (u'@docVersion', u'5.0.1'), (u'docInfo', OrderedDict([(u'field', [OrderedDict([(u'@name', u'Employee'), (u'@isMandotory', u'True'), ('#text', u'Jake Roberts')]), OrderedDict([(u'@name', u'Section'), (u'@isOpen', u'True'), (u'@isMandotory', u'False'), ('#text', u'5')]), OrderedDict([(u'@name', u'Location'), (u'@isOpen', u'True'), (u'@isMandotory', u'False'), ('#text', u'Munchen')])])]))])
And this is obviously not what I want.
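For reference, the nested values that xmltodict returns are OrderedDicts (a dict subclass), not unicode strings. The `isinstance(v, list)` check is simply False for them, so the `else` branch prints the whole subtree as one value. A quick diagnostic sketch, using a hand-built structure that assumes the shape xmltodict produces for the sample file (older xmltodict versions use OrderedDict, newer ones plain dict):

```python
from collections import OrderedDict

# Hand-built stand-in for xmltodict.parse() output on the sample XML (assumption).
doc = OrderedDict([
    ('session', OrderedDict([
        ('@id', '2934'),
        ('@name', 'Valves'),
        ('@docVersion', '5.0.1'),
        ('docInfo', OrderedDict([
            ('field', [
                OrderedDict([('@name', 'Employee'), ('@isMandotory', 'True'),
                             ('#text', 'Jake Roberts')]),
                OrderedDict([('@name', 'Section'), ('@isOpen', 'True'),
                             ('@isMandotory', 'False'), ('#text', '5')]),
                OrderedDict([('@name', 'Location'), ('@isOpen', 'True'),
                             ('@isMandotory', 'False'), ('#text', 'Munchen')]),
            ]),
        ])),
    ])),
])

session = doc['session']
print(type(session).__name__)                      # OrderedDict, not a string
print(isinstance(session, list))                   # False: the list branch never fires
print(isinstance(session, dict))                   # True: OrderedDict subclasses dict
print(type(session['docInfo']['field']).__name__)  # list: the repeated <field> elements
```

So the recursion needs to check for dicts as well as lists.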
If you come across a list in the data, then you just need to call myprint on every element of the list:
def myprint(d):
    if isinstance(d, dict):  # check if it's a dict before using .iteritems()
        for k, v in d.iteritems():
            if isinstance(v, (list, dict)):  # check for either list or dict
                myprint(v)
            else:
                print "Key :{0}, Value: {1}".format(k, v)
    elif isinstance(d, list):  # allow for list input too
        for item in d:
            myprint(item)
Then you will get output something like this:
...
Key :@name, Value: Employee
Key :@isMandotory, Value: True
Key :#text, Value: Jake Roberts
Key :@name, Value: Section
Key :@isOpen, Value: True
Key :@isMandotory, Value: False
Key :#text, Value: 5
...
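The code above is Python 2 (`.iteritems()` and the `print` statement). A minimal Python 3 sketch of the same recursion, split into a generator (`iter_pairs` is a hypothetical name, not part of the answer) so the pairs can be collected as well as printed, assuming xmltodict-style nested dicts and lists:

```python
from collections import OrderedDict

def iter_pairs(d):
    """Yield (key, value) for every leaf in nested dicts/lists."""
    if isinstance(d, dict):               # OrderedDict is a dict subclass, so it matches too
        for k, v in d.items():            # .items() replaces Python 2's .iteritems()
            if isinstance(v, (list, dict)):
                yield from iter_pairs(v)  # recurse into nested containers
            else:
                yield k, v
    elif isinstance(d, list):             # lists hold repeated elements such as <field>
        for item in d:
            yield from iter_pairs(item)

def myprint(d):
    for k, v in iter_pairs(d):
        print("Key :{0}, Value: {1}".format(k, v))

# Small hand-built sample in the shape xmltodict produces (assumption).
sample = OrderedDict([('session', OrderedDict([
    ('@id', '2934'),
    ('docInfo', OrderedDict([('field', [
        OrderedDict([('@name', 'Employee'), ('#text', 'Jake Roberts')]),
        OrderedDict([('@name', 'Section'), ('#text', '5')]),
    ])])),
]))])

myprint(sample)
```

Returning pairs from a generator keeps the traversal separate from the printing, so the same function can feed a list, a file or a filter.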
Although I'm not sure how useful this is, since you have a lot of duplicate keys like @name, I'd like to offer a function I created a while ago to traverse nested JSON data made of nested dicts and lists:
def traverse(obj, prev_path="obj", path_repr="{}[{!r}]".format):
    if isinstance(obj, dict):
        it = obj.items()
    elif isinstance(obj, list):
        it = enumerate(obj)  # list positions become path components
    else:
        yield prev_path, obj  # leaf value: report the path built so far
        return
    for k, v in it:
        # extend the path with this key/index and recurse
        for data in traverse(v, path_repr(prev_path, k), path_repr):
            yield data
Then you can traverse the data with:
for path, value in traverse(doc):
    print("{} = {}".format(path, value))
With the default values for prev_path and path_repr, it gives output like this:
obj[u'session'][u'@id'] = 2934
obj[u'session'][u'@name'] = Valves
obj[u'session'][u'@docVersion'] = 5.0.1
obj[u'session'][u'docInfo'][u'field'][0][u'@name'] = Employee
obj[u'session'][u'docInfo'][u'field'][0][u'@isMandotory'] = True
obj[u'session'][u'docInfo'][u'field'][0]['#text'] = Jake Roberts
obj[u'session'][u'docInfo'][u'field'][1][u'@name'] = Section
obj[u'session'][u'docInfo'][u'field'][1][u'@isOpen'] = True
obj[u'session'][u'docInfo'][u'field'][1][u'@isMandotory'] = False
obj[u'session'][u'docInfo'][u'field'][1]['#text'] = 5
obj[u'session'][u'docInfo'][u'field'][2][u'@name'] = Location
obj[u'session'][u'docInfo'][u'field'][2][u'@isOpen'] = True
obj[u'session'][u'docInfo'][u'field'][2][u'@isMandotory'] = False
obj[u'session'][u'docInfo'][u'field'][2]['#text'] = Munchen
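On Python 3 the inner loop can be shortened with `yield from`, and because keys are `str` rather than `unicode` the `u` prefixes disappear from the paths. A sketch of that variant, assuming Python 3.3+ and a small hand-built sample in the xmltodict shape:

```python
def traverse(obj, prev_path="obj", path_repr="{}[{!r}]".format):
    """Yield (path, leaf_value) pairs for nested dicts and lists."""
    if isinstance(obj, dict):
        it = obj.items()
    elif isinstance(obj, list):
        it = enumerate(obj)       # list positions become path components
    else:
        yield prev_path, obj      # leaf: report the path built so far
        return
    for k, v in it:
        # delegate to the recursive call instead of re-yielding by hand
        yield from traverse(v, path_repr(prev_path, k), path_repr)

sample = {'session': {'@id': '2934', 'docInfo': {'field': [{'@name': 'Employee'}]}}}
for path, value in traverse(sample):
    print("{} = {}".format(path, value))
```

This prints `obj['session']['@id'] = 2934` and `obj['session']['docInfo']['field'][0]['@name'] = Employee`.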
Alternatively, you can write a function for path_repr that takes the value of prev_path (determined by recursively calling path_repr) and the new key. For example, a function that takes a tuple and adds another element to the end gives a (tuple of indices : elem) format, which is perfect to pass to the dict constructor:
import pprint

def _tuple_concat(tup, idx):
    # Python 3.5+ syntax: build a new tuple with idx appended
    return (*tup, idx)

def flatten_data(obj):
    """converts nested dict and list structure into a flat dictionary with tuple keys
    corresponding to the sequence of indices to reach particular element"""
    return dict(traverse(obj, (), _tuple_concat))

new_data = flatten_data(doc)  # doc is the parsed data used above
pprint.pprint(new_data)
which gives you the data in this dictionary format:
{('session', '@docVersion'): '5.0.1',
 ('session', '@id'): '2934',
 ('session', '@name'): 'Valves',
 ('session', 'docInfo', 'field', 0, '#text'): 'Jake Roberts',
 ('session', 'docInfo', 'field', 0, '@isMandotory'): 'True',
 ('session', 'docInfo', 'field', 0, '@name'): 'Employee',
 ('session', 'docInfo', 'field', 1, '#text'): '5',
 ('session', 'docInfo', 'field', 1, '@isMandotory'): 'False',
 ('session', 'docInfo', 'field', 1, '@isOpen'): 'True',
 ('session', 'docInfo', 'field', 1, '@name'): 'Section',
 ('session', 'docInfo', 'field', 2, '#text'): 'Munchen',
 ('session', 'docInfo', 'field', 2, '@isMandotory'): 'False',
 ('session', 'docInfo', 'field', 2, '@isOpen'): 'True',
 ('session', 'docInfo', 'field', 2, '@name'): 'Location'}
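Because the tuple keys keep the list index, the flattened dict also works around the duplicate-@name problem: entries that share an index belong to the same `<field>`. A sketch of a hypothetical helper `fields_by_name` (not part of the answer) that pairs each field's @name with its #text. It assumes xmltodict-style data with at least two `<field>` elements, because xmltodict returns a single child as a dict rather than a one-element list unless told otherwise:

```python
# traverse and the helpers are repeated here so this sketch runs on its own.
def traverse(obj, prev_path="obj", path_repr="{}[{!r}]".format):
    if isinstance(obj, dict):
        it = obj.items()
    elif isinstance(obj, list):
        it = enumerate(obj)
    else:
        yield prev_path, obj
        return
    for k, v in it:
        yield from traverse(v, path_repr(prev_path, k), path_repr)

def _tuple_concat(tup, idx):
    return (*tup, idx)

def flatten_data(obj):
    return dict(traverse(obj, (), _tuple_concat))

def fields_by_name(flat):
    """Hypothetical helper: map each <field>'s @name to its #text.

    Expects keys shaped like ('session', 'docInfo', 'field', index, attr).
    """
    names, texts = {}, {}
    for key, value in flat.items():
        if len(key) == 5 and key[:3] == ('session', 'docInfo', 'field'):
            idx, attr = key[3], key[4]
            if attr == '@name':
                names[idx] = value
            elif attr == '#text':
                texts[idx] = value
    # entries sharing a list index belong to the same <field> element
    return {names[i]: texts.get(i) for i in sorted(names)}

doc = {'session': {'@id': '2934', 'docInfo': {'field': [
    {'@name': 'Employee', '@isMandotory': 'True', '#text': 'Jake Roberts'},
    {'@name': 'Section', '@isOpen': 'True', '@isMandotory': 'False', '#text': '5'},
    {'@name': 'Location', '@isOpen': 'True', '@isMandotory': 'False', '#text': 'Munchen'},
]}}}

print(fields_by_name(flatten_data(doc)))
```

This prints `{'Employee': 'Jake Roberts', 'Section': '5', 'Location': 'Munchen'}`.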
I found this particularly useful when dealing with my JSON data, but I'm not really sure what you want to do with your XML.