Python: Loop through all nested key-value pairs created by xmltodict
Getting a specific value based on the layout of an XML file is pretty straightforward. (See: StackOverflow)
But when I don't know the XML elements in advance, I can't recurse over the result, because xmltodict nests OrderedDicts inside OrderedDicts. Python seems to report these nested OrderedDicts as type 'unicode' rather than as OrderedDicts. Therefore looping over the data like this doesn't work:
def myprint(d):
    for k, v in d.iteritems():
        if isinstance(v, list):
            myprint(v)
        else:
            print "Key :{0}, Value: {1}".format(k, v)
What I basically want is to recurse over the whole XML file so that every key-value pair is shown. When the value of a key is itself a collection of key-value pairs, the function should recurse into it.
With this XML file as input:
<?xml version="1.0" encoding="utf-8"?>
<session id="2934" name="Valves" docVersion="5.0.1">
    <docInfo>
        <field name="Employee" isMandotory="True">Jake Roberts</field>
        <field name="Section" isOpen="True" isMandotory="False">5</field>
        <field name="Location" isOpen="True" isMandotory="False">Munchen</field>
    </docInfo>
</session>
and the code listed above, all data under session ends up as the single value of the key session.
Example output:
Key :session, Value: OrderedDict([(u'@id', u'2934'), (u'@name', u'Valves'), (u'@docVersion', u'5.0.1'), (u'docInfo', OrderedDict([(u'field', [OrderedDict([(u'@name', u'Employee'), (u'@isMandotory', u'True'), ('#text', u'Jake Roberts')]), OrderedDict([(u'@name', u'Section'), (u'@isOpen', u'True'), (u'@isMandotory', u'False'), ('#text', u'5')]), OrderedDict([(u'@name', u'Location'), (u'@isOpen', u'True'), (u'@isMandotory', u'False'), ('#text', u'Munchen')])])]))])
And this is obviously not what I want.
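To see why the function above never recurses, a small check on a hand-built stand-in for the parsed data may help. The name sample and its contents are assumptions made up for illustration; the real object would come from xmltodict. The sketch is written for Python 3.

```python
from collections import OrderedDict

# Hypothetical stand-in for a fragment of the parsed document.
sample = OrderedDict([
    ("session", OrderedDict([
        ("@id", "2934"),
        ("docInfo", OrderedDict([
            ("field", [
                OrderedDict([("@name", "Employee"), ("#text", "Jake Roberts")]),
            ]),
        ])),
    ])),
])

value = sample["session"]
is_list = isinstance(value, list)  # the only case the function above recurses into
is_dict = isinstance(value, dict)  # OrderedDict is a subclass of dict

print(type(value).__name__, is_list, is_dict)
```

The value under session is a dict, not a list, so the else branch runs and the whole nested structure is printed as one value.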
If you come across a list in the data, you just need to call myprint on every element of the list:
def myprint(d):
    if isinstance(d, dict):  # check that it's a dict before using .iteritems()
        for k, v in d.iteritems():
            if isinstance(v, (list, dict)):  # recurse into either a list or a dict
                myprint(v)
            else:
                print "Key :{0}, Value: {1}".format(k, v)
    elif isinstance(d, list):  # allow for list input too
        for item in d:
            myprint(item)
Then you will get output something like:
...
Key :@name, Value: Employee
Key :@isMandotory, Value: True
Key :#text, Value: Jake Roberts
Key :@name, Value: Section
Key :@isOpen, Value: True
Key :@isMandotory, Value: False
Key :#text, Value: 5
...
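The function above uses Python 2 syntax (iteritems and the print statement). As a rough Python 3 sketch of the same idea, the generator below yields the pairs instead of printing them, so the caller decides what to do with each one. The name iter_pairs and the sample doc are assumptions for illustration; plain dicts stand in for the OrderedDicts that xmltodict produces.

```python
def iter_pairs(node):
    """Yield (key, value) for every leaf found in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                yield from iter_pairs(value)  # descend into nested containers
            else:
                yield key, value
    elif isinstance(node, list):
        # Like myprint, bare scalars inside a list have no key and are skipped.
        for item in node:
            yield from iter_pairs(item)


# Hypothetical data shaped like part of the parsed XML.
doc = {
    "session": {
        "@id": "2934",
        "@name": "Valves",
        "docInfo": {
            "field": [
                {"@name": "Employee", "#text": "Jake Roberts"},
                {"@name": "Section", "#text": "5"},
            ]
        },
    }
}

pairs = list(iter_pairs(doc))
for key, value in pairs:
    print("Key: {0}, Value: {1}".format(key, value))
```

Because the pairs come back as a list of tuples, repeated keys such as @name are all preserved rather than overwriting each other.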
I'm not sure how useful this flat key/value listing is, though, since you have a lot of duplicate keys like @name. I'd like to offer a function I created a while ago to traverse nested JSON data made of nested dicts and lists:
def traverse(obj, prev_path = "obj", path_repr = "{}[{!r}]".format):
    if isinstance(obj, dict):
        it = obj.items()
    elif isinstance(obj, list):
        it = enumerate(obj)
    else:
        # leaf value: report the path that led here
        yield prev_path, obj
        return
    for k, v in it:
        for data in traverse(v, path_repr(prev_path, k), path_repr):
            yield data
Then you can traverse the data with:
for path, value in traverse(doc):
    print("{} = {}".format(path, value))
With the default values for prev_path and path_repr, it gives output like this:
obj[u'session'][u'@id'] = 2934
obj[u'session'][u'@name'] = Valves
obj[u'session'][u'@docVersion'] = 5.0.1
obj[u'session'][u'docInfo'][u'field'][0][u'@name'] = Employee
obj[u'session'][u'docInfo'][u'field'][0][u'@isMandotory'] = True
obj[u'session'][u'docInfo'][u'field'][0]['#text'] = Jake Roberts
obj[u'session'][u'docInfo'][u'field'][1][u'@name'] = Section
obj[u'session'][u'docInfo'][u'field'][1][u'@isOpen'] = True
obj[u'session'][u'docInfo'][u'field'][1][u'@isMandotory'] = False
obj[u'session'][u'docInfo'][u'field'][1]['#text'] = 5
obj[u'session'][u'docInfo'][u'field'][2][u'@name'] = Location
obj[u'session'][u'docInfo'][u'field'][2][u'@isOpen'] = True
obj[u'session'][u'docInfo'][u'field'][2][u'@isMandotory'] = False
obj[u'session'][u'docInfo'][u'field'][2]['#text'] = Munchen
You can also write your own function for path_repr. It receives the value of prev_path (itself built up by the recursive calls to path_repr) and the new key. For example, a function that takes a tuple and appends another element to the end produces (tuple of indices, element) pairs, which is a perfect format to pass to the dict constructor:
import pprint

def _tuple_concat(tup, idx):
    # append the new key or index to the path tuple (unpacking syntax needs Python 3.5+)
    return (*tup, idx)

def flatten_data(obj):
    """converts nested dict and list structure into a flat dictionary with tuple keys
    corresponding to the sequence of indices to reach particular element"""
    return dict(traverse(obj, (), _tuple_concat))

new_data = flatten_data(doc)  # doc is the same parsed data that was traversed above
pprint.pprint(new_data)
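This gives you the data in the following dictionary format (xmltodict itself returns every attribute and text value as a string, so on the parsed XML, values such as 2934 and True would appear as '2934' and 'True'):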
{('session', '@docVersion'): '5.0.1',
('session', '@id'): 2934,
('session', '@name'): 'Valves',
('session', 'docInfo', 'field', 0, '#text'): 'Jake Roberts',
('session', 'docInfo', 'field', 0, '@isMandotory'): True,
('session', 'docInfo', 'field', 0, '@name'): 'Employee',
('session', 'docInfo', 'field', 1, '#text'): 5,
('session', 'docInfo', 'field', 1, '@isMandotory'): False,
('session', 'docInfo', 'field', 1, '@isOpen'): True,
('session', 'docInfo', 'field', 1, '@name'): 'Section',
('session', 'docInfo', 'field', 2, '#text'): 'Munchen',
('session', 'docInfo', 'field', 2, '@isMandotory'): False,
('session', 'docInfo', 'field', 2, '@isOpen'): True,
('session', 'docInfo', 'field', 2, '@name'): 'Location'}
I found this particularly useful when dealing with my JSON data, but I'm not really sure what you want to do with your XML.
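As one possible illustration of what a custom path_repr and the flattened dictionary allow, here is a self-contained Python 3 sketch. It restates traverse with yield from, and adds a hypothetical dotted helper and a small hand-built doc; those names and the sample data are assumptions, not part of xmltodict.

```python
def traverse(obj, prev_path="obj", path_repr="{}[{!r}]".format):
    """Yield (path, leaf) pairs; Python 3 restatement of the generator above."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        yield prev_path, obj
        return
    for key, value in items:
        yield from traverse(value, path_repr(prev_path, key), path_repr)


def dotted(prev_path, key):
    """Hypothetical path_repr that builds paths like 'a.b.0.c'."""
    return "{}.{}".format(prev_path, key) if prev_path else str(key)


# Hypothetical data shaped like part of the parsed XML.
doc = {
    "session": {
        "@id": "2934",
        "docInfo": {
            "field": [
                {"@name": "Employee", "#text": "Jake Roberts"},
                {"@name": "Section", "#text": "5"},
            ]
        },
    }
}

# String paths, convenient for logging or display.
dotted_paths = dict(traverse(doc, "", dotted))

# Tuple paths, convenient for filtering on individual path components.
flat = dict(traverse(doc, (), lambda tup, key: tup + (key,)))

# Collect the text of every element, keyed by its position in the list.
texts = {path[-2]: value for path, value in flat.items() if path[-1] == "#text"}

print(dotted_paths["session.docInfo.field.0.#text"])
print(texts)
```

Tuple keys keep each path component separate, so a condition such as path[-1] == "#text" needs no string parsing. Dotted strings are easier to read but become ambiguous if a key itself contains a dot.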