Python: Loop through all nested key-value pairs created by xmltodict

Question

Getting a specific value based on the layout of an xml-file is pretty straightforward. (See: StackOverflow)

But when I don't know the xml-elements, I can't recurse over it, since xmltodict nests OrderedDicts in OrderedDicts. These nested OrderedDicts are typified by Python as type 'unicode' rather than as OrderedDicts. Therefore, looping over it like this doesn't work:

def myprint(d):
    for k, v in d.iteritems():
        if isinstance(v, list):
            myprint(v)
        else:
            print "Key :{0},  Value: {1}".format(k, v)

What I basically want is to recurse over the whole xml-file and show every key-value pair. And when the value of a key is another list of key-value pairs, it should recurse into it.

With this xml-file as input:

<?xml version="1.0" encoding="utf-8"?>
<session id="2934" name="Valves" docVersion="5.0.1">
    <docInfo>
        <field name="Employee" isMandotory="True">Jake Roberts</field>
        <field name="Section" isOpen="True" isMandotory="False">5</field>
        <field name="Location" isOpen="True" isMandotory="False">Munchen</field>
    </docInfo>
</session>

and the code listed above, all data under session ends up as one single value of the key session.

Example output:

Key :session,  Value: OrderedDict([(u'@id', u'2934'), (u'@name', u'Valves'), (u'@docVersion', u'5.0.1'), (u'docInfo', OrderedDict([(u'field', [OrderedDict([(u'@name', u'Employee'), (u'@isMandotory', u'True'), ('#text', u'Jake Roberts')]), OrderedDict([(u'@name', u'Section'), (u'@isOpen', u'True'), (u'@isMandotory', u'False'), ('#text', u'5')]), OrderedDict([(u'@name', u'Location'), (u'@isOpen', u'True'), (u'@isMandotory', u'False'), ('#text', u'Munchen')])])]))])

And this is obviously not what I want.
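To see why the loop stops at the top level, the sketch below rebuilds the structure from the example output by hand (an assumption standing in for `xmltodict.parse`, so it runs without the library installed) and checks the types the loop actually sees. In this copy, the value under `session` is an `OrderedDict`, which subclasses `dict` but not `list`, so `isinstance(v, list)` is `False` and the whole subtree is printed at once.

```python
from collections import OrderedDict

# Hand-built copy of the structure shown in the example output above
# (assumption: it matches what xmltodict.parse returns for the sample file).
doc = OrderedDict([
    ("session", OrderedDict([
        ("@id", "2934"),
        ("@name", "Valves"),
        ("@docVersion", "5.0.1"),
        ("docInfo", OrderedDict([
            ("field", [
                OrderedDict([("@name", "Employee"), ("@isMandotory", "True"),
                             ("#text", "Jake Roberts")]),
                OrderedDict([("@name", "Section"), ("@isOpen", "True"),
                             ("@isMandotory", "False"), ("#text", "5")]),
                OrderedDict([("@name", "Location"), ("@isOpen", "True"),
                             ("@isMandotory", "False"), ("#text", "Munchen")]),
            ]),
        ])),
    ])),
])

session = doc["session"]
print(type(session).__name__)                      # OrderedDict
print(isinstance(session, list))                   # False: a list-only check never matches
print(isinstance(session, dict))                   # True: OrderedDict subclasses dict
print(type(session["docInfo"]["field"]).__name__)  # list: one entry per <field>
```

So the recursion has to test for `dict` as well as `list`.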

Answer 1

If you come across a list in the data, you just need to call myprint on every element of the list:

def myprint(d):
    if isinstance(d, dict):  # check that it's a dict before calling .items()
        for k, v in d.items():
            if isinstance(v, (list, dict)):  # recurse into either a list or a dict
                myprint(v)
            else:
                print("Key :{0},  Value: {1}".format(k, v))
    elif isinstance(d, list):  # allow list input too
        for item in d:
            myprint(item)

then you will get output like this:

...
Key :@name,  Value: Employee
Key :@isMandotory,  Value: True
Key :#text,  Value: Jake Roberts
Key :@name,  Value: Section
Key :@isOpen,  Value: True
Key :@isMandotory,  Value: False
Key :#text,  Value: 5
...
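If you want to collect the pairs instead of printing them, the same recursion can be written as a generator. This is a minimal sketch: `iter_pairs` is a hypothetical name, and the `doc` below is a trimmed, hand-built stand-in for the parsed file rather than real `xmltodict` output.

```python
from collections import OrderedDict

def iter_pairs(d):
    """Yield (key, value) for every non-container value in nested dicts/lists."""
    if isinstance(d, dict):
        for k, v in d.items():
            if isinstance(v, (list, dict)):
                # descend into nested containers instead of yielding them whole
                for pair in iter_pairs(v):
                    yield pair
            else:
                yield k, v
    elif isinstance(d, list):
        for item in d:
            for pair in iter_pairs(item):
                yield pair

# Trimmed, hand-built stand-in for the parsed XML (assumption).
doc = OrderedDict([
    ("session", OrderedDict([
        ("@id", "2934"),
        ("docInfo", OrderedDict([
            ("field", [
                OrderedDict([("@name", "Employee"), ("#text", "Jake Roberts")]),
                OrderedDict([("@name", "Section"), ("#text", "5")]),
            ]),
        ])),
    ])),
])

pairs = list(iter_pairs(doc))
for k, v in pairs:
    print("Key :{0},  Value: {1}".format(k, v))
```

Because the result is a list of pairs rather than a dict, repeated keys such as `@name` are all kept, in document order.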

Although I'm not sure how useful this is, since you have a lot of duplicate keys like @name, I'd like to offer a function I created a while ago to traverse nested JSON data made of nested dicts and lists:

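def traverse(obj, prev_path="obj", path_repr="{}[{!r}]".format):
    """Yield (path, value) for every leaf in nested dicts and lists.

    path_repr(prev_path, key) builds the path of a child from its parent's path.
    """
    if isinstance(obj, dict):
        it = obj.items()
    elif isinstance(obj, list):
        it = enumerate(obj)
    else:
        # not a container: this is a leaf value
        yield prev_path, obj
        return
    for k, v in it:
        for data in traverse(v, path_repr(prev_path, k), path_repr):
            yield data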

Then you can traverse the data with:

for path,value in traverse(doc):
    print("{} = {}".format(path,value))

With the default values for prev_path and path_repr, it gives output like this:

obj[u'session'][u'@id'] = 2934
obj[u'session'][u'@name'] = Valves
obj[u'session'][u'@docVersion'] = 5.0.1
obj[u'session'][u'docInfo'][u'field'][0][u'@name'] = Employee
obj[u'session'][u'docInfo'][u'field'][0][u'@isMandotory'] = True
obj[u'session'][u'docInfo'][u'field'][0]['#text'] = Jake Roberts
obj[u'session'][u'docInfo'][u'field'][1][u'@name'] = Section
obj[u'session'][u'docInfo'][u'field'][1][u'@isOpen'] = True
obj[u'session'][u'docInfo'][u'field'][1][u'@isMandotory'] = False
obj[u'session'][u'docInfo'][u'field'][1]['#text'] = 5
obj[u'session'][u'docInfo'][u'field'][2][u'@name'] = Location
obj[u'session'][u'docInfo'][u'field'][2][u'@isOpen'] = True
obj[u'session'][u'docInfo'][u'field'][2][u'@isMandotory'] = False
obj[u'session'][u'docInfo'][u'field'][2]['#text'] = Munchen
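`path_repr` can be any callable that takes the parent path and the new key and returns the child path. As a sketch, the hypothetical `slash_path` below joins keys with `/`, starting from an empty `prev_path`; `traverse` is copied from above, and `doc` is a trimmed, hand-built stand-in for the parsed file (an assumption, not real `xmltodict` output).

```python
from collections import OrderedDict

def traverse(obj, prev_path="obj", path_repr="{}[{!r}]".format):
    # copied from the answer above
    if isinstance(obj, dict):
        it = obj.items()
    elif isinstance(obj, list):
        it = enumerate(obj)
    else:
        yield prev_path, obj
        return
    for k, v in it:
        for data in traverse(v, path_repr(prev_path, k), path_repr):
            yield data

def slash_path(prev_path, key):
    # hypothetical formatter: joins keys (and list indices) with "/"
    return "{}/{}".format(prev_path, key)

# Trimmed, hand-built stand-in for the parsed XML (assumption).
doc = OrderedDict([
    ("session", OrderedDict([
        ("@id", "2934"),
        ("docInfo", OrderedDict([
            ("field", [
                OrderedDict([("@name", "Employee"), ("#text", "Jake Roberts")]),
                OrderedDict([("@name", "Section"), ("#text", "5")]),
            ]),
        ])),
    ])),
])

for path, value in traverse(doc, "", slash_path):
    print("{} = {}".format(path, value))
```

This prints lines such as `/session/docInfo/field/0/@name = Employee`.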

You can also write your own function for path_repr. It receives the value of prev_path (built up by the earlier calls to path_repr) and the new key. For example, a function that takes a tuple and adds another element to the end gives a (tuple of indices): elem format, which is perfect to pass to the dict constructor:

import pprint

def _tuple_concat(tup, idx):
    # append idx to the end of the tuple (the (*tup, idx) syntax needs Python 3.5+)
    return (*tup, idx)

def flatten_data(obj):
    """converts nested dict and list structure into a flat dictionary with tuple keys
    corresponding to the sequence of indices to reach particular element"""
    return dict(traverse(obj, (), _tuple_concat))

new_data = flatten_data(doc)
pprint.pprint(new_data)

which gives you the data in this dictionary format:

{('session', '@docVersion'): '5.0.1',
 ('session', '@id'): '2934',
 ('session', '@name'): 'Valves',
 ('session', 'docInfo', 'field', 0, '#text'): 'Jake Roberts',
 ('session', 'docInfo', 'field', 0, '@isMandotory'): 'True',
 ('session', 'docInfo', 'field', 0, '@name'): 'Employee',
 ('session', 'docInfo', 'field', 1, '#text'): '5',
 ('session', 'docInfo', 'field', 1, '@isMandotory'): 'False',
 ('session', 'docInfo', 'field', 1, '@isOpen'): 'True',
 ('session', 'docInfo', 'field', 1, '@name'): 'Section',
 ('session', 'docInfo', 'field', 2, '#text'): 'Munchen',
 ('session', 'docInfo', 'field', 2, '@isMandotory'): 'False',
 ('session', 'docInfo', 'field', 2, '@isOpen'): 'True',
 ('session', 'docInfo', 'field', 2, '@name'): 'Location'}
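Because every key in the flat dictionary is the full path, duplicate keys such as `@name` no longer collide. As a sketch, assuming you want to map each `<field>`'s `@name` to its text, you can select entries by path prefix. `traverse`, `_tuple_concat`, and `flatten_data` are copied from above; `prefix`, `indices`, and `by_name` are hypothetical names for illustration, and `doc` is a trimmed, hand-built stand-in for the parsed file. The `(*tup, idx)` syntax needs Python 3.5 or later.

```python
from collections import OrderedDict

def traverse(obj, prev_path="obj", path_repr="{}[{!r}]".format):
    # copied from the answer above
    if isinstance(obj, dict):
        it = obj.items()
    elif isinstance(obj, list):
        it = enumerate(obj)
    else:
        yield prev_path, obj
        return
    for k, v in it:
        for data in traverse(v, path_repr(prev_path, k), path_repr):
            yield data

def _tuple_concat(tup, idx):
    return (*tup, idx)

def flatten_data(obj):
    return dict(traverse(obj, (), _tuple_concat))

# Trimmed, hand-built stand-in for the parsed XML (assumption).
doc = OrderedDict([
    ("session", OrderedDict([
        ("@id", "2934"),
        ("docInfo", OrderedDict([
            ("field", [
                OrderedDict([("@name", "Employee"), ("#text", "Jake Roberts")]),
                OrderedDict([("@name", "Section"), ("#text", "5")]),
            ]),
        ])),
    ])),
])

flat = flatten_data(doc)
prefix = ("session", "docInfo", "field")
# list indices of every <field> entry that has a @name attribute
indices = [path[3] for path in flat if path[:3] == prefix and path[-1] == "@name"]
# pair each field's @name with its #text
by_name = {flat[prefix + (i, "@name")]: flat[prefix + (i, "#text")] for i in indices}
print(by_name)  # {'Employee': 'Jake Roberts', 'Section': '5'}
```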

I found this particularly useful when dealing with my JSON data, but I'm not really sure what you want to do with your XML.

Answer 1: 3 votes, accepted, 2016-04-12 19:20:23
