Retrieving attribute names and values with Python / lxml and XPath

Question

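I am using XPath with Python lxml (Python 2). I make two passes over the data: one to select the records of interest, and one to extract values from them. Here is a sample of the kind of code I use.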

from lxml import etree

xml = """
  <records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70" />
    <row id="3" height="140" />
  </records>
"""

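parsed = etree.fromstring(xml)
# First pass: select the records of interest
nodes = parsed.xpath('/records/row')
# Second pass: extract the attribute values from each record
for node in nodes:
    print(node.xpath("@id|@height|@weight"))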

When I run this script, the output is:

['1', '160', '80']
['2', '70']
['3', '140']

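As the result shows, when an attribute is missing, the positions of the other attributes shift, so in rows 2 and 3 I cannot tell whether a value is the height or the weight.

Is there a way to get the names of the attributes returned by etree / lxml? Ideally, I would see the result in the following format: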

[('@id', '1'), ('@height', '160'), ('@weight', '80')]

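I recognise that I could solve this specific case using elementtree and Python. However, I would like to solve it with XPath (and relatively simple XPath), rather than processing the data in Python.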

Answer 1

You should try the following:

for node in nodes:
    print(node.attrib)

This returns a dictionary-like mapping of all the node's attributes, in the form {'id': '1', 'weight': '80', 'height': '160'}
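As a minimal sketch of how this keeps the columns apart, each attribute can be looked up by name so that a missing one shows up as None instead of shifting the others. This is written for Python 3, and the names `columns` and `table` are illustrative assumptions, not part of the original answer.

```python
from lxml import etree

xml = """
<records>
  <row id="1" height="160" weight="80" />
  <row id="2" weight="70" />
  <row id="3" height="140" />
</records>
"""

parsed = etree.fromstring(xml)

# Fixed column order chosen for this example (an assumption)
columns = ('id', 'height', 'weight')

# attrib.get() returns None when the attribute is absent,
# so every row has the same number of positions
table = [
    tuple(node.attrib.get(name) for name in columns)
    for node in parsed.xpath('/records/row')
]

for row in table:
    print(row)
```

With the sample data, the second row comes out as ('2', None, '70'), so the weight stays in the third position.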

If you want to get something like [('@id', '1'), ('@height', '160'), ('@weight', '80')]:

list_of_attributes = []
for node in nodes:
    # Build one list of ('@name', value) pairs per row
    attrs = []
    for att in node.attrib:
        attrs.append(("@" + att, node.attrib[att]))
    list_of_attributes.append(attrs)
print(list_of_attributes)

Output:

[[('@id', '1'), ('@height', '160'), ('@weight', '80')], [('@id', '2'), ('@weight', '70')], [('@id', '3'), ('@height', '140')]]
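If the goal is to keep the original XPath expression, lxml's string results may also help. By default they are "smart strings" that remember where they came from, and for attribute values they expose the attribute name as `attrname`. The following is a sketch under that assumption, written for Python 3; the variable names `rows` and `values` are illustrative.

```python
from lxml import etree

xml = """
<records>
  <row id="1" height="160" weight="80" />
  <row id="2" weight="70" />
  <row id="3" height="140" />
</records>
"""

parsed = etree.fromstring(xml)

rows = []
for node in parsed.xpath('/records/row'):
    # Each result is a smart string; attrname holds the attribute's name
    values = node.xpath('@id|@height|@weight')
    rows.append([('@' + v.attrname, str(v)) for v in values])

for row in rows:
    print(row)
```

Smart strings keep a reference to their parent element, so for very large documents it may be worth converting them to plain strings, as done here with str(), once the name has been read.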

Answer 2

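I was wrong in my assertion that I was not going to use Python. I found that the lxml / etree implementation is easy to extend, so that I can use the XPath DSL with modifications.

I registered a function named "dictify" and changed the XPath expression to: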

dictify('@id|@height|@weight|weight|height')

The new code is:

from lxml import etree

xml = """
<records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70" ><height>150</height></row>
    <row id="3" height="140" />
</records>
"""

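def dictify(context, names):
    # lxml passes the XPath evaluation context as the first argument
    node = context.context_node
    rv = []
    rv.append('__dictify_start_marker__')
    names = names.split('|')
    for n in names:
        if n.startswith('@'):
            # Attribute name: emit name and value only if the attribute is present
            val = node.attrib.get(n[1:])
            if val is not None:
                rv.append(n)
                rv.append(val)
        else:
            # Element name: emit name and text for every matching child element
            children = node.findall(n)
            for child_node in children:
                rv.append(n)
                rv.append(child_node.text)
    rv.append('__dictify_end_marker__')
    return rv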

# Register dictify in the default (unprefixed) XPath function namespace
etree_functions = etree.FunctionNamespace(None)
etree_functions['dictify'] = dictify


parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    print(node.xpath("dictify('@id|@height|@weight|weight|height')"))

This produces the following output:

['__dictify_start_marker__', '@id', '1', '@height', '160', '@weight', '80', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '2', '@weight', '70', 'height', '150', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '3', '@height', '140', '__dictify_end_marker__']
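The flat list is easier to consume once the markers are stripped and each name is paired with its value. A minimal sketch of that step follows; the helper name `pairs_from_dictify` is hypothetical and not part of the answer's code.

```python
START = '__dictify_start_marker__'
END = '__dictify_end_marker__'


def pairs_from_dictify(flat):
    """Turn [START, name, value, name, value, ..., END] into (name, value) pairs."""
    if len(flat) < 2 or flat[0] != START or flat[-1] != END:
        raise ValueError('missing dictify start/end marker')
    body = flat[1:-1]
    if len(body) % 2 != 0:
        raise ValueError('names and values are not paired')
    # Even positions hold names, odd positions hold the matching values
    return list(zip(body[0::2], body[1::2]))


# Second row of the output shown above
flat_row = ['__dictify_start_marker__', '@id', '2', '@weight', '70',
            'height', '150', '__dictify_end_marker__']
result = pairs_from_dictify(flat_row)
print(result)
```

Pairs are returned rather than a dict because the same child element name can appear more than once in a row.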

2 answers. Answer 1: 5, 2017-02-23 10:57:11. Answer 2: 1, accepted, 2017-02-23 18:17:14.
