Finding elements by namespace in lxml

Question

I have an XML file that has elements that look like gnc:account (it's a GnuCash accounts file). I want to find all elements with that name.

However, if I do this:

for account in tree.iter('gnc:account'):
    print(account)

nothing is printed. Instead, I have written this ridiculous piece of code:

def n(string):
    pair = string.split(':')
    return '{{{}}}{}'.format(root.nsmap[pair[0]], pair[1])

And now I can do this:

for account in tree.iter(n('gnc:account')):
    print(account)

which works.

Is there a non-ridiculous solution to this problem? I'm not interested in writing out the full URI.

Answer 1

What you have now is certainly too hackish, in my opinion.

Solution with XPath

You could use XPath and register the namespace prefix together with its URI:

>>> from io import StringIO
>>> from lxml import etree
>>> s = """<root xmlns:gnc="www.gnc.com">
... <gnc:account>1</gnc:account>
... <gnc:account>2</gnc:account>
... </root>"""
>>> tree = etree.parse(StringIO(s))

# show that without the prefix, there are no results
>>> tree.xpath("//account")
[]

# with an unregistered prefix, throws an error
>>> tree.xpath("//gnc:account")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/etree.pyx", line 2287, in lxml.etree._ElementTree.xpath
  File "src/lxml/xpath.pxi", line 359, in lxml.etree.XPathDocumentEvaluator.__call__
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Undefined namespace prefix

# correct way of registering the namespace
>>> tree.xpath("//gnc:account", namespaces={'gnc': 'www.gnc.com'})
[<Element {www.gnc.com}account at 0x112bdd808>, <Element {www.gnc.com}account at 0x112bdd948>]
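If the goal is to avoid typing the URI at all, one possible approach is to reuse the prefix mapping that lxml already exposes as nsmap on the root element. The following is only a sketch: it assumes the gnc prefix is declared on the root element (as in the sample document above), and the variable names xml, ns and texts are made up for illustration.

```python
from io import StringIO
from lxml import etree

xml = """<root xmlns:gnc="www.gnc.com">
<gnc:account>1</gnc:account>
<gnc:account>2</gnc:account>
</root>"""
tree = etree.parse(StringIO(xml))
root = tree.getroot()

# XPath does not accept a default-namespace entry (key None),
# so drop it if the document declares one.
ns = {prefix: uri for prefix, uri in root.nsmap.items() if prefix is not None}

# The prefix is resolved through the mapping taken from the document itself.
accounts = tree.xpath("//gnc:account", namespaces=ns)
texts = [a.text for a in accounts]
print(texts)  # ['1', '2']
```

Note that nsmap on an element only contains the prefixes visible at that element, so this would miss a prefix that is declared further down in the tree.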

Sticking with tree.iter()

If you would still like to call iter() in this fashion, you need to follow lxml's advice on using namespaces with iter, that is, pass the tag name in {URI}localname form. For instance:

>>> for account in tree.iter('{www.gnc.com}account'):
...     print(account)
...
<Element {www.gnc.com}account at 0x112bdd808>
<Element {www.gnc.com}account at 0x112bdd948>
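As a tidier variant of the helper from the question, the {URI}localname string can be assembled with etree.QName instead of manual string formatting. This is a sketch under the assumption that the gnc prefix is declared on the root element; account_tag and found are illustrative names only.

```python
from lxml import etree

xml = b"""<root xmlns:gnc="www.gnc.com">
<gnc:account>1</gnc:account>
<gnc:account>2</gnc:account>
</root>"""
root = etree.fromstring(xml)

# Look the URI up once from the prefix declared on the root element,
# then let QName assemble the '{uri}localname' form.
account_tag = etree.QName(root.nsmap['gnc'], 'account').text

found = [el.text for el in root.iter(account_tag)]
print(account_tag)  # {www.gnc.com}account
print(found)        # ['1', '2']
```

The URI still never appears in the calling code, but the lookup is explicit rather than hidden in a string split.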

And if you absolutely want to avoid writing out the namespace URI or registering the namespace (which I do not think is a valid objection, since doing so is quite easy and clearer), you could also use a namespace wildcard:

>>> for account in tree.iter('{*}account'):
...     print(account)
...
<Element {www.gnc.com}account at 0x112bdd808>
<Element {www.gnc.com}account at 0x112bdd948>
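One thing to keep in mind with the wildcard is that it matches on the local name only, so elements named account from other namespaces are returned as well. The sketch below illustrates this with a second, hypothetical namespace (www.example.com, bound to the prefix other) that is not part of the original example.

```python
from lxml import etree

# 'other' / 'www.example.com' is a made-up second namespace for illustration.
xml = b"""<root xmlns:gnc="www.gnc.com" xmlns:other="www.example.com">
<gnc:account>1</gnc:account>
<other:account>2</other:account>
</root>"""
root = etree.fromstring(xml)

# The wildcard ignores which namespace the element belongs to.
wildcard = [el.tag for el in root.iter('{*}account')]

# The fully qualified name only matches the gnc namespace.
exact = [el.tag for el in root.iter('{www.gnc.com}account')]

print(wildcard)  # ['{www.gnc.com}account', '{www.example.com}account']
print(exact)     # ['{www.gnc.com}account']
```

If the file only ever uses one namespace for account, the wildcard is harmless; otherwise the qualified form is the safer choice.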

Answer 1 (accepted, 2020-02-28 10:21:20)
