
Lookup by namespace in lxml

I have an XML file that has elements that look like gnc:account (it's a GnuCash accounts file). I want to find all elements with that name.

However, if I do this:

for account in tree.iter('gnc:account'):
    print(account)

nothing gets printed. Instead, I have written this ridiculous piece of code:

def n(string):
    pair = string.split(':')
    return '{{{}}}{}'.format(root.nsmap[pair[0]], pair[1])

And now I can do this:

for account in tree.iter(n('gnc:account')):
    print(account)

which works.

Is there a non-ridiculous solution to this problem? I'm not interested in writing out the full URI.

In my opinion, what you have now is certainly too hackish.

Solution with XPath

You could use XPath and register this namespace URI and prefix:

>>> from io import StringIO
>>> from lxml import etree
>>> s = """<root xmlns:gnc="www.gnc.com">
... <gnc:account>1</gnc:account>
... <gnc:account>2</gnc:account>
... </root>"""
>>> tree = etree.parse(StringIO(s))

# show that without the prefix, there are no results
>>> tree.xpath("//account")
[]

# with an unregistered prefix, throws an error
>>> tree.xpath("//gnc:account")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/etree.pyx", line 2287, in lxml.etree._ElementTree.xpath
  File "src/lxml/xpath.pxi", line 359, in lxml.etree.XPathDocumentEvaluator.__call__
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Undefined namespace prefix

# correct way of registering the namespace
>>> tree.xpath("//gnc:account", namespaces={'gnc': 'www.gnc.com'})
[<Element {www.gnc.com}account at 0x112bdd808>, <Element {www.gnc.com}account at 0x112bdd948>]
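
If the concern is mainly about repeating the URI, one option is to reuse the prefix map that the document already declares. The following is a minimal sketch, assuming the prefix is declared on the root element as in the sample above; the variable names `ns`, `via_xpath`, and `via_findall` are only illustrative. A default namespace shows up in `nsmap` under the key `None`, which XPath cannot use as a prefix, so it is filtered out here.

```python
from io import StringIO
from lxml import etree

xml = """<root xmlns:gnc="www.gnc.com">
<gnc:account>1</gnc:account>
<gnc:account>2</gnc:account>
</root>"""
tree = etree.parse(StringIO(xml))
root = tree.getroot()

# nsmap maps the prefixes visible on the root element to their URIs.
# Drop a possible None key (default namespace): XPath needs a real prefix.
ns = {prefix: uri for prefix, uri in root.nsmap.items() if prefix is not None}

# The same mapping works for xpath() and for findall()/iterfind().
via_xpath = [el.text for el in tree.xpath("//gnc:account", namespaces=ns)]
via_findall = [el.text for el in root.findall(".//gnc:account", namespaces=ns)]

print(via_xpath)
print(via_findall)
```

This only sees prefixes that are in scope on the root element; prefixes declared deeper in the tree would have to be collected separately.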

Sticking with tree.iter()

If you would still like to call iter() in this fashion, you would need to follow lxml's advice on using namespaces with iter, for instance:

>>> for account in tree.iter('{www.gnc.com}account'):
...     print(account)
...
<Element {www.gnc.com}account at 0x112bdd808>
<Element {www.gnc.com}account at 0x112bdd948>
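
To avoid assembling the `{uri}name` string by hand, the URI can be kept in one constant and the tag built with `etree.QName`. This is just a sketch under the assumption that the sample document above is used; `GNC_NS` and `account_tag` are hypothetical names chosen for illustration.

```python
from io import StringIO
from lxml import etree

xml = """<root xmlns:gnc="www.gnc.com">
<gnc:account>1</gnc:account>
<gnc:account>2</gnc:account>
</root>"""
tree = etree.parse(StringIO(xml))

GNC_NS = "www.gnc.com"  # the namespace URI, written out once

# QName(namespace, localname).text is the tag in '{uri}localname' form.
account_tag = etree.QName(GNC_NS, "account").text

texts = [el.text for el in tree.iter(account_tag)]
print(account_tag)
print(texts)
```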

And if you absolutely want to avoid writing out the namespace URI or registering the namespace (which I do not think is a valid objection, since doing so is quite easy and clearer), you could also use a namespace wildcard:

>>> for account in tree.iter('{*}account'):
...     print(account)
...
<Element {www.gnc.com}account at 0x112bdd808>
<Element {www.gnc.com}account at 0x112bdd948>
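
One thing to keep in mind with the wildcard is that it matches the local name in any namespace, not only the one you meant. The sketch below illustrates this with a made-up second namespace (`www.example.com` and the `other` prefix are assumptions for the example, not part of a GnuCash file).

```python
from io import StringIO
from lxml import etree

# Two elements share the local name 'account' but live in different namespaces.
xml = """<root xmlns:gnc="www.gnc.com" xmlns:other="www.example.com">
<gnc:account>1</gnc:account>
<other:account>2</other:account>
</root>"""
tree = etree.parse(StringIO(xml))

# The wildcard picks up both elements...
any_ns = [el.tag for el in tree.iter("{*}account")]

# ...while the fully qualified tag picks up only the gnc one.
only_gnc = [el.tag for el in tree.iter("{www.gnc.com}account")]

print(any_ns)
print(only_gnc)
```

If a document is known to use the local name in only one namespace, the wildcard is harmless; otherwise the explicit form is the safer choice.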
