
Lookup by namespace in lxml

I have an XML file that has elements that look like gnc:account (it's a GnuCash accounts file). I want to find all elements with that name.

However, if I do this:

for account in tree.iter('gnc:account'):
    print(account)

I get nothing printed. Instead I have written this ridiculous piece of code:

def n(string):
    pair = string.split(':')
    return '{{{}}}{}'.format(root.nsmap[pair[0]], pair[1])

And now I can do this:

for account in tree.iter(n('gnc:account')):
    print(account)

which works.

Is there a non-ridiculous solution to this problem? I'm not interested in writing out the full URI.

What you have now is certainly too hackish, in my opinion.

Solution with XPath

You could use XPath, and register this namespace URI and prefix:

>>> from io import StringIO
>>> from lxml import etree
>>> s = """<root xmlns:gnc="www.gnc.com">
... <gnc:account>1</gnc:account>
... <gnc:account>2</gnc:account>
... </root>"""
>>> tree = etree.parse(StringIO(s))

# show that without the prefix, there are no results
>>> tree.xpath("//account")
[]

# with an unregistered prefix, throws an error
>>> tree.xpath("//gnc:account")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/etree.pyx", line 2287, in lxml.etree._ElementTree.xpath
  File "src/lxml/xpath.pxi", line 359, in lxml.etree.XPathDocumentEvaluator.__call__
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Undefined namespace prefix

# correct way of registering the namespace
>>> tree.xpath("//gnc:account", namespaces={'gnc': 'www.gnc.com'})
[<Element {www.gnc.com}account at 0x112bdd808>, <Element {www.gnc.com}account at 0x112bdd948>]
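If the goal is to avoid typing the URI at all, one option is to reuse the prefix mapping that the document itself declares. The following is a minimal sketch, assuming the prefix is declared on the root element as in the sample above; the variable names are only illustrative. A default namespace shows up in nsmap under the key None, which XPath cannot use, so it is filtered out first.

```python
from io import StringIO
from lxml import etree

xml = """<root xmlns:gnc="www.gnc.com">
<gnc:account>1</gnc:account>
<gnc:account>2</gnc:account>
</root>"""
tree = etree.parse(StringIO(xml))
root = tree.getroot()

# nsmap maps prefixes to URIs as declared in the document.
# Drop the None key (default namespace), which XPath does not accept.
namespaces = {prefix: uri for prefix, uri in root.nsmap.items()
              if prefix is not None}

# The same mapping works for xpath() and for findall().
accounts = tree.xpath("//gnc:account", namespaces=namespaces)
found = root.findall(".//gnc:account", namespaces=namespaces)

print([a.text for a in accounts])
```

Note that root.nsmap only contains prefixes declared on the root element, so a prefix declared deeper in the tree would have to be looked up on the element that declares it.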

Sticking with tree.iter()

If you would still like to call iter() in this fashion, you need to follow lxml's advice on using namespaces with iter and write the tag as {namespace-uri}localname, for instance:

>>> for account in tree.iter('{www.gnc.com}account'):
...     print(account)
...
<Element {www.gnc.com}account at 0x112bdd808>
<Element {www.gnc.com}account at 0x112bdd948>
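The reason a plain 'gnc:account' finds nothing is that lxml stores each tag in the {uri}localname form, not with its prefix. If you want to keep a small helper like the one in the question, a slightly tidier sketch could build that form with etree.QName. The helper name qualified is hypothetical, and the sketch assumes the prefix is declared on the root element.

```python
from io import StringIO
from lxml import etree

xml = """<root xmlns:gnc="www.gnc.com">
<gnc:account>1</gnc:account>
<gnc:account>2</gnc:account>
</root>"""
tree = etree.parse(StringIO(xml))
root = tree.getroot()

def qualified(root, prefixed_name):
    """Turn 'prefix:local' into '{uri}local' using the root's nsmap.

    Hypothetical helper; raises KeyError for an undeclared prefix.
    """
    prefix, local = prefixed_name.split(":", 1)
    return etree.QName(root.nsmap[prefix], local).text

tag = qualified(root, "gnc:account")
print(tag)

# Element.tag uses the same notation, which is why iter() matches on it.
texts = [account.text for account in tree.iter(tag)]
print(texts)
```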

And if you absolutely want to avoid writing out the namespace URI or registering the namespace (which I do not think is a valid objection, since registering is quite easy and clearer), you could also use the {*} wildcard, which matches the local name in any namespace:

>>> for account in tree.iter('{*}account'):
...     print(account)
...
<Element {www.gnc.com}account at 0x112bdd808>
<Element {www.gnc.com}account at 0x112bdd948>
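One caveat with the wildcard is that it ignores which namespace an element belongs to. The sketch below uses a made-up second namespace (www.example.org, not part of the original file) to show that an unrelated element with the same local name is returned as well.

```python
from lxml import etree

# Two namespaces that both define an element called "account".
# The "other" namespace is invented purely for illustration.
root = etree.fromstring(
    '<root xmlns:gnc="www.gnc.com" xmlns:other="www.example.org">'
    '<gnc:account>1</gnc:account>'
    '<other:account>2</other:account>'
    '</root>'
)

# '{*}account' matches on the local name only.
matched = [el.tag for el in root.iter('{*}account')]
print(matched)

# The fully qualified tag restricts the match to one namespace.
gnc_only = [el.text for el in root.iter('{www.gnc.com}account')]
print(gnc_only)
```

For a GnuCash file this may not matter in practice, but it is worth keeping in mind if the same local name could appear under more than one prefix.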
