Python ElementTree unable to parse xml file correctly

I am trying to parse an XML file using Python's ElementTree. The XML file looks like this:

<App xmlns="test attribute">
    <name>sagar</name>
</App>

Parser Code:

import xml.etree.ElementTree as etree

def parser():
    # Parse the file and fetch its root element
    eleTree = etree.parse('app.xml')
    eleRoot = eleTree.getroot()
    print("Tag:" + str(eleRoot.tag) + "\nAttrib:" + str(eleRoot.attrib))

if __name__ == "__main__":
    parser()

Output:

[sagar@linux Parser]$ python test.py
Tag:{test attribute}App  <------------- It should print only "App"
Attrib:{}

When I remove the "xmlns" attribute or rename it to something else, eleRoot.tag prints the correct value. Why is ElementTree unable to parse the tag properly when the tag has an "xmlns" attribute? Am I missing some prerequisite for parsing XML of this format with ElementTree?

Your XML uses the xmlns attribute, which declares a default XML namespace. XML namespaces are used to resolve naming conflicts, and their value is expected to be a URI, which "test attribute" is not. ElementTree does not reject the value, though: it treats xmlns as a namespace declaration rather than an ordinary attribute, which is why it is absent from eleRoot.attrib and why the tag looks different from what you expected.

For more information on XML namespaces, see XML Namespaces at W3Schools.


Edit:

After looking into the issue further, it appears that ElementTree represents the fully qualified name of an element in the form {namespace_url}tag_name . Because you declared "test attribute" as the default namespace, the fully qualified name of your "App" tag is in fact {test attribute}App , which is exactly what your program prints.
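If only the bare tag name is needed, one option is to strip the {namespace} prefix yourself. The following is a minimal sketch, not the only way to do this; it parses the XML from a string instead of app.xml so that it is self-contained. The helper local_name and the prefix label "ns" are names made up for this example and are not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Same document as in the question, inlined as a string for the example.
XML = '<App xmlns="test attribute"><name>sagar</name></App>'


def local_name(tag):
    """Return the tag without its leading '{namespace}' part, if any."""
    if tag.startswith("{"):
        return tag.split("}", 1)[1]
    return tag


root = ET.fromstring(XML)
print(root.tag)              # {test attribute}App
print(local_name(root.tag))  # App

# Child elements inherit the default namespace, so lookups have to be
# namespace-qualified too. "ns" is an arbitrary label chosen here; it is
# mapped to the namespace value used in the document.
ns = {"ns": "test attribute"}
name = root.find("ns:name", ns)
print(name.text)             # sagar
```

Note that a plain root.find("name") would return None here, because the child's fully qualified tag is {test attribute}name.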

Source
