简体   繁体   中英

Trying to parse an XML file with Python - what am I doing wrong?

I'm working with XML and Python for the first time. The ultimate goal is to send a request to a REST service, receive a response in XML, and parse the values and send emails depending on what was returned. However, the REST service is not yet in place, so for now I'm experimenting with an XML file saved on my C drive.

I have a simple bit of code, and I'm confused about why it isn't working.

This is my xml file ("XMLTest.xml"):

<Response>
    <exitCode>1</exitCode>
    <fileName>C:/Something/</fileName>
    <errors>
        <error>Error generating report</error>
    </errors>
</Response>

This is my code so far:

from xml.dom import minidom

something = open("C:/XMLTest.xml")
something = minidom.parse(something)

nodeList = []
for node in something.getElementsByTagName("Response"):  
    nodeList.extend(t.nodeValue for t in node.childNodes)
print nodeList

But the results that print out are...

[u'\n\t', None, u'\n\t', None, u'\n\t', None, u'\n']

What am I doing wrong?

I'm trying to get the node values. Is there a better way to do this? Is there a built-in method in Python to convert an xml file to an object or dictionary? I'd like to get all the values, preferably with the names attached.

Does this help?

doc = '''<Response>
    <exitCode>1</exitCode>
    <fileName>C:/Something/</fileName>
    <errors>
        <error>Error generating report</error>
    </errors>
</Response>'''

from xml.dom import minidom

something = minidom.parseString( doc )

nodeList = [ ]
for node in something.getElementsByTagName( "Response" ):
    response = { }
    response[ "exit code" ] = node.getElementsByTagName( "exitCode" )[ 0 ].childNodes[ 0 ].nodeValue
    response[ "file name" ] = node.getElementsByTagName( "fileName" )[ 0 ].childNodes[ 0 ].nodeValue
    errors = node.getElementsByTagName( "errors" )[ 0 ].getElementsByTagName( "error" )
    response[ "errors" ] = [ error.childNodes[ 0 ].nodeValue for error in errors ]

    nodeList.append( response )

import pprint
pprint.pprint( nodeList )

yields

[{'errors': [u'Error generating report'],
  'exit code': u'1',
  'file name': u'C:/Something/'}]

If you are just starting out with xml and python, and have no compelling reason to use DOM, I strongly suggest you have a look at the ElementTree api (implemented in the standard library in xml.etree.ElementTree )

To give you a taste:

import xml.etree.cElementTree as etree

tree = etree.parse('C:/XMLTest.xml')
response = tree.getroot()
exitcode = response.find('exitCode').text
filename = response.find('fileName').text
errors = [i.text for i in response.find('errors')]

(if you need more power - xpath, validation, xslt etc... - you can even switch to lxml , which implements the same API, but with a lot of extras)

You're not thinking about the XML from the DOM standpoint. Namely, 'C:/Something' isn't the nodevalue of the element whose tagname is 'fileName'; it's the nodevalue of the text node that is the first child of the element whose tagname is 'fileName'.

What I recommend you do is play around with it a little more in python itself: start python.

from xml.dom import minidom

x = minidom.parseString('<Response><filename>C:/</filename>>')

x.getElementsByTagName('Response') ... x.getElementsByTagName('Response')[0].childNodes[0] ...

and so forth. You'll get a quick appreciation for how the document is being parsed.

I recommend my library xml2obj . It is way cleaner than DOM. The "library" has only 84 lines of code you can embed anywhere.

In [185]: resp = xml2obj(something)

In [186]: resp.exitCode
Out[186]: u'1'

In [187]: resp.fileName
Out[187]: u'C:/Something/'

In [188]: len(resp.errors)
Out[188]: 1

In [189]: for node in resp.errors:
   .....:     print node.error
   .....:
   .....:
Error generating report

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM