简体   繁体   中英

Best practices - Parsing XML API response - Python 3

I have referenced several guides, but I'm still finding it difficult to wrap my head around this (Python newb):

  • /docs.python.org/3.7/library/xml.etree.elementtree.html
  • /effbot.org/zone/element-xpath.htm

xml output example

The intent is to retrieve the zipcode text value; however, I haven't done this before and from referencing the guides, I want the output of the following xpath:

/SearchResults:searchresults[@xmlns:xsi="http://www.w3.org/2001/XMLSchema-
instance"]/response/results/result/address/zipcode/text()

Here's an example of what's working from a local file:

from xml.etree import ElementTree as ET

tree = ET.parse(<destination_of_xml>.xml')

for elem in tree.iterfind('/response/results/result/address/zipcode'):
    print(elem.tag, elem.text)
----------------------------------------------------------------------
output: 
zipcode {90292}
zipcode {90292}
...

What's good practice in this instance to retrieve zipcode values and account for any schema changes in the future (ie iterate through XML until finding the element zipcode)? Are there better solutions to this?

You may need to know about xpath expressions.

I'm using the lxml library to parse a simpler xml hierarchy. I don't need to know what's above the zipcode element because I can write an xpath expression that says, in effect, look anywhere from the top of the document for zipcode elements (note, plural): .//zipcode . This yields the element. Now that I have them, since I know there's just one, I select the 'first', get its text and strip off leading and trailing blanks.

Providing that the name of the element remains unchanged ...

>>> from xml.etree import ElementTree as ET
>>> from lxml import etree
>>> tree = etree.fromstring('''\
... <company>
...     <name>XYZ</name>
...     <industry>chemicals</industry>
...     <address>
...         <street>
...             14234 Onyx Drive West
...         </street>
...         <city>
...             Ainslie
...         </city>
...         <state>
...             Idaho
...         </state>
...         <zipcode>
...             87734
...         </zipcode>
...     </address>
... </company>''')
>>> tree.xpath('.//zipcode')
[<Element zipcode at 0xb5e9c8>]

>>> tree.xpath('.//zipcode')[0].text.strip()
'87734'

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM