
Best practices - Parsing XML API response - Python 3

I have referenced several guides, but I'm still finding it difficult to wrap my head around this (Python newb):

  • docs.python.org/3.7/library/xml.etree.elementtree.html
  • effbot.org/zone/element-xpath.htm

[Image: XML output example]

The intent is to retrieve the zipcode text value. I haven't done this before, but from reading the guides, I believe I want the output of the following XPath:

/SearchResults:searchresults[@xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"]/response/results/result/address/zipcode/text()
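ElementTree supports only a limited subset of XPath: it has no text() step, and it cannot match namespace declarations such as xmlns:xsi, because the parser does not keep them as attributes. A rough ElementTree equivalent of the expression above selects the zipcode elements and reads their .text. The sample document below is hypothetical, shaped after the path in the question; in particular, the namespace URI bound to the SearchResults prefix is an assumption.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample shaped like the path in the question.
# The URI bound to the SearchResults prefix is an assumption.
SAMPLE = """<SearchResults:searchresults
    xmlns:SearchResults="http://example.com/hypothetical/SearchResults"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <response><results><result><address>
    <zipcode>90292</zipcode>
  </address></result></results></response>
</SearchResults:searchresults>"""

root = ET.fromstring(SAMPLE)

# The prefixed root name is stored in {uri}localname form,
# and the xmlns declarations do not appear in root.attrib.
print(root.tag)     # {http://example.com/hypothetical/SearchResults}searchresults
print(root.attrib)  # {}

# Paths are relative to the root, so the root element itself is not named.
# There is no text() step; read .text from each matched element instead.
zipcodes = [el.text for el in root.findall('./response/results/result/address/zipcode')]
print(zipcodes)     # ['90292']
```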

Here's an example of what's working from a local file:

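from xml.etree import ElementTree as ET

# Parse the saved API response from disk (placeholder filename)
tree = ET.parse('<destination_of_xml>.xml')

# Path is relative to the root element; a leading '/' triggers a FutureWarning
for elem in tree.iterfind('./response/results/result/address/zipcode'):
    print(elem.tag, elem.text)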
----------------------------------------------------------------------
output: 
zipcode {90292}
zipcode {90292}
...
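The snippet above reads from a local file. For a live API response, ElementTree can parse the response body directly with ET.fromstring, which accepts bytes as well as str. A minimal sketch, assuming the endpoint returns the XML document as the response body; fetch_zipcodes and the URL passed to it are hypothetical placeholders, not part of any real API.

```python
import urllib.request
from typing import List
from xml.etree import ElementTree as ET

ZIP_PATH = './response/results/result/address/zipcode'

def parse_zipcodes(body: bytes) -> List[str]:
    """Return the stripped text of every zipcode element found at ZIP_PATH."""
    root = ET.fromstring(body)  # accepts bytes or str
    return [el.text.strip() for el in root.iterfind(ZIP_PATH) if el.text]

def fetch_zipcodes(url: str) -> List[str]:
    """Hypothetical helper: url stands in for the real API endpoint."""
    with urllib.request.urlopen(url) as resp:
        return parse_zipcodes(resp.read())

# Offline usage with a hypothetical response body:
body = b"""<searchresults><response><results><result><address>
<zipcode>90292</zipcode></address></result></results></response></searchresults>"""
print(parse_zipcodes(body))  # ['90292']
```

Keeping the parsing separate from the HTTP call means the parser can be tested against saved responses without touching the network.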

What's good practice in this case for retrieving the zipcode values while accounting for future schema changes (i.e., iterating through the XML until the zipcode element is found)? Are there better solutions?
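One way to tolerate changes in where zipcode sits is to stop naming the full path and instead walk every element, matching on the tag name. Comparing only the local name (the part after any {namespace-uri} prefix) also keeps working if the element later gains a namespace. A sketch under those assumptions; local_name, find_local, and the sample document are hypothetical.

```python
from typing import Iterator
from xml.etree import ElementTree as ET

def local_name(tag: str) -> str:
    """Strip a '{namespace-uri}' prefix, if present, from an element tag."""
    return tag.rsplit('}', 1)[-1]

def find_local(root: ET.Element, name: str) -> Iterator[ET.Element]:
    """Yield every element at any depth whose local tag name equals name."""
    for el in root.iter():  # depth-first walk, including root itself
        # Skip comments/processing instructions, whose tag is not a string
        if isinstance(el.tag, str) and local_name(el.tag) == name:
            yield el

# Hypothetical document where zipcode moved and picked up a namespace.
doc = ET.fromstring("""<searchresults xmlns:a="http://example.com/addr">
  <response><results><result>
    <location><a:zipcode>90292</a:zipcode></location>
  </result></results></response>
</searchresults>""")

print([el.text for el in find_local(doc, 'zipcode')])  # ['90292']
```

If namespaces are not a concern, the built-in root.iter('zipcode') does the same depth-first search by exact tag without a helper.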

You may need to know about XPath expressions.

I'm using the lxml library to parse a simpler XML hierarchy. I don't need to know what's above the zipcode element, because I can write an XPath expression that says, in effect, "look anywhere from the top of the document for zipcode elements" (note the plural): .//zipcode. This yields a list of the matching elements. Since I know there's just one, I select the first, get its text, and strip off the leading and trailing whitespace.
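The same .//zipcode approach also works with the standard library's ElementTree, which supports the .// (any-depth descendant) step without installing lxml. A minimal sketch on a trimmed copy of the sample document used in this answer; findtext returns the text of the first match, or the given default when nothing matches, which avoids the IndexError that indexing [0] into an empty result list would raise.

```python
from xml.etree import ElementTree as ET

# Trimmed version of the <company> sample from this answer
root = ET.fromstring("""<company>
  <name>XYZ</name>
  <address>
    <zipcode>
      87734
    </zipcode>
  </address>
</company>""")

# All zipcode elements anywhere below the root (a list, possibly empty)
print(root.findall('.//zipcode'))

# Text of the first match, or '' when there is no zipcode element at all
zipcode = root.findtext('.//zipcode', default='').strip()
print(zipcode)  # 87734
```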

Provided that the name of the element remains unchanged:

>>> from xml.etree import ElementTree as ET
>>> from lxml import etree
>>> tree = etree.fromstring('''\
... <company>
...     <name>XYZ</name>
...     <industry>chemicals</industry>
...     <address>
...         <street>
...             14234 Onyx Drive West
...         </street>
...         <city>
...             Ainslie
...         </city>
...         <state>
...             Idaho
...         </state>
...         <zipcode>
...             87734
...         </zipcode>
...     </address>
... </company>''')
>>> tree.xpath('.//zipcode')
[<Element zipcode at 0xb5e9c8>]

>>> tree.xpath('.//zipcode')[0].text.strip()
'87734'

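Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.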

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM