Best practices - Parsing XML API response - Python 3
I have referenced several guides, but I'm still finding it difficult to wrap my head around this (Python newbie):
The intent is to retrieve the zipcode text value. I haven't done this before, and based on the guides, I want the output of the following XPath:
/SearchResults:searchresults[@xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"]/response/results/result/address/zipcode/text()
Here's an example that works against a local file:
from xml.etree import ElementTree as ET
tree = ET.parse('<destination_of_xml>.xml')
# The path is relative to the root element, so it starts with './'
for elem in tree.iterfind('./response/results/result/address/zipcode'):
    print(elem.tag, elem.text)
----------------------------------------------------------------------
output:
zipcode {90292}
zipcode {90292}
...
What's good practice here for retrieving the zipcode values while accounting for any future schema changes (i.e., iterating through the XML until the zipcode element is found)? Are there better solutions to this?
You may need to know about XPath expressions.
I'm using the lxml library to parse a simpler XML hierarchy. I don't need to know what's above the zipcode element because I can write an XPath expression that says, in effect, look anywhere from the top of the document for zipcode elements (note, plural): .//zipcode. This yields a list of matching elements. Since I know there's just one, I select the first, get its text, and strip off leading and trailing whitespace.
Provided that the name of the element remains unchanged ...
>>> from lxml import etree
>>> tree = etree.fromstring('''\
... <company>
... <name>XYZ</name>
... <industry>chemicals</industry>
... <address>
... <street>
... 14234 Onyx Drive West
... </street>
... <city>
... Ainslie
... </city>
... <state>
... Idaho
... </state>
... <zipcode>
... 87734
... </zipcode>
... </address>
... </company>''')
>>> tree.xpath('.//zipcode')
[<Element zipcode at 0xb5e9c8>]
>>> tree.xpath('.//zipcode')[0].text.strip()
'87734'
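For the case in the question, where there are several result elements and a namespaced root, the same "search at any depth" idea can be sketched with the standard library alone. This is only a minimal sketch under assumptions: SAMPLE_RESPONSE is a made-up document shaped like the path in the question (the SearchResults namespace URI is a placeholder), and local_name and find_texts are hypothetical helper names, not part of any library. Matching on the local tag name means the lookup should keep working if zipcode moves elsewhere in the tree or ends up in a namespace, as long as the element name itself stays the same.

```python
from xml.etree import ElementTree as ET

# Hypothetical response body, shaped like the path in the question.
# The SearchResults namespace URI below is a placeholder.
SAMPLE_RESPONSE = """\
<SearchResults:searchresults
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:SearchResults="http://example.com/SearchResults.xsd">
  <response>
    <results>
      <result><address><zipcode> 90292 </zipcode></address></result>
      <result><address><zipcode>90292</zipcode></address></result>
      <result><address><zipcode/></address></result>
    </results>
  </response>
</SearchResults:searchresults>
"""


def local_name(tag):
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def find_texts(xml_text, name):
    """Collect the stripped text of every element called `name`, at any depth."""
    root = ET.fromstring(xml_text)
    values = []
    for elem in root.iter():  # walks the whole tree, root included
        if local_name(elem.tag) != name:
            continue
        text = (elem.text or "").strip()
        if text:  # skip empty elements such as <zipcode/>
            values.append(text)
    return values


zipcodes = find_texts(SAMPLE_RESPONSE, "zipcode")
print(zipcodes)  # ['90292', '90292']
```

When the XML comes from an HTTP call rather than a file, the response body can be passed to find_texts directly. If the element is missing the result is an empty list instead of an exception, which is easier to handle than indexing [0] on an empty result.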