Python lxml: how to fetch XML tag names with xpath selector?
I'm trying to parse the following XML using Python and lxml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/bind9.xsl"?>
<isc version="1.0">
  <bind>
    <statistics version="2.2">
      <memory>
        <summary>
          <TotalUse>1232952256</TotalUse>
          <InUse>835252452</InUse>
          <BlockSize>598212608</BlockSize>
          <ContextSize>52670016</ContextSize>
          <Lost>0</Lost>
        </summary>
      </memory>
    </statistics>
  </bind>
</isc>
The goal is to extract the tag name and text of every element under bind/statistics/memory/summary in order to produce the following mapping:
TotalUse: 1232952256
InUse: 835252452
BlockSize: 598212608
ContextSize: 52670016
Lost: 0
I've managed to extract the element values, but I can't figure out the XPath expression to get the element tag names.
A sample script:
from lxml import etree as et

def main():
    xmlfile = "bind982.xml"
    location = "bind/statistics/memory/summary/*"
    label_selector = "??????"  # what to put here...?
    value_selector = "text()"

    # Binary mode lets lxml apply the encoding declared in the XML prolog.
    with open(xmlfile, "rb") as data:
        xmldata = et.parse(data)

    etree = xmldata.getroot()
    statlist = etree.xpath(location)
    for stat in statlist:
        label = stat.xpath(label_selector)[0]
        value = stat.xpath(value_selector)[0]
        print("{0}: {1}".format(label, value))

if __name__ == '__main__':
    main()
I know I could use label = stat.tag instead of stat.xpath(), but the script must be sufficiently generic to also process other pieces of XML where the label selector is different.
What XPath selector would return an element's tag name?
Simply use XPath's name(), and remove the zero index, since this returns a string and not a list.
from lxml import etree as et

def main():
    xmlfile = "ExtractXPathTagName.xml"
    location = "bind/statistics/memory/summary/*"
    label_selector = "name()"  # XPath function returning the element's name
    value_selector = "text()"

    # Binary mode lets lxml apply the encoding declared in the XML prolog.
    with open(xmlfile, "rb") as data:
        xmldata = et.parse(data)

    etree = xmldata.getroot()
    statlist = etree.xpath(location)
    for stat in statlist:
        label = stat.xpath(label_selector)     # name() yields a string, so no [0]
        value = stat.xpath(value_selector)[0]  # text() yields a list of text nodes
        print("{0}: {1}".format(label, value).strip())

if __name__ == '__main__':
    main()
Output
TotalUse: 1232952256
InUse: 835252452
BlockSize: 598212608
ContextSize: 52670016
Lost: 0
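Because the question asks for a script that stays generic across different selectors, it may help to normalise the two result shapes in one place. The following is only a sketch: the helper names evaluate and extract and the inline XML sample are assumptions made for illustration, not part of the original answer.

```python
from lxml import etree as et

# Shortened inline copy of the question's XML (assumption: same structure).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<isc version="1.0">
  <bind>
    <statistics version="2.2">
      <memory>
        <summary>
          <TotalUse>1232952256</TotalUse>
          <InUse>835252452</InUse>
          <Lost>0</Lost>
        </summary>
      </memory>
    </statistics>
  </bind>
</isc>"""


def evaluate(node, selector):
    """Evaluate an XPath selector and reduce the result to a single value."""
    result = node.xpath(selector)
    # Node-set expressions such as text() or @attr return a list;
    # function calls such as name() or string() return a plain string.
    if isinstance(result, list):
        return result[0] if result else None
    return result


def extract(root, location, label_selector, value_selector):
    """Return (label, value) pairs for every node matched by location."""
    return [
        (evaluate(node, label_selector), evaluate(node, value_selector))
        for node in root.xpath(location)
    ]


root = et.fromstring(XML)
pairs = extract(root, "bind/statistics/memory/summary/*", "name()", "text()")
for label, value in pairs:
    print("{0}: {1}".format(label, value))
```

With this shape, the caller no longer needs to know whether a given selector needs the zero index.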
I think you don't need XPath for the two values: the element nodes have the properties tag and text, so you can use, for instance, a list comprehension:
[(element.tag, element.text) for element in etree.xpath(location)]
Or, if you really want to use XPath:
result = [(element.xpath('name()'), element.xpath('string()')) for element in etree.xpath(location)]
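The property form and the XPath form give the same result for the XML in the question, but they are not always interchangeable. The sketch below uses a made-up sample (the urn:example namespace and the Note element are assumptions for illustration) to show where tag differs from name() and where text differs from string().

```python
from lxml import etree as et

# Hypothetical sample: a default namespace and an element with mixed content.
root = et.fromstring(
    b'<summary xmlns="urn:example"><Note>see <b>docs</b> first</Note></summary>'
)
note = root[0]

tag = note.tag                   # Clark notation, includes the namespace URI
name = note.xpath("name()")      # qualified name as written in the document
text = note.text                 # only the text before the first child element
string = note.xpath("string()")  # all descendant text concatenated

print(tag, name, repr(text), repr(string))
```

For flat, namespace-free elements like those under summary, either form should work.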
You could of course also construct a list of dictionaries:
result = [{ element.tag : element.text } for element in etree.xpath(location)]
or
result = [{ element.xpath('name()') : element.xpath('string()') } for element in etree.xpath(location)]
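Since the question asks for a mapping, a single dictionary may be more convenient than a list of one-entry dictionaries. This is a minimal sketch under the assumption that tag names are unique within summary (a repeated tag would overwrite the earlier entry); the inline XML is a shortened stand-in for the question's file.

```python
from lxml import etree as et

# Shortened stand-in for the <summary> element from the question.
summary = et.fromstring(
    b"<summary><TotalUse>1232952256</TotalUse><Lost>0</Lost></summary>"
)

# One dict for the whole section: tag name -> text content.
mapping = {element.tag: element.text for element in summary.xpath("*")}
print(mapping)
```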