Python lxml: How to get an XML tag name using an XPath selector?

Question

I'm trying to parse the following XML using Python and lxml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/bind9.xsl"?>
<isc version="1.0">
  <bind>
    <statistics version="2.2">
      <memory>
        <summary>
          <TotalUse>1232952256
          </TotalUse>
          <InUse>835252452
          </InUse>
          <BlockSize>598212608
          </BlockSize>
          <ContextSize>52670016
          </ContextSize>
          <Lost>0
          </Lost>
        </summary>
      </memory>
    </statistics>
  </bind>
</isc>

The goal is to extract the tag name and text of every element under bind/statistics/memory/summary in order to produce the following mapping:

TotalUse: 1232952256
InUse: 835252452
BlockSize: 598212608
ContextSize: 52670016
Lost: 0

I've managed to extract the element values, but I can't figure out the XPath expression to get the element tag names.

A sample script:

from lxml import etree as et

def main():

    xmlfile = "bind982.xml"
    location = "bind/statistics/memory/summary/*"
    label_selector = "??????" ## what to put here...?
    value_selector = "text()"

    with open(xmlfile, "r") as data:
        xmldata = et.parse(data)

        etree = xmldata.getroot()

        statlist = etree.xpath(location)

        for stat in statlist:
            label = stat.xpath(label_selector)[0]
            value = stat.xpath(value_selector)[0]
            print("{0}: {1}".format(label, value))

if __name__ == '__main__':
    main()

I know I could use label = stat.tag instead of stat.xpath(), but the script must be generic enough to also process other pieces of XML where the label selector is different.

What XPath selector would return an element's tag name?

Answer 1

Simply use XPath's name() function, and remove the [0] index, since name() returns a string rather than a list.

from lxml import etree as et

def main():

    xmlfile = "ExtractXPathTagName.xml"
    location = "bind/statistics/memory/summary/*"
    label_selector = "name()"                         # XPath function that returns the tag name as a string
    value_selector = "text()"

    with open(xmlfile, "r") as data:
        xmldata = et.parse(data)

        etree = xmldata.getroot()

        statlist = etree.xpath(location)

        for stat in statlist:
            label = stat.xpath(label_selector)
            value = stat.xpath(value_selector)[0]
            print("{0}: {1}".format(label, value).strip())

if __name__ == '__main__':
    main()

Output

TotalUse: 1232952256
InUse: 835252452
BlockSize: 598212608
ContextSize: 52670016
Lost: 0
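To make the difference in return types concrete, here is a minimal, self-contained sketch. It assumes a trimmed inline copy of the question's XML (only two of the five elements) instead of reading a file, and the variable names are illustrative only.

```python
from lxml import etree

# Assumption: a trimmed inline copy of the question's XML, not the real file.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<isc version="1.0">
  <bind>
    <statistics version="2.2">
      <memory>
        <summary>
          <TotalUse>1232952256
          </TotalUse>
          <Lost>0
          </Lost>
        </summary>
      </memory>
    </statistics>
  </bind>
</isc>"""

root = etree.fromstring(XML)
first = root.xpath("bind/statistics/memory/summary/*")[0]

# name() is an XPath string function, so lxml hands back one string.
label = first.xpath("name()")

# text() selects text nodes, so lxml hands back a list (hence the [0]).
values = first.xpath("text()")

print(isinstance(label, str), label)
print(isinstance(values, list), values[0].strip())
```

The text node still carries the newline and indentation from the source file, which is why the value is stripped before printing.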

Answer 2

I don't think you need XPath for these two values: element nodes have the properties tag and text, so you can use, for instance, a list comprehension:

[(element.tag, element.text) for element in etree.xpath(location)]

Or, if you really want to use XPath:

result = [(element.xpath('name()'), element.xpath('string()')) for element in etree.xpath(location)]

You could of course also construct a list of dictionaries:

result = [{ element.tag : element.text } for element in etree.xpath(location)]

or

result = [{ element.xpath('name()') : element.xpath('string()') } for element in etree.xpath(location)]
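Since the question asks for a single label-to-value mapping, one dictionary may be more convenient than a list of one-item dictionaries. The following is a sketch under stated assumptions: the helper name summarize is hypothetical, the XML is an inline copy of the question's sample, and both selectors are assumed to evaluate to strings (as name() and string() do).

```python
from lxml import etree

# Assumption: inline copy of the question's sample XML rather than a file on disk.
XML = b"""<isc version="1.0">
  <bind>
    <statistics version="2.2">
      <memory>
        <summary>
          <TotalUse>1232952256
          </TotalUse>
          <InUse>835252452
          </InUse>
          <BlockSize>598212608
          </BlockSize>
          <ContextSize>52670016
          </ContextSize>
          <Lost>0
          </Lost>
        </summary>
      </memory>
    </statistics>
  </bind>
</isc>"""


def summarize(root, location, label_selector="name()", value_selector="string()"):
    """Hypothetical helper: map each selected element's label to its stripped value.

    Both selectors must be XPath expressions that evaluate to a string.
    """
    return {
        str(el.xpath(label_selector)): el.xpath(value_selector).strip()
        for el in root.xpath(location)
    }


root = etree.fromstring(XML)
stats = summarize(root, "bind/statistics/memory/summary/*")
for label, value in stats.items():
    print("{0}: {1}".format(label, value))
```

Keeping the selectors as parameters preserves the generic behaviour the question asks for: a different piece of XML can pass, for example, an attribute expression wrapped in string() as the label selector.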

2 answers

Answer 1
1, accepted, 2019-09-06 18:27:37

Answer 2
0, 2019-09-06 16:24:17
