
Extract an XML tag value using lxml and XPath in Python

I need to extract data from an XML document using lxml and XPath. Specifically, I need to extract EventId = 122157660 from the following document.

<B2B_DATA>
   <B2B_METADATA>
       <EventId>122157660</EventId>
       <MessageType>Request</MessageType>
   </B2B_METADATA>
<PAYLOAD>
    <![CDATA[<?xml version="1.0"?>
        <REQUEST_GROUP MISMOVersionID="1.1.1">
            <REQUESTING_PARTY _Name="CityBank" _StreetAddress="801 Main St" _City="rockwall" _State="MD" _PostalCode="11311" _Identifier="416">
                <CONTACT_DETAIL _Name="XX Davis">
                    <CONTACT_POINT _Type="Phone" _Value="1236573348"/>
                    <CONTACT_POINT _Type="Email" _Value="jXX@city.com"/>
                </CONTACT_DETAIL>
            </REQUESTING_PARTY>
        </REQUEST_GROUP>]]>
</PAYLOAD>
</B2B_DATA>

I am able to do this using loops and iter, but I would like to use XPath for cleaner, shorter code. I am also using lxml to parse the CDATA section, so I am trying to avoid the ElementTree library.

This is what I tried:

import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()

for neighbor in root.iter('B2B_METADATA'):
    for element in neighbor:
        if element.tag == 'EventId':
            print(element.text)

Requested output: EventId 122157660

For very simple queries, the built-in ElementTree module already supports a limited subset of XPath:

print(root.findall('.//B2B_METADATA/EventId')[0].text)
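As a rough, self-contained sketch of the built-in approach, the snippet below parses a trimmed copy of the sample (the PAYLOAD element is left out for brevity) from an inline string instead of file.xml. The SAMPLE constant and the NoSuchTag name are illustrative assumptions, not part of the original question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question (PAYLOAD omitted).
SAMPLE = """<B2B_DATA>
   <B2B_METADATA>
       <EventId>122157660</EventId>
       <MessageType>Request</MessageType>
   </B2B_METADATA>
</B2B_DATA>"""

root = ET.fromstring(SAMPLE)

# findall() accepts ElementTree's limited XPath subset and returns a list of elements.
matches = root.findall('.//B2B_METADATA/EventId')
event_id = matches[0].text

# findtext() returns the text of the first match, or the default when nothing matches.
same_id = root.findtext('B2B_METADATA/EventId')
missing = root.findtext('B2B_METADATA/NoSuchTag', default='n/a')  # hypothetical tag name

print("EventId", event_id)
```

findtext avoids the explicit [0] index, which would raise IndexError if the element were absent.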

This is similar to lxml's xpath method (here root must come from lxml.etree, because the standard-library Element has no xpath method):

print(root.xpath('//B2B_METADATA/EventId')[0].text)
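A minimal sketch of the lxml variant, assuming lxml is installed and using an inline, shortened copy of the sample (the CDATA payload is abbreviated here, and SAMPLE is an illustrative name):

```python
from lxml import etree

# Shortened copy of the sample; the CDATA payload is abbreviated for illustration.
SAMPLE = b"""<B2B_DATA>
   <B2B_METADATA>
       <EventId>122157660</EventId>
       <MessageType>Request</MessageType>
   </B2B_METADATA>
   <PAYLOAD><![CDATA[<?xml version="1.0"?><REQUEST_GROUP MISMOVersionID="1.1.1"/>]]></PAYLOAD>
</B2B_DATA>"""

root = etree.fromstring(SAMPLE)

# xpath() returns a list of matching elements; guard against an empty result.
nodes = root.xpath('//B2B_METADATA/EventId')
event_id = nodes[0].text if nodes else None

print("EventId", event_id)
```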

Or by walking the parsed elements:

print(root.find('B2B_METADATA').find('EventId').text)

To move your iterators down into XPath, you could use something like this:

result = tree.xpath('/B2B_DATA/B2B_METADATA/EventId/text()')

With lxml, that returns a list holding the text node of the EventId element (nested in a B2B_METADATA element, nested in a B2B_DATA element) as a string, i.e. ['122157660']. If there are multiple such text nodes in the XML, the xpath method returns them all in the same list of strings.
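The sketch below illustrates the return types, assuming lxml is installed; it uses a one-line reduction of the sample rather than the full file. The XPath string() function is shown as one possible way to get a single string instead of a list.

```python
from lxml import etree

# One-line reduction of the sample document, for illustration only.
root = etree.fromstring(
    b"<B2B_DATA><B2B_METADATA><EventId>122157660</EventId></B2B_METADATA></B2B_DATA>"
)

# A path ending in text() yields a list of strings, even for a single match.
texts = root.xpath('/B2B_DATA/B2B_METADATA/EventId/text()')
first = texts[0] if texts else None

# Wrapping the path in string() yields one string (empty if nothing matches).
as_string = root.xpath('string(/B2B_DATA/B2B_METADATA/EventId)')

print("EventId", first)
```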

If you know that EventId only ever appears inside /B2B_DATA/B2B_METADATA, you could shorten your XPath to //EventId/text(). It would be computationally less efficient, because the // searches the entire document for EventId elements, but you may value conciseness over efficiency, especially if the XML document is really small (like your sample).
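To show why that assumption matters, here is a small sketch in which a second EventId appears under a made-up OTHER element (hypothetical, not in the original sample). The short form then matches both elements, while the full path still matches only the one in B2B_METADATA.

```python
from lxml import etree

# OTHER and its EventId are hypothetical additions to demonstrate the difference.
DOC = b"""<B2B_DATA>
  <B2B_METADATA><EventId>122157660</EventId></B2B_METADATA>
  <OTHER><EventId>999</EventId></OTHER>
</B2B_DATA>"""

root = etree.fromstring(DOC)

# The full path only matches EventId directly under /B2B_DATA/B2B_METADATA.
absolute = root.xpath('/B2B_DATA/B2B_METADATA/EventId/text()')

# The // form matches every EventId anywhere in the document, in document order.
anywhere = root.xpath('//EventId/text()')

print(absolute)
print(anywhere)
```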
