
Problem parsing XML document with namespaces using Python lxml

Using the Python lxml library, I'm trying to parse an XML document like the following:

<ns:searchByScientificNameResponse xmlns:ns="http://itis_service.itis.usgs.gov">
<ns:return xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd" xmlns:ax23="http://metadata.itis_service.itis.usgs.gov/xsd" xmlns:ax26="http://itis_service.itis.usgs.gov/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ax21:SvcScientificNameList">
<ax21:scientificNames xsi:type="ax21:SvcScientificName">
<ax21:tsn>26339</ax21:tsn>
<ax21:author>L.</ax21:author>
<ax21:combinedName>Vicia faba</ax21:combinedName>
<ax21:kingdom>Plantae</ax21:kingdom>
<ax21:unitInd1 xsi:nil="true" />
<ax21:unitInd2 xsi:nil="true" />
<ax21:unitInd3 xsi:nil="true" />
<ax21:unitInd4 xsi:nil="true" />
<ax21:unitName1>Vicia</ax21:unitName1>
<ax21:unitName2>faba</ax21:unitName2>
<ax21:unitName3 xsi:nil="true" />
<ax21:unitName4 xsi:nil="true" />
</ax21:scientificNames>
</ns:return>
</ns:searchByScientificNameResponse>

Specifically, I want to get the value of the "ax21:tsn" element (in this case, the integer 26339).

I tried the answers from here and here, without success. Here is my code:

import lxml.etree as ET

tree = ET.parse("sample.xml")
#print(ET.tostring(tree))

namespaces = {'ax21': 'http://data.itis_service.itis.usgs.gov/xsd'} 
tsn = tree.find('scientificNames/tsn', namespaces)
print(tsn)

It finds nothing: the script just prints None. Is there a really intelligent way of doing this using XPath?

Two problems:

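  1. scientificNames is not a direct child of the root element; it is a grandchild (it sits inside ns:return), so a path that starts at the root without .// cannot reach it.

  2. You need to use the ax21 prefix in the path expression, because scientificNames and tsn both belong to the namespace bound to ax21. An unprefixed name only matches elements in no namespace.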
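
To see the two problems in isolation, here is a minimal sketch. It assumes a trimmed-down copy of the document embedded as a string (the SAMPLE constant and the variable names are made up for illustration) rather than the sample.xml file from the question.

```python
import lxml.etree as ET

# Trimmed-down copy of the document from the question (illustrative only).
SAMPLE = b"""<ns:searchByScientificNameResponse xmlns:ns="http://itis_service.itis.usgs.gov">
<ns:return xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd">
<ax21:scientificNames>
<ax21:tsn>26339</ax21:tsn>
<ax21:combinedName>Vicia faba</ax21:combinedName>
</ax21:scientificNames>
</ns:return>
</ns:searchByScientificNameResponse>"""

root = ET.fromstring(SAMPLE)
namespaces = {"ax21": "http://data.itis_service.itis.usgs.gov/xsd"}

# Problem 2 only: the depth is handled by './/', but the names have no prefix,
# so they are looked up in no namespace and nothing matches.
no_prefix = root.find(".//scientificNames/tsn", namespaces)

# Problem 1 only: the prefixes are right, but the path starts at the root's
# direct children, and scientificNames is one level deeper.
direct_child = root.find("ax21:scientificNames/ax21:tsn", namespaces)

# Both fixed: search at any depth and use the prefix.
fixed = root.find(".//ax21:scientificNames/ax21:tsn", namespaces)

print(no_prefix, direct_child, fixed.text)  # None None 26339
```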

The following works:

tsn = tree.find('.//ax21:scientificNames/ax21:tsn', namespaces)

Or simply:

tsn = tree.find('.//ax21:tsn', namespaces)
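
Note that find() returns an element, not the value itself, so the text still has to be read and converted. Since the question also asks about XPath, the sketch below shows a few equivalent ways to get at the value in lxml, including the xpath() method. It assumes a trimmed-down copy of the document embedded as a string (the SAMPLE constant and the variable names are made up for illustration).

```python
import lxml.etree as ET

# Trimmed-down copy of the document from the question (illustrative only).
SAMPLE = b"""<ns:searchByScientificNameResponse xmlns:ns="http://itis_service.itis.usgs.gov">
<ns:return xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd">
<ax21:scientificNames>
<ax21:tsn>26339</ax21:tsn>
<ax21:combinedName>Vicia faba</ax21:combinedName>
</ax21:scientificNames>
</ns:return>
</ns:searchByScientificNameResponse>"""

root = ET.fromstring(SAMPLE)
namespaces = {"ax21": "http://data.itis_service.itis.usgs.gov/xsd"}

# find() returns the first matching element (or None); read .text and convert.
tsn_element = root.find(".//ax21:tsn", namespaces)
tsn = int(tsn_element.text)

# findtext() returns the text of the first match directly.
tsn_text = root.findtext(".//ax21:tsn", namespaces=namespaces)

# xpath() supports full XPath 1.0 and always returns a list of results.
tsn_values = root.xpath("//ax21:tsn/text()", namespaces=namespaces)

# Clark notation ({namespace-uri}name) avoids the prefix mapping altogether.
tsn_clark = root.find(".//{http://data.itis_service.itis.usgs.gov/xsd}tsn")

print(tsn, tsn_text, tsn_values, tsn_clark.text)
```

If the element might be missing, check the result of find() for None (or the xpath() list for emptiness) before reading .text.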
