Problem parsing an XML document with namespaces using Python lxml

Question

Using the Python lxml library, I am trying to parse an XML document that looks like this:

<ns:searchByScientificNameResponse xmlns:ns="http://itis_service.itis.usgs.gov">
<ns:return xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd" xmlns:ax23="http://metadata.itis_service.itis.usgs.gov/xsd" xmlns:ax26="http://itis_service.itis.usgs.gov/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ax21:SvcScientificNameList">
<ax21:scientificNames xsi:type="ax21:SvcScientificName">
<ax21:tsn>26339</ax21:tsn>
<ax21:author>L.</ax21:author>
<ax21:combinedName>Vicia faba</ax21:combinedName>
<ax21:kingdom>Plantae</ax21:kingdom>
<ax21:unitInd1 xsi:nil="true" />
<ax21:unitInd2 xsi:nil="true" />
<ax21:unitInd3 xsi:nil="true" />
<ax21:unitInd4 xsi:nil="true" />
<ax21:unitName1>Vicia</ax21:unitName1>
<ax21:unitName2>faba</ax21:unitName2>
<ax21:unitName3 xsi:nil="true" />
<ax21:unitName4 xsi:nil="true" />
</ax21:scientificNames>
</ns:return>
</ns:searchByScientificNameResponse>

Specifically, I want to get the value of the "ax21:tsn" element (in this case, the integer 26339).

I tried the answers from here and here, but without success. Here is my code:

import lxml.etree as ET

tree = ET.parse("sample.xml")
#print(ET.tostring(tree))

namespaces = {'ax21': 'http://data.itis_service.itis.usgs.gov/xsd'}
tsn = tree.find('scientificNames/tsn', namespaces)
print(tsn)

It finds nothing: print(tsn) outputs None. Is there a smart way to do this with XPath?

Answer 1

There are two problems:

scientificNames is not a direct child of the root element; it is a grandchild.
You need to use the ax21 prefix in the XPath expression.
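
Both points can be checked directly. The following is a minimal sketch, not part of the original answer: it parses a trimmed copy of the sample from a string (the name XML is illustrative) instead of reading sample.xml, and the import falls back to the standard library's xml.etree.ElementTree, whose find() accepts the same namespaces argument, so it should run even where lxml is not installed.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: fall back to the stdlib if lxml is unavailable
    import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (illustrative, not the full response).
XML = """<ns:searchByScientificNameResponse xmlns:ns="http://itis_service.itis.usgs.gov">
  <ns:return xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd">
    <ax21:scientificNames>
      <ax21:tsn>26339</ax21:tsn>
    </ax21:scientificNames>
  </ns:return>
</ns:searchByScientificNameResponse>"""

root = ET.fromstring(XML)
namespaces = {"ax21": "http://data.itis_service.itis.usgs.gov/xsd"}

# Tags are stored as {namespace-uri}localname, so the bare name
# "scientificNames" never matches any element in this document.
child_tags = [child.tag for child in root]
grandchild_tags = [grandchild.tag for child in root for grandchild in child]
print(child_tags)       # the only child of the root is ns:return
print(grandchild_tags)  # scientificNames appears one level further down

# Problem 2: no prefix, so the path looks for elements in no namespace.
unprefixed = root.find("scientificNames/tsn", namespaces)

# Problem 1: prefixed, but still searched directly under the root element.
prefixed_direct = root.find("ax21:scientificNames/ax21:tsn", namespaces)

print(unprefixed is None, prefixed_direct is None)
```

Either problem alone is enough to make find() return None, which is why adding the prefix without also changing the path would not help.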

The following works:

tsn = tree.find('.//ax21:scientificNames/ax21:tsn', namespaces)

Or simply:

tsn = tree.find('.//ax21:tsn', namespaces)
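
Since find() returns an element rather than a value, one more step is needed to get the integer. The sketch below is an illustration under assumptions, not the answerer's code: the helper names (first_tsn, first_tsn_clark, all_tsns_xpath) and the trimmed SAMPLE string are made up for the example. It also shows the prefix-free {uri}tag form and, for the XPath part of the question, lxml's xpath() method, which is skipped when the stdlib fallback is in use because stdlib elements do not have it.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: fall back to the stdlib if lxml is unavailable
    import xml.etree.ElementTree as ET

AX21 = "http://data.itis_service.itis.usgs.gov/xsd"
NAMESPACES = {"ax21": AX21}

# Trimmed copy of the sample document (illustrative, not the full response).
SAMPLE = """<ns:searchByScientificNameResponse xmlns:ns="http://itis_service.itis.usgs.gov">
  <ns:return xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd">
    <ax21:scientificNames>
      <ax21:tsn>26339</ax21:tsn>
      <ax21:combinedName>Vicia faba</ax21:combinedName>
    </ax21:scientificNames>
  </ns:return>
</ns:searchByScientificNameResponse>"""


def first_tsn(root):
    """Return the first ax21:tsn value as an int, or None if it is absent."""
    text = root.findtext(".//ax21:tsn", namespaces=NAMESPACES)
    return int(text) if text else None


def first_tsn_clark(root):
    """Same lookup without a prefix map, using {uri}tag notation."""
    element = root.find(".//{%s}tsn" % AX21)
    return None if element is None else int(element.text)


def all_tsns_xpath(root):
    """lxml only: evaluate an XPath expression; None if .xpath() is missing."""
    if not hasattr(root, "xpath"):
        return None
    values = root.xpath("//ax21:tsn/text()", namespaces=NAMESPACES)
    return [int(value) for value in values]


root = ET.fromstring(SAMPLE)
print(first_tsn(root))
print(first_tsn_clark(root))
print(all_tsns_xpath(root))
```

Comparing the result against None, rather than relying on the truthiness of an element, keeps the missing-element case explicit.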
