How to iterate over multiple child nodes of an XML item using Python 2.7

Question

I am trying to parse imperfectly structured XML data from the USPTO, which has the following form:

<parent>
 <child>
  <child-text>text
  <child-text>more text</child-text>
  <child-text>more text</child-text>
  </child-text>
 </child>
</parent>

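I am trying to capture all of the text in the child-text nodes. As you can see, however, the first child-text tag does not close until all of the remaining tags have closed. The following excerpt is an example: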

<claims id="claims">
  <claim id="CLM-00001" num="00001">
    <claim-text>1. An all-solid-state electrochromic device comprising:
    <claim-text>a transparent base material; and</claim-text>
    <claim-text>an electrochromic multilayer-stack structure formed on the transparent base material, the electrochromic multilayer-stack structure comprising:
    <claim-text>a first transparent-conductive film;</claim-text>
    <claim-text>an ion-storage layer formed on the first transparent-conductive film;</claim-text>
    <claim-text>a solid-electrolyte layer formed on the ion-storage layer; and</claim-text>
    <claim-text>an electrochromic layer formed on the solid-electrolyte layer, the electrochromic layer comprising a reflection-controllable electrochromic layer comprising an antimony-based alloy comprising Sb<sub>x</sub>CoLi<sub>y </sub>in which 0.5&#x2266;x&#x2266;10, and 0.1&#x2266;y&#x2266;10.</claim-text>
    </claim-text>
    </claim-text>
  </claim>
<claim id="CLM-00002" num="00002">
<claim-text>2. The all-solid-state electrochromic device according to <claim-ref idref="CLM-00001">claim 1</claim-ref>, wherein 3&#x2266;x&#x2266;5 and 0.1&#x2266;y&#x2266;3.</claim-text>
</claim>
</claims>

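My current approach captures only the content of the first tag and does not fully capture the content of its child elements (as in the example above):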

claims = self.xml.claim
for i, claim in enumerate(claims):
    data = {}
    data['text'] = claim.contents_of('claim_text', as_string=True, upper=False)

Despite the inconsistent structure, how can I iterate over all of the <claim-text> tags and the <claim-ref> child tags?

Answer 1

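I had a similar problem with XML documents. What I did was the following, where xml_document is a string holding the XML:

start = xml_document.find("<claim-text>") + len("<claim-text>")
content = xml_document[start:xml_document.find("</claim-text>", start)]

This returns the content between the first opening <claim-text> tag and the first closing </claim-text> tag that follows it.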

Then use an if statement to remove any extra tags left in that content:

if re.search(r"<[^>]+>", content):  # needs "import re"
    content = re.sub(r"<[^>]+>", "", content)

On each iteration, remove the part of xml_document that has already been parsed, by index, so that the next search starts after it.
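
Put together, the steps above could look roughly like the sketch below. It rests on some assumptions: the function name extract_claim_texts, the constants, and the shortened sample string are illustrative and not from the original answer, xml_document is assumed to be a plain string, and tag removal is done with re.sub rather than by index.

```python
import re

OPEN_TAG = "<claim-text>"
CLOSE_TAG = "</claim-text>"


def extract_claim_texts(xml_document):
    """Collect the text found between successive <claim-text> / </claim-text> pairs.

    Hypothetical helper: a sketch of the find-slice-strip-cut loop, not a real parser.
    """
    pieces = []
    while True:
        start = xml_document.find(OPEN_TAG)
        if start == -1:
            break  # no opening tag left
        start += len(OPEN_TAG)
        end = xml_document.find(CLOSE_TAG, start)
        if end == -1:
            break  # opening tag without a closing tag: stop rather than guess
        content = xml_document[start:end]
        # Remove tags left inside the slice (nested <claim-text>, <claim-ref>, <sub>, ...).
        content = re.sub(r"<[^>]+>", "", content)
        # Collapse line breaks and indentation into single spaces.
        content = " ".join(content.split())
        if content:
            pieces.append(content)
        # Cut off the part that has been parsed, by index.
        xml_document = xml_document[end + len(CLOSE_TAG):]
    return pieces


# Shortened, made-up sample with the same nesting as the excerpt in the question.
sample = (
    '<claim id="CLM-00001" num="00001">\n'
    "<claim-text>1. A device comprising:\n"
    "<claim-text>a base material; and</claim-text>\n"
    "<claim-text>a film on the base material.</claim-text>\n"
    "</claim-text>\n"
    "</claim>\n"
    '<claim id="CLM-00002" num="00002">\n'
    "<claim-text>2. The device according to "
    '<claim-ref idref="CLM-00001">claim 1</claim-ref>.</claim-text>\n'
    "</claim>\n"
)
claim_texts = extract_claim_texts(sample)
for text in claim_texts:
    print(text)
```

Note that this loop merges the text of an outer <claim-text> with its first nested child and does not keep track of which claim a piece belongs to. Since the excerpt in the question appears to be well-formed nested XML, a real parser may be more robust: for example, in xml.etree.ElementTree, Element.itertext() yields the text of an element together with all of its descendants.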

Solution 1
0 2015-06-05 18:48:40
