Parsing child tags that match a specific string in XML with Python

Question

I want to parse an XML string that has a Topics tag as the parent and Topic1, Topic2 as child tags.

<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>

I only want to parse this XML so that I can get the attribute value of each Topic tag, and I want to do it in a for loop.

I have tried the following code:

    import xml.etree.cElementTree as ET
    tree = ET.ElementTree(file='sample.xml')

    #get the root element
    root = tree.getroot()
    namespace = {'xmlns': 'urn:reuterscompanycontent:significantdevelopments03'}

    for devs in root.findall('xmlns:Topics' ,namespace):
        for child_tags in devs.findall('xmlns:./', namespace):
            print 'child: ', child_tags.tag

I just want to add some kind of wildcard, something like Topic\d, in the second-to-last line so that I can parse every tag that matches Topic.

Answer 1

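For example, you can check whether each element's tag attribute starts with the namespace followed by the prefix Topic. ElementTree stores the tag of a namespaced element as '{namespace-uri}LocalName', so a startswith test on '{...}Topic' matches Topic1 and Topic2 but not ParentTopic1 or ParentTopic2.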

    # cElementTree was removed in Python 3.9; ElementTree is the supported module
    from xml.etree import ElementTree as ET
    root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>')
    # '*/*' selects the grandchildren of the root, i.e. the children of Topics
    topics = [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]
    for topic in topics:
        print(topic.text)

Or, more briefly:

    from xml.etree import ElementTree as ET
    root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>')

    # Same filter as above, inlined into the for statement
    for topic in [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]:
        print(topic.text)

Or put the check in an if statement inside the for loop.
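
The last variant is not spelled out above, so here is a minimal sketch of it. It also reads the Code attribute, since that is what the question asks for, and it swaps the startswith test for a regular expression so that only Topic followed by digits matches, which is one way to get the wildcard behaviour the question mentions. The names `NS`, `TOPIC_TAG` and `topic_codes` are made up for this illustration and are not part of ElementTree.

```python
import re
from xml.etree import ElementTree as ET

XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<SignificantDevelopments Major="3" Minor="0" Revision="1" '
    'xmlns="urn:reuterscompanycontent:significantdevelopments03">'
    '<Topics>'
    '<Topic1 Code="254">Regulatory / Company Investigation</Topic1>'
    '<Topic2 Code="207">Mergers &amp; Acquisitions</Topic2>'
    '<ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1>'
    '<ParentTopic2 Code="4">Ownership / Control</ParentTopic2>'
    '</Topics></SignificantDevelopments>'
)

# Hypothetical helper names, chosen for this sketch only.
NS = 'urn:reuterscompanycontent:significantdevelopments03'
# "{namespace}Topic" followed by one or more digits and nothing else.
TOPIC_TAG = re.compile(r'\{%s\}Topic\d+' % re.escape(NS))


def topic_codes(xml_text):
    """Return (local tag name, Code attribute) for each TopicN element."""
    root = ET.fromstring(xml_text)
    results = []
    for el in root.findall('*/*'):        # children of Topics
        if TOPIC_TAG.fullmatch(el.tag):   # the check, inside the for loop
            local_name = el.tag.split('}', 1)[1]
            results.append((local_name, el.get('Code')))
    return results


for name, code in topic_codes(XML):
    print(name, code)
```

With the sample document this prints the Code values of Topic1 and Topic2 only. If tags such as TopicGroup should also match, the plain startswith check from the answer is the simpler choice.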

Answer 1: 1 accepted, 2017-03-07 10:01:33