
Parse the child tags with a specific matching string in XML using Python

I want to parse an XML string in which Topics is the parent tag and Topic1, Topic2 are its child tags.

<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>

I only want to parse this XML so that I can get the attribute value of each Topic tag, and I want to do it in a for loop.

I have tried the following code:

    import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
    tree = ET.ElementTree(file='sample.xml')

    # Get the root element
    root = tree.getroot()
    namespace = {'xmlns': 'urn:reuterscompanycontent:significantdevelopments03'}

    for devs in root.findall('xmlns:Topics', namespace):
        for child_tags in devs.findall('*'):  # every child element of <Topics>
            print('child: ', child_tags.tag)

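I just want to put a wildcard such as Topic\d in the second-to-last line so that I can parse every tag that matches Topic.

For example, you can check whether each element's tag starts with the namespace followed by the prefix Topic, since ElementTree stores a namespaced tag as {namespace}name.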
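To see why the prefix has to include the namespace, it helps to print the raw tags. A quick Python 3 sketch using the sample XML from the question; the helper local_name is a hypothetical name, not part of ElementTree:

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = ('<?xml version="1.0" encoding="UTF-8"?>'
       '<SignificantDevelopments Major="3" Minor="0" Revision="1" '
       'xmlns="urn:reuterscompanycontent:significantdevelopments03">'
       '<Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1>'
       '<Topic2 Code="207">Mergers &amp; Acquisitions</Topic2>'
       '<ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1>'
       '<ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics>'
       '</SignificantDevelopments>')


def local_name(tag):
    """Strip the '{uri}' prefix that ElementTree puts on namespaced tags (hypothetical helper)."""
    return tag.rpartition('}')[2]


root = ET.fromstring(XML)
for el in root.findall('*/*'):  # the children of <Topics>
    # e.g. {urn:reuterscompanycontent:significantdevelopments03}Topic1 -> Topic1
    print(el.tag, '->', local_name(el.tag))
```

On Python 3.8 and later, ElementTree paths also accept {*} as a namespace wildcard (for example '{*}Topics/*'), which avoids hardcoding the URI in the path, though the tags themselves still carry it.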

from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>')
# '*/*' selects the children of <Topics>; keep only tags whose local name starts with Topic
topics = [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]
for topic in topics:
    print(topic.text)

Or, more concisely:

from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>')

for topic in [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]:
    print(topic.text)

Or move the check into an if statement inside the for loop.
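A sketch of that variant, assuming Python 3. ElementTree's path syntax supports only a small XPath subset and cannot match part of a tag name, so the Topic\d wildcard from the question is applied with re inside the loop instead; unlike a plain startswith check, re.fullmatch also rejects names such as TopicList. It reads the Code attribute the question asks about, whereas the snippets above print the element text. The names NS and TOPIC_RE are illustrative, not from the original answer:

```python
import re
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = ('<?xml version="1.0" encoding="UTF-8"?>'
       '<SignificantDevelopments Major="3" Minor="0" Revision="1" '
       'xmlns="urn:reuterscompanycontent:significantdevelopments03">'
       '<Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1>'
       '<Topic2 Code="207">Mergers &amp; Acquisitions</Topic2>'
       '<ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1>'
       '<ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics>'
       '</SignificantDevelopments>')

NS = '{urn:reuterscompanycontent:significantdevelopments03}'
# "Topic" followed by one or more digits and nothing else (the Topic\d idea).
TOPIC_RE = re.compile(r'Topic\d+')

root = ET.fromstring(XML)
codes = {}
for el in root.findall('*/*'):  # the children of <Topics>
    # Drop the namespace prefix, if present, before matching the local name.
    name = el.tag[len(NS):] if el.tag.startswith(NS) else el.tag
    if TOPIC_RE.fullmatch(name):
        codes[name] = el.get('Code')  # the attribute value, not the text

print(codes)  # {'Topic1': '254', 'Topic2': '207'}
```

ParentTopic1 and ParentTopic2 are skipped because fullmatch requires the whole local name to fit the pattern.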


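Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.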

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM