Parse the child tags with a specific matching string in XML using Python
I want to parse an XML string that has the tag Topics as the parent tag and Topic1, Topic2 as child tags.
<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>
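I just want to parse this XML so that I can get the attribute values of each Topic tag, and I want to do it in a for loop.
I have already tried the following code: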
import xml.etree.cElementTree as ET
tree = ET.ElementTree(file='sample.xml')
# get the root element
root = tree.getroot()
namespace = {'xmlns': 'urn:reuterscompanycontent:significantdevelopments03'}
for devs in root.findall('xmlns:Topics', namespace):
    for child_tags in devs.findall('xmlns:./', namespace):
        print 'child: ', child_tags.tag
I just want to add some kind of wildcard, like Topic/d, to the second-to-last line so that I can parse every tag that matches Topic.
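For example, you can check whether the tag attribute of each element starts with the namespace (in curly braces) followed by the prefix Topic.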
# cElementTree was removed in Python 3.9; ElementTree works on both Python 2 and 3
from xml.etree import ElementTree as ET
root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>')
# '*/*' selects every grandchild of the root, i.e. the children of Topics
topics = [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]
for topic in topics:
    print(topic.text)
Or, shorter:
from xml.etree import ElementTree as ET
root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>')
for topic in [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]:
    print(topic.text)
Or put the check in an if statement inside the for statement.
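For illustration, here is a minimal sketch of that last variant, with the check placed inside the for loop. It also reads the Code attribute, since the question asks for attribute values. The names XML, NS, TOPIC_TAG and topic_codes are hypothetical, and the regular expression (Topic followed by digits) is an assumption about what the wildcard in the question is meant to match. The XML declaration is left out of the sample string for brevity.

```python
import re
from xml.etree import ElementTree as ET

# Sample document from the question (XML declaration omitted for brevity).
XML = (
    '<SignificantDevelopments Major="3" Minor="0" Revision="1" '
    'xmlns="urn:reuterscompanycontent:significantdevelopments03">'
    '<Topics>'
    '<Topic1 Code="254">Regulatory / Company Investigation</Topic1>'
    '<Topic2 Code="207">Mergers &amp; Acquisitions</Topic2>'
    '<ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1>'
    '<ParentTopic2 Code="4">Ownership / Control</ParentTopic2>'
    '</Topics>'
    '</SignificantDevelopments>'
)

# ElementTree stores qualified tags as '{namespace-uri}localname'.
NS = '{urn:reuterscompanycontent:significantdevelopments03}'

# Assumed pattern: the namespace, then "Topic", then one or more digits.
TOPIC_TAG = re.compile(re.escape(NS) + r'Topic\d+')


def topic_codes(xml_text):
    """Return (local name, Code attribute, text) for each TopicN element."""
    root = ET.fromstring(xml_text)
    found = []
    for el in root.findall('*/*'):       # grandchildren of the root
        if TOPIC_TAG.fullmatch(el.tag):  # the check lives inside the loop
            found.append((el.tag[len(NS):], el.get('Code'), el.text))
    return found


for name, code, text in topic_codes(XML):
    print(name, code, text)
```

Using fullmatch rather than startswith also rejects a tag such as TopicList, which may or may not be what you want. ParentTopic1 and ParentTopic2 are skipped either way, because their qualified tags do not begin with the namespace followed directly by Topic.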