Python XML: check if a certain tag is not present, then continue?
I have an XML file that I am processing with ElementTree.
For example, I have this XML:
<TEXT>
  <PHRASE>
    <en x='LOC'>NY</en>
    <PREP>is</PREP>
    <PREP>not</PREP>
    <en x='LOC'>Mexico</en>
  </PHRASE>
  <PHRASE>
    <en x='LOC'>NY</en>
    <PREP>is</PREP>
    <PREP>in</PREP>
    <en x='LOC'>USA</en>
  </PHRASE>
  <PHRASE>
    <en x='ORG'>Alpha</en>
    <CONJ>is</CONJ>
    <NEG>not</NEG>
    <PREP>in</PREP>
    <en x='LOC'>Atlanta</en>
  </PHRASE>
  <PHRASE>
    <en x='ORG'>Google</en>
    <CONJ>is</CONJ>
    <PREP>in</PREP>
    <en x='LOC'>California</en>
  </PHRASE>
</TEXT>
I want to extract the pairs from phrases that do not contain a <NEG> tag. The output I want is:
NY USA
Google California
I tried this:
neg = elt.findall('NEG')
if neg is None:
    continue
But it doesn't work. This is my full code:
import xml.etree.ElementTree as ET

tree = ET.parse('TrainBaseEnglish.xml')
root = tree.getroot()

print("------------------------ORG-LOC-------------------------------")
ORG_LOCcount = 0
for phrase in root.findall('./PHRASE'):
    ens = {en.get('x'): en.text for en in phrase.findall('en')}
    if 'ORG' in ens and 'LOC' in ens:
        print("ORG is: {}, LOC is: {} /".format(ens["ORG"], ens["LOC"]))
        #print(ens["ORG"])
        #print(ens["PERS"])
        ORG_LOCcount = ORG_LOCcount + 1
print("Number of ORG_LOC relation", ORG_LOCcount)

print("------------------------LOC-LOC-------------------------------")
LOC_LOCcount = 0
for phrase in root:
    if phrase.tag == 'PHRASE':
        collected_names = []
        for elt in phrase:
            if elt.tag == 'en':
                if 'x' in elt.attrib and elt.attrib['x'] == 'LOC':
                    collected_names += [elt.text]
        if len(collected_names) >= 2:
            print("LOC is: {}, LOC is: {} /".format(collected_names[0], collected_names[1]))
            LOC_LOCcount = LOC_LOCcount + 1
print("Number of LOC_LOC relation", LOC_LOCcount)
If each PHRASE contains only one NEG element, you can use .find, which returns None when no matching element exists:
import xml.etree.ElementTree as ET
xml_string = '''<TEXT>
<PHRASE>
<en x='LOC'>NY</en>
<PREP>is</PREP>
<PREP>not</PREP>
<en x='LOC'>Mexico</en>
</PHRASE>
<PHRASE>
<en x='LOC'>NY</en>
<PREP>is</PREP>
<PREP>in</PREP>
<en x='LOC'>USA</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Alpha</en>
<CONJ>is</CONJ>
<NEG>not</NEG>
<PREP>in</PREP>
<en x='LOC'>Atlanta</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Google</en>
<CONJ>is</CONJ>
<PREP>in</PREP>
<en x='LOC'>California</en>
</PHRASE>
</TEXT>
'''
xml = ET.fromstring(xml_string)
phrases = xml.findall("PHRASE")
for phrase in phrases:
    neg = phrase.find("NEG")   # None if this PHRASE has no NEG child
    if neg is None:
        continue
    print(neg)
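To see why the question's `findall(...) is None` check never fires, the sketch below compares what the two methods return on one of the sample phrases that has no NEG child. It uses only the standard `xml.etree.ElementTree` calls already used above.

```python
import xml.etree.ElementTree as ET

# A PHRASE without any NEG child, taken from the sample data
phrase = ET.fromstring(
    "<PHRASE><en x='LOC'>NY</en><PREP>is</PREP>"
    "<PREP>in</PREP><en x='LOC'>USA</en></PHRASE>"
)

first_neg = phrase.find("NEG")    # no match -> None
all_negs = phrase.findall("NEG")  # no match -> empty list []

print(first_neg)           # None
print(all_negs)            # []
print(all_negs is None)    # False: an empty list is not None
print(not all_negs)        # True: an empty list is falsy
```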
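If you expect a PHRASE to contain more than one NEG element, and therefore need .findall, check the length of neg instead: .findall returns a list (an empty one when nothing matches), never None.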
for phrase in phrases:
    neg = phrase.findall("NEG")   # list of NEG children, possibly empty
    if len(neg) == 0:
        continue
    print(neg)
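Putting the check the other way round gives the filtering the question asks for: skip every PHRASE that has a NEG child and collect the `<en>` texts of the rest. This is a minimal sketch; the helper name `pairs_without_neg` is hypothetical, and the XML is the question's sample data compacted to one PHRASE per line. Note that the first PHRASE tags "not" as `<PREP>`, not `<NEG>`, so a purely tag-based filter keeps the NY/Mexico pair as well; the exact output in the question would only result if that "not" were tagged NEG.

```python
import xml.etree.ElementTree as ET

# The question's sample data, one PHRASE per line
xml_string = """<TEXT>
<PHRASE><en x='LOC'>NY</en><PREP>is</PREP><PREP>not</PREP><en x='LOC'>Mexico</en></PHRASE>
<PHRASE><en x='LOC'>NY</en><PREP>is</PREP><PREP>in</PREP><en x='LOC'>USA</en></PHRASE>
<PHRASE><en x='ORG'>Alpha</en><CONJ>is</CONJ><NEG>not</NEG><PREP>in</PREP><en x='LOC'>Atlanta</en></PHRASE>
<PHRASE><en x='ORG'>Google</en><CONJ>is</CONJ><PREP>in</PREP><en x='LOC'>California</en></PHRASE>
</TEXT>"""


def pairs_without_neg(root):
    """Hypothetical helper: (first, second) <en> texts of each PHRASE without <NEG>."""
    pairs = []
    for phrase in root.findall("PHRASE"):
        if phrase.find("NEG") is not None:
            continue  # a NEG tag is present: skip this phrase
        names = [en.text for en in phrase.findall("en")]
        if len(names) >= 2:
            pairs.append((names[0], names[1]))
    return pairs


root = ET.fromstring(xml_string)
result = pairs_without_neg(root)
for first, second in result:
    print(first, second)
```

With the sample data this prints NY Mexico, NY USA and Google California; the Alpha/Atlanta phrase is skipped because it contains `<NEG>`.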