Python XML: how to check that a tag is absent, then continue?

Question

I have an XML file that I am processing with ElementTree.

For example, I have this XML:

<TEXT>

<PHRASE>
<en x='LOC'>NY</en>
<PREP>is</PREP>
<PREP>not</PREP>
<en x='LOC'>Mexico</en>
</PHRASE>

<PHRASE>
<en x='LOC'>NY</en>
<PREP>is</PREP>
<PREP>in</PREP>
<en x='LOC'>USA</en>
</PHRASE>

<PHRASE>
<en x='ORG'>Alpha</en>
<CONJ>is</CONJ>
<NEG>not</NEG>
<PREP>in</PREP>
<en x='LOC'>Atlanta</en> 
</PHRASE> 

<PHRASE>
<en x='ORG'>Google</en>
<CONJ>is</CONJ>
<PREP>in</PREP>
<en x='LOC'>California</en> 
</PHRASE> 


</TEXT>

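I want to extract the entity pairs from phrases that have no <NEG> tag. The output I want is:

NY USA

Google California

I tried this:

neg = elt.findall('NEG')
if neg is None:
    continue

But this does not work. Here is my full code: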

import xml.etree.ElementTree as ET
tree = ET.parse('TrainBaseEnglish.xml')
root = tree.getroot()
print("------------------------ORG-LOC-------------------------------")
ORG_LOCcount=0
for phrase in root.findall('./PHRASE'):
    ens = {en.get('x'): en.text for en in phrase.findall('en')}
    if 'ORG' in ens and 'LOC' in ens:
        print("ORG is: {}, LOC is: {} /".format(ens["ORG"], ens["LOC"]))
        #print(ens["ORG"])
        #print(ens["PERS"])
        ORG_LOCcount = ORG_LOCcount + 1
print("Number of ORG_LOC relation", ORG_LOCcount)
print("------------------------LOC-LOC-------------------------------")
LOC_LOCcount=0
for phrase in root:
    if phrase.tag == 'PHRASE':
        collected_names = []
        for elt in phrase:
            if elt.tag == 'en':
                if elt.get('x') == 'LOC':
                    collected_names.append(elt.text)
        if len(collected_names) >= 2:
            print("LOC is: {}, LOC is: {} /".format(collected_names[0],collected_names[1]))
            LOC_LOCcount = LOC_LOCcount + 1
print("Number of LOC_LOC relation", LOC_LOCcount)

Answer 1

If each PHRASE contains at most one NEG element, you can use .find, which returns None when there is no match:

import xml.etree.ElementTree as ET

xml_string = '''<TEXT>

<PHRASE>
<en x='LOC'>NY</en>
<PREP>is</PREP>
<PREP>not</PREP>
<en x='LOC'>Mexico</en>
</PHRASE>

<PHRASE>
<en x='LOC'>NY</en>
<PREP>is</PREP>
<PREP>in</PREP>
<en x='LOC'>USA</en>
</PHRASE>

<PHRASE>
<en x='ORG'>Alpha</en>
<CONJ>is</CONJ>
<NEG>not</NEG>
<PREP>in</PREP>
<en x='LOC'>Atlanta</en> 
</PHRASE> 

<PHRASE>
<en x='ORG'>Google</en>
<CONJ>is</CONJ>
<PREP>in</PREP>
<en x='LOC'>California</en> 
</PHRASE> 


</TEXT>
'''

xml = ET.fromstring(xml_string)
phrases = xml.findall("PHRASE")

for phrase in phrases:
    neg = phrase.find("NEG")
    # .find returns None when the phrase has no NEG child
    if neg is None:
        continue
    print(neg)

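If you expect a PHRASE to contain several NEG elements and therefore need .findall, check the length of the result instead, because .findall returns a list (empty when nothing matches, never None).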

for phrase in phrases:
    neg = phrase.findall("NEG")
    # .findall returns an empty list when the phrase has no NEG child
    if len(neg) == 0:
        continue
    print(neg)
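
The loops above skip the phrases that lack NEG, while the question asks to keep them. A minimal sketch of the inverted check might look like the following. The `SAMPLE` string is a shortened, hypothetical version of the question's data, and `pairs_without_neg` is a helper name introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shortened from the data in the question.
SAMPLE = """<TEXT>
<PHRASE><en x='LOC'>NY</en><PREP>is</PREP><PREP>in</PREP><en x='LOC'>USA</en></PHRASE>
<PHRASE><en x='ORG'>Alpha</en><CONJ>is</CONJ><NEG>not</NEG><PREP>in</PREP><en x='LOC'>Atlanta</en></PHRASE>
<PHRASE><en x='ORG'>Google</en><CONJ>is</CONJ><PREP>in</PREP><en x='LOC'>California</en></PHRASE>
</TEXT>"""


def pairs_without_neg(root):
    """Return the first two entity texts of every PHRASE that has no NEG child."""
    pairs = []
    for phrase in root.findall("PHRASE"):
        # Skip the phrase as soon as a NEG child is found.
        if phrase.find("NEG") is not None:
            continue
        # Collect the text of each <en> child, ignoring empty ones.
        names = [en.text.strip() for en in phrase.findall("en") if en.text]
        if len(names) >= 2:
            pairs.append((names[0], names[1]))
    return pairs


root = ET.fromstring(SAMPLE)
for first, second in pairs_without_neg(root):
    print(first, second)
```

Note that this filter looks only at tag names. In the question's full data the first phrase tags "not" as PREP rather than NEG, so a check like this would presumably keep the NY / Mexico pair as well unless that phrase is re-tagged.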
