Printing what is between two XML tags in Python?
I am using ElementTree. For example, I have this XML:
<TEXT>
<PHRASE>
<CONJ>and</CONJ>
<V>came</V>
<en x='PERS'>Adam</en>
<PREP>from</PREP>
<en x='LOC'>Atlanta</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Alpha</en>
<ADJ y='1'>Amazingly</ADJ>
<N>created by</N>
<en x='PERS'>John</en>
</PHRASE>
</TEXT>
I want to print the whole phrase when one en tag has x='ORG' with the text "Alpha" and another en tag has x='PERS' with the text "John". The expected output is "Alpha Amazingly created by John".
I know how to search for Alpha and John, but my problem is printing what's in between:
for phrase in root.findall('./PHRASE'):
    ens = {en.get('x'): en.text for en in phrase.findall('en')}
    if 'ORG' in ens and 'PERS' in ens:
        if (ens["ORG"] =="Alpha" and ens["PERS"]=="John"):
            print("ORG is: {}, PERS is: {} /".format(ens["ORG"], ens["PERS"]))
But how do I print the text of the other tags in that phrase?
import xml.etree.ElementTree as ET
xml = '''
<TEXT>
<PHRASE>
<CONJ>and</CONJ>
<V>came</V>
<en x='PERS'>Adam</en>
<PREP>from</PREP>
<en x='LOC'>Atlanta</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Alpha</en>
<ADJ y='1'>Amazingly</ADJ>
<N>created by</N>
<en x='PERS'>John</en>
</PHRASE>
</TEXT>
'''
def section(seq, start, end):
    # Yield the items of seq from start through end, both inclusive.
    returning = False
    for item in seq:
        returning |= item == start
        if returning:
            yield item
        returning &= item != end

root = ET.fromstring(xml)
for phrase in root.findall('./PHRASE'):
    # Keep the elements themselves (not just their text) so they can mark the section.
    ens = {en.get('x'): en for en in phrase.findall('en')}
    if 'ORG' in ens and 'PERS' in ens:
        if (ens["ORG"].text =="Alpha" and ens["PERS"].text=="John"):
            print("ORG is: {}, PERS is: {} /".format(ens["ORG"].text, ens["PERS"].text))
            print(' '.join(el.text for el in section(phrase, ens["ORG"], ens["PERS"])))
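The same idea can also be sketched with list slicing instead of a generator. This is only an illustration: the helper names text_between and matching_phrases are made up here, and it assumes the ORG tag comes before the PERS tag inside the phrase (otherwise the slice is empty). Elements without text are skipped.

```python
import xml.etree.ElementTree as ET

XML = """<TEXT>
<PHRASE>
<CONJ>and</CONJ>
<V>came</V>
<en x='PERS'>Adam</en>
<PREP>from</PREP>
<en x='LOC'>Atlanta</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Alpha</en>
<ADJ y='1'>Amazingly</ADJ>
<N>created by</N>
<en x='PERS'>John</en>
</PHRASE>
</TEXT>"""


def text_between(phrase, start, end):
    """Join the text of phrase's children from start through end, inclusive.

    Hypothetical helper; assumes start appears before end among the children.
    """
    children = list(phrase)
    i, j = children.index(start), children.index(end)
    # Skip children that have no text (el.text is None) or only whitespace.
    texts = [el.text.strip() for el in children[i:j + 1]
             if el.text and el.text.strip()]
    return " ".join(texts)


def matching_phrases(root, org, pers):
    """Collect the joined text for every PHRASE with the given ORG and PERS."""
    results = []
    for phrase in root.findall("./PHRASE"):
        start = phrase.find("en[@x='ORG']")
        end = phrase.find("en[@x='PERS']")
        if start is None or end is None:
            continue
        if start.text == org and end.text == pers:
            results.append(text_between(phrase, start, end))
    return results


root = ET.fromstring(XML)
print(matching_phrases(root, "Alpha", "John"))
```

Note that find() returns only the first en tag with a given x value, so a phrase containing several PERS tags would need extra handling.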
Pretty easy:
import xml.etree.ElementTree as ET
data = """<TEXT>
<PHRASE>
<CONJ>and</CONJ>
<V>came</V>
<en x='PERS'>Adam</en>
<PREP>from</PREP>
<en x='LOC'>Atlanta</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Alpha</en>
<ADJ y='1'>Amazingly</ADJ>
<N>created by</N>
<en x='PERS'>John</en>
</PHRASE>
</TEXT>"""
root = ET.fromstring(data)
for node in root.findall('./PHRASE'):
    ens = [node.find('en[@x="ORG"]'), node.find('en[@x="PERS"]')]
    if all(i is not None for i in ens):
        if 'Alpha' in ens[0].text and 'John' in ens[1].text:
            print(" ".join(node.itertext()))
            # itertext() also yields the newlines between tags. To drop them:
            # " ".join(t.strip() for t in node.itertext() if t.strip())
            break
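Because itertext() also returns the whitespace that sits between the child tags, the printed line contains the newlines from the source XML. A minimal sketch of one way to normalize that, assuming single-space separation is what is wanted (phrase_text is a hypothetical helper name, not part of ElementTree). Like the code above, it returns the text of the whole phrase, which in this example happens to start at Alpha and end at John.

```python
import xml.etree.ElementTree as ET


def phrase_text(node):
    """Return all text under node with runs of whitespace collapsed to one space."""
    # Join with spaces first so adjacent tags without whitespace do not merge,
    # then split() and re-join to collapse newlines and repeated spaces.
    return " ".join(" ".join(node.itertext()).split())


phrase = ET.fromstring("""<PHRASE>
<en x='ORG'>Alpha</en>
<ADJ y='1'>Amazingly</ADJ>
<N>created by</N>
<en x='PERS'>John</en>
</PHRASE>""")

print(phrase_text(phrase))
```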