
Printing what is between two XML tags in Python?

I am using ElementTree. For example, I have this XML:

<TEXT>
<PHRASE>
<CONJ>and</CONJ>
<V>came</V>
<en x='PERS'>Adam</en>
<PREP>from</PREP>
<en x='LOC'>Atlanta</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Alpha</en>
<ADJ y='1'>Amazingly</ADJ>
<N>created by</N>
<en x='PERS'>John</en> 
</PHRASE> 
</TEXT>

I want to print the whole phrase when one en tag has x='ORG' with the text "Alpha" and another en tag has x='PERS' with the text "John". The output should be "Alpha Amazingly created by John".

I know how to search for Alpha and John, but my problem is printing what is in between:

for phrase in root.findall('./PHRASE'):
    ens = {en.get('x'): en.text for en in phrase.findall('en')}
    if 'ORG' in ens and 'PERS' in ens:
        if ens["ORG"] == "Alpha" and ens["PERS"] == "John":
            print("ORG is: {}, PERS is: {} /".format(ens["ORG"], ens["PERS"]))

But how do I print the text of the remaining tags in that phrase?

import xml.etree.ElementTree as ET

xml = '''
<TEXT>
<PHRASE>
<CONJ>and</CONJ>
<V>came</V>
<en x='PERS'>Adam</en>
<PREP>from</PREP>
<en x='LOC'>Atlanta</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Alpha</en>
<ADJ y='1'>Amazingly</ADJ>
<N>created by</N>
<en x='PERS'>John</en> 
</PHRASE> 
</TEXT>
'''

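def section(seq, start, end):
    # Yield the items of seq from start through end, inclusive.
    returning = False
    for item in seq:
        returning |= item == start   # switch on at the start element
        if returning:
            yield item
        returning &= item != end     # switch off after the end element

root = ET.fromstring(xml)
for phrase in root.findall('./PHRASE'):
    # Map each en tag's x attribute to the element itself, not just its text,
    # so the elements can be located among the phrase's children.
    ens = {en.get('x'): en for en in phrase.findall('en')}
    if 'ORG' in ens and 'PERS' in ens:
        if ens["ORG"].text == "Alpha" and ens["PERS"].text == "John":
            print("ORG is: {}, PERS is: {} /".format(ens["ORG"].text, ens["PERS"].text))
            print(' '.join(el.text for el in section(phrase, ens["ORG"], ens["PERS"])))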
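
The same idea can probably be expressed with list slicing, since an Element behaves like a sequence of its direct children. This is only a minimal sketch, assuming both en elements are direct children of the phrase; the helper name text_between is hypothetical and not part of ElementTree:

```python
import xml.etree.ElementTree as ET

def text_between(parent, start, end):
    """Join the text of parent's direct children from start to end, inclusive."""
    children = list(parent)
    i, j = children.index(start), children.index(end)
    if i > j:
        # Accept the two elements in either order; output stays in document order.
        i, j = j, i
    # el.text is None for empty elements, so fall back to an empty string.
    return " ".join((el.text or "").strip() for el in children[i:j + 1])

phrase = ET.fromstring(
    "<PHRASE><en x='ORG'>Alpha</en><ADJ y='1'>Amazingly</ADJ>"
    "<N>created by</N><en x='PERS'>John</en></PHRASE>"
)
org = phrase.find("en[@x='ORG']")
pers = phrase.find("en[@x='PERS']")
result = text_between(phrase, org, pers)
print(result)
```

Slicing needs the full child list up front, which is fine for short phrases like these; the generator version avoids building the list.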

Pretty easy:

import xml.etree.ElementTree as ET

data = """<TEXT>
    <PHRASE>
        <CONJ>and</CONJ>
        <V>came</V>
        <en x='PERS'>Adam</en>
        <PREP>from</PREP>
        <en x='LOC'>Atlanta</en>
    </PHRASE>
    <PHRASE>
        <en x='ORG'>Alpha</en>
        <ADJ y='1'>Amazingly</ADJ>
        <N>created by</N>
        <en x='PERS'>John</en>
    </PHRASE>
</TEXT>"""

root = ET.fromstring(data)

for node in root.findall('./PHRASE'):
    ens = [node.find('en[@x="ORG"]'), node.find('en[@x="PERS"]')]

    if all(i is not None for i in ens):
        if 'Alpha' in ens[0].text and 'John' in ens[1].text:
            # itertext() also yields the whitespace between child elements,
            # so strip each piece and drop the empty ones before joining.
            print(" ".join(t.strip() for t in node.itertext() if t.strip()))
            break
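
One thing to watch with itertext(): when the XML is indented, it also yields the whitespace-only strings that sit between the child elements. A small sketch of what comes back and one way to clean it up, using a shortened phrase for illustration:

```python
import xml.etree.ElementTree as ET

phrase = ET.fromstring("""<PHRASE>
    <en x='ORG'>Alpha</en>
    <N>created by</N>
    <en x='PERS'>John</en>
</PHRASE>""")

# Raw pieces: element text interleaved with the newlines and indentation
# that surround each child element.
raw = list(phrase.itertext())

# Keep only the pieces that contain something other than whitespace.
clean = [t.strip() for t in raw if t.strip()]

print(raw)
print(" ".join(clean))
```

Filtering on t.strip() drops the layout whitespace without touching spaces inside a tag's own text, such as "created by".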
