How to get the text between tags in an XML file, preferably using lxml

This is an example of the tag. I can't get the text between the tags, either by iterating over the tags or with node.text on the <seg> node. That's why I'm asking; any help is welcome (sorry for my English).

    <tuv>
         <seg>If you want to save items in a 
            <bpt i="1">&lt;Message id=&quot;Message:1T0000772343:f000012900ce8eb3:MPhS&quot;&gt;</bpt>
            <ept i="1">&lt;/Message&gt;</ept> 
            for which no connection has been established or in a 
            <bpt i="2">&lt;Message id=&quot;Message:1T0000772343:f000012900ceac3d:pvy4&quot;&gt;</bpt>
            <ept i="2">&lt;/Message&gt;</ept> 
            that requires authentication, you need to connect to the library.
         </seg>
   </tuv>

Wanted output:

If you want to save items in a for which no connection has been established or in a that requires authentication, you need to connect to the library.

Use .xpath("text()") on the <seg> element to get all of its text nodes, that is, the text directly inside <seg> but not the text inside its child elements.

This code prints the wanted output:

from lxml import etree

# Parse the file; this assumes <tuv> is the root element of tuv.xml
tree = etree.parse("tuv.xml")
seg = tree.find("seg")

# Join the direct text nodes of 'seg' into one string
text = " ".join(seg.xpath("text()"))

# Print the result with runs of whitespace collapsed to single spaces
print(" ".join(text.split()))
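
As a side note on why node.text alone falls short: in the ElementTree data model that lxml follows, an element's .text holds only the text before its first child, and any text that follows a child element is stored in that child's .tail. The sketch below collects the same text by hand using .text and .tail. It is only an illustration: the helper name direct_text and the shortened SAMPLE string are made up for this example, and the standard-library fallback import is an assumption added so the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree  # preferred, as in the question
except ImportError:
    # Assumption: fall back to the standard library, which uses the same .text/.tail model
    import xml.etree.ElementTree as etree

# Shortened, hypothetical version of the XML from the question
SAMPLE = """<tuv>
  <seg>If you want to save items in a
    <bpt i="1">&lt;Message id=&quot;1&quot;&gt;</bpt>
    <ept i="1">&lt;/Message&gt;</ept>
    for which no connection has been established.
  </seg>
</tuv>"""


def direct_text(element):
    """Return the text directly inside element, skipping text inside its children."""
    # Text before the first child lives in element.text
    parts = [element.text or ""]
    # Text after each child lives in that child's .tail
    for child in element:
        parts.append(child.tail or "")
    # Collapse runs of whitespace (including newlines) to single spaces
    return " ".join(" ".join(parts).split())


seg = etree.fromstring(SAMPLE).find("seg")
result = direct_text(seg)
print(result)
```

The content of <bpt> and <ept> is left out because only the .text of <seg> and the .tail of each child are read, never the children's own .text.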
