How to extract text from tags using ElementTree

I have the following XML file:

<class id="1" name="good/bad">
    <verb>
        <token>like</token>
        <token>feel</token>
    </verb>
    <mess>This is <sugg>not</sugg> text</mess>
    <id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
    <id type="correct">I'm glad to see you.</id>
</class>

I need to extract the text from a specific tag. http://effbot.org has very few examples, and its documentation is generally poor. Are there good examples somewhere else? Also, how do I process the text of several tags with the same name (token) as separate entities? Thanks in advance! The result should look approximately like this:

(like) feel > not #This is not text

I am not clear on what you wish to do with the contents of the <mess> element.
For the children of the <verb> element, try this:

import xml.etree.ElementTree as ET
the_tree = ET.fromstring('''<class id="1" name="good/bad">
    <verb>
        <token>like</token>
        <token>feel</token>
    </verb>
    <mess>This is <sugg>not</sugg> text</mess>
    <id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
    <id type="correct">I'm glad to see you.</id>
</class>''')
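# find() returns the <verb> element; list() collects its direct children.
# (Element.getchildren() was removed in Python 3.9.)
elems = list(the_tree.find('./verb'))
verbs = [verb.text for verb in elems]
# -> ['like', 'feel']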
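As for the <mess> element, it has mixed content: text, then a child element, then more text. The following is a minimal sketch, assuming what is wanted is both the suggestion inside <sugg> and the full sentence. It uses a reduced copy of the question's XML, and the variable names (doc, mess, sugg, full_text) are only illustrative.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the question's document so the sketch runs on its own.
doc = ET.fromstring(
    '<class id="1" name="good/bad">'
    '<mess>This is <sugg>not</sugg> text</mess>'
    '</class>'
)

mess = doc.find('mess')
sugg = mess.find('sugg')

print(repr(mess.text))   # text before the first child: 'This is '
print(repr(sugg.text))   # text inside <sugg>: 'not'
print(repr(sugg.tail))   # text after </sugg>, still inside <mess>: ' text'

# itertext() walks the element and its descendants in document order,
# yielding every text and tail fragment, so joining them gives the sentence.
full_text = ''.join(mess.itertext())
print(full_text)         # This is not text
```

Note that mess.text alone only returns 'This is ', because the remainder of the sentence is stored on the child element (as its text and tail).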

If your file is larger, you might prefer this alternative, which looks elements up by their id attribute:

tree, id_map = ET.XMLID('''<class id="1" name="good/bad">
    <verb>
        <token>like</token>
        <token>feel</token>
    </verb>
    <mess>This is <sugg>not</sugg> text</mess>
    <id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
    <id type="correct">I'm glad to see you.</id>
</class>''')
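# id_map maps each "id" attribute value to its element; '1' is the <class> element.
elems = id_map['1'].find('verb')
# Iterating over an element yields its direct children (the <token> elements).
verbs = [verb.text for verb in elems]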
