
How to save the text between XML tags in Python?

I am trying to get the text between XML tags. There are several posts about this, but what I don't understand is how to save that text in a variable. The code below prints what I want, but as soon as I replace "print" with "return", the text is not saved anywhere. I think I am missing something very simple here.

from xml.sax import make_parser, handler

line = '<text><p><s id="1">Some text <someothertag>some more text</someothertag></s></p></text>'

class extract_text(handler.ContentHandler):
    def characters(self, data):
        # Called by the parser for each chunk of character data
        print(data.strip())

parser = make_parser()
parser.setContentHandler(extract_text())
parser.feed(line)

So I would like to have a variable equal to "Some text some more text". Any ideas are very welcome!

If you just return a value from the handler, it is not stored anywhere: the parser calls characters() itself and ignores whatever the method returns. You need to store the text yourself:

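from xml.sax import make_parser, handler

# `line` is the XML string defined in the question
result = ''

class extract_text(handler.ContentHandler):
    def characters(self, data):
        # Append each chunk of text to the module-level variable
        global result
        result += data.strip() + '\n'

parser = make_parser()
parser.setContentHandler(extract_text())
parser.feed(line)

# Each chunk ends up on its own line
print(result)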
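
A variation on the same idea, offered as a minimal sketch rather than the only way to do it: the text can be collected on the handler instance instead of in a global, and the pieces joined with single spaces so the result matches the "Some text some more text" string asked for. The names TextCollector, chunks, text and the wrapper function extract_text are hypothetical and chosen only for this illustration; xml.sax.parseString is the standard-library call that parses a complete document in one step.

```python
import xml.sax
from xml.sax import handler


class TextCollector(handler.ContentHandler):
    """Hypothetical handler that keeps the character data it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, data):
        # The parser may deliver the text in several pieces; keep them all.
        self.chunks.append(data)

    def text(self):
        # Join the pieces, then collapse runs of whitespace to single spaces.
        return " ".join("".join(self.chunks).split())


def extract_text(xml_string):
    # Hypothetical helper: parse the string and return the collected text.
    collector = TextCollector()
    xml.sax.parseString(xml_string.encode("utf-8"), collector)
    return collector.text()


line = '<text><p><s id="1">Some text <someothertag>some more text</someothertag></s></p></text>'
result = extract_text(line)
print(result)  # Some text some more text
```

Keeping the state on the handler means each parse starts with a fresh, empty list, so the function can be called repeatedly without resetting a global.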
