Python XML Pull Parser

Question

I am trying to parse an XML file using Python. Due to the size of the XML, I want to use a pull parser. I found this one.

My code starts with

from xml.dom import pulldom

doc = pulldom.parse("myfile.xml")
for event, node in doc:
    # code here...

I am using

if (node.localName == "b"):

to get the XML tag name, and it works fine.

What I can't figure out is how to get the text between the tags. Using node.nodeValue returns None.

I can use node.toxml() to get the full XML for the node, but I only want the text between the tags. Is there a way to do this other than using a regex replace to strip the tags out of node.toxml()?

Answer 1

For every tag with text you get two nodes with local name "b": one with a START_ELEMENT event and one with an END_ELEMENT event. Normally you should receive something like this:

START_ELEMENT
CHARACTERS
END_ELEMENT

So you are looking for the characters after a matching start element. You may want to try something like this:

from xml.dom.pulldom import CHARACTERS, START_ELEMENT, parse

doc = parse("myfile.xml")
text_expected = False  # True right after a START_ELEMENT for <b>
for event, node in doc:
    if text_expected:
        text_expected = False
        if event != CHARACTERS:
            # strange: there should be some text here
            continue
        print(node.data)
    else:
        text_expected = (event == START_ELEMENT) and (node.localName == "b")

With this myfile.xml

<a>
    <b>c1</b>
    <b>c2</b>
</a>

I get the output

c1
c2
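To see which events actually arrive for this file, it can help to dump the whole stream. The following is only a sketch: it parses the same XML from an in-memory string with pulldom.parseString instead of reading myfile.xml, and the names SAMPLE and trace_events are made up for illustration.

```python
from xml.dom import pulldom

# Same content as myfile.xml above, kept in a string for the demo (assumption).
SAMPLE = """<a>
    <b>c1</b>
    <b>c2</b>
</a>"""


def trace_events(xml_text):
    """Return (event, detail) pairs: text for CHARACTERS, node name otherwise."""
    trace = []
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.CHARACTERS:
            trace.append((event, node.data))
        else:
            trace.append((event, node.nodeName))
    return trace


for event, detail in trace_events(SAMPLE):
    # repr() makes whitespace-only text visible
    print(event, repr(detail))
```

Besides the CHARACTERS events carrying c1 and c2, the trace also contains CHARACTERS events that hold only the indentation and line breaks between elements. The parser may deliver that whitespace in one or several chunks, so the exact number of such events should not be relied on.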

Note that you might need to strip() each string, and you must ignore all other CHARACTERS events. Every line break and run of whitespace between two elements generates a CHARACTERS event.
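An alternative that avoids tracking state by hand is to let pulldom build the subtree for each matching element with expandNode() and then join its text children. This is a minimal sketch, not part of the answer above: the helper name iter_element_text and the in-memory SAMPLE string are assumptions for illustration, and it only collects text that is a direct child of the element.

```python
from xml.dom import pulldom

# Same content as myfile.xml above, kept in a string for the demo (assumption).
SAMPLE = """<a>
    <b>c1</b>
    <b>c2</b>
</a>"""


def iter_element_text(xml_text, tag):
    """Yield the stripped direct text content of every element named `tag`."""
    doc = pulldom.parseString(xml_text)
    for event, node in doc:
        if event == pulldom.START_ELEMENT and node.localName == tag:
            # Pull the rest of this element from the stream into `node`.
            doc.expandNode(node)
            parts = [child.data for child in node.childNodes
                     if child.nodeType == child.TEXT_NODE]
            # Text may arrive in several chunks, so join before stripping.
            yield "".join(parts).strip()


for text in iter_element_text(SAMPLE, "b"):
    print(text)
```

Because expandNode() consumes the events up to the element's end tag, the whitespace between elements never has to be filtered explicitly. For a real file, pulldom.parse("myfile.xml") could be used in place of parseString.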

Answer 1 (accepted), 2012-11-22 15:23:59