How to get the whole text of an Element in xml.minidom?

I want to get the whole text of an Element to parse some XHTML:

<div id='asd'>
  <pre>skdsk</pre>
</div>

Given E = the div element in the example above, I want to get

<pre>skdsk</pre>

How?

Strictly speaking:

from xml.dom.minidom import parseString

tree = parseString("<div id='asd'><pre>skdsk</pre></div>")
root = tree.documentElement   # the <div> element
node = root.childNodes[0]     # its first child: the <pre> element
print(node.toxml())           # <pre>skdsk</pre>
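The snippet above returns only the first child. If "whole text" means everything inside the element (its inner markup, without the element's own start and end tags), one minimal sketch is to serialize every child node and concatenate the results. `inner_xml` is a hypothetical helper name, not part of minidom; it relies only on the standard `childNodes` attribute and `Node.toxml()`.

```python
from xml.dom.minidom import parseString


def inner_xml(element):
    """Hypothetical helper: serialize all child nodes of `element`
    (text, elements, comments) without the element's own tags."""
    return "".join(child.toxml() for child in element.childNodes)


tree = parseString("<div id='asd'><pre>skdsk</pre></div>")
div = tree.documentElement  # the <div> element
print(inner_xml(div))       # <pre>skdsk</pre>
```

With the multi-line markup from the question, the result also contains the surrounding line breaks and indentation, so call `.strip()` on it if you only want `<pre>skdsk</pre>`.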

In practice, though, I'd recommend looking at the BeautifulSoup library (http://www.crummy.com/software/BeautifulSoup/). Finding the right childNode in an XHTML document and skipping "whitespace nodes" is a pain. BeautifulSoup is a robust HTML/XHTML parser with fantastic tree-search capabilities.

Edit: The example above compresses the HTML into one string. If you use the HTML as laid out in the question, the line breaks and indentation will generate "whitespace" text nodes, so the node you want won't be at childNodes[0].
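Using the markup exactly as laid out in the question, here is a sketch of two standard-library ways around the whitespace nodes: filtering `childNodes` down to element nodes, or looking the element up by tag name with `getElementsByTagName`. Both assume the target is the `<pre>` element.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# The markup laid out as in the question, line breaks included.
xhtml = """<div id='asd'>
  <pre>skdsk</pre>
</div>"""

div = parseString(xhtml).documentElement

# childNodes[0] is now a whitespace-only Text node, not the <pre> element.
print(repr(div.childNodes[0].data))  # '\n  '

# Keep only element children, which skips the whitespace text nodes.
elements = [n for n in div.childNodes if n.nodeType == Node.ELEMENT_NODE]
print(elements[0].toxml())  # <pre>skdsk</pre>

# Alternatively, search by tag name (this searches all descendants, not just children).
pre = div.getElementsByTagName("pre")[0]
print(pre.toxml())  # <pre>skdsk</pre>
```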

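Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.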

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM