Empty element error with Beautiful Soup

Question

I am parsing an XML file using Beautiful Soup, but have found inconsistent behaviour when parsing empty elements, i.e.:

from BeautifulSoup import BeautifulSoup
s1 = "<c><a /><b /></c>"
s2 = "<c><a></a><b></b></c>"
soup1 = BeautifulSoup(s1)
soup2 = BeautifulSoup(s2)
print soup1
# <c><a><b></b></a></c>
print soup2
# <c><a></a><b></b></c>

Note that the b tag is inside the a tag in the first case, but not in the second. I thought the XML spec meant that s1 and s2 were equivalent?

Any thoughts on how I can deal with this?
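For comparison, here is a minimal Python 3 sketch that swaps in the standard library's strict XML parser, xml.etree.ElementTree, instead of Beautiful Soup. It suggests the XML reading is right: a conforming XML parser builds the same tree for s1 and s2, with a and b as siblings under c. The structure helper is a hypothetical name used only for this illustration. Beautiful Soup's own XML mode (for example `BeautifulSoup(markup, "xml")` in bs4, which needs lxml) is the more direct fix and is not shown here.

```python
import xml.etree.ElementTree as ET

s1 = "<c><a /><b /></c>"
s2 = "<c><a></a><b></b></c>"


def structure(xml_text):
    """Return the element tree as nested (tag, [children]) tuples."""
    def walk(elem):
        return (elem.tag, [walk(child) for child in elem])
    return walk(ET.fromstring(xml_text))


# A conforming XML parser builds the same tree for both inputs:
# a and b are siblings under c.
print(structure(s1))  # ('c', [('a', []), ('b', [])])
print(structure(s2))  # ('c', [('a', []), ('b', [])])

# Re-serialising either tree gives identical output.
print(ET.tostring(ET.fromstring(s1)) == ET.tostring(ET.fromstring(s2)))  # True
```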

Answer 1

The anchor and bold (<a>, <b>) elements cannot be self-closed, so this is invalid XHTML.
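The nesting in soup1 can be reproduced with a toy tree builder on top of the standard library's html.parser.HTMLParser. This is an assumption-laden Python 3 illustration, not Beautiful Soup's actual code: VOID_ELEMENTS is a hypothetical subset of HTML's void elements, and the honor_self_closing flag, ToyTreeBuilder and parse are made-up names standing in for "parse as XML". When the trailing slash is ignored for non-void tags, <a /> is just an open <a> tag, so the following <b /> ends up inside it.

```python
from html.parser import HTMLParser

# Hypothetical subset of HTML void elements; a real HTML parser knows the full list.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []

    def __str__(self):
        if self.tag in VOID_ELEMENTS:
            return f"<{self.tag} />"
        inner = "".join(str(child) for child in self.children)
        return f"<{self.tag}>{inner}</{self.tag}>"


class ToyTreeBuilder(HTMLParser):
    """Builds a tag tree; text content is ignored to keep the sketch short."""

    def __init__(self, honor_self_closing):
        super().__init__()
        self.honor_self_closing = honor_self_closing
        self.root = Node("[document]")
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)  # stays open until its end tag

    def handle_startendtag(self, tag, attrs):
        if self.honor_self_closing or tag in VOID_ELEMENTS:
            # XML-style: "<a />" is a complete, empty element.
            self.stack[-1].children.append(Node(tag))
        else:
            # HTML-style: the trailing slash is ignored, so "<a />" opens <a>.
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the innermost open element with this name and everything inside it.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def parse(markup, honor_self_closing=False):
    builder = ToyTreeBuilder(honor_self_closing)
    builder.feed(markup)
    builder.close()
    return "".join(str(child) for child in builder.root.children)


print(parse("<c><a /><b /></c>"))                           # <c><a><b></b></a></c>
print(parse("<c><a></a><b></b></c>"))                       # <c><a></a><b></b></c>
print(parse("<c><a /><b /></c>", honor_self_closing=True))  # <c><a></a><b></b></c>
```

The first line matches the soup1 output from the question; only void elements such as <br /> survive the HTML-style pass as empty elements.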

On top of that, the HTML compatibility guidelines in the XHTML 1.0 spec say a space should precede the slash:

Include a space before the trailing / and > of empty elements, e.g. <br />, <hr /> and <img src="karen.jpg" alt="Karen" />. Also, use the minimized tag syntax for empty elements, e.g. <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents.
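If you generate the markup yourself, the standard library's xml.etree.ElementTree already writes empty elements in the minimized, space-before-slash form the guideline asks for, and its short_empty_elements parameter (Python 3.4+) switches to the <br></br> form the guideline warns about. A minimal sketch, assuming Python 3.8 or later so that attributes keep their insertion order:

```python
import xml.etree.ElementTree as ET

hr = ET.Element("hr")
img = ET.Element("img", src="karen.jpg", alt="Karen")
br = ET.Element("br")

# Default: empty elements use the minimized form with a space before "/>".
print(ET.tostring(hr, encoding="unicode"))   # <hr />
print(ET.tostring(img, encoding="unicode"))  # <img src="karen.jpg" alt="Karen" />

# short_empty_elements=False writes the <br></br> form instead.
print(ET.tostring(br, encoding="unicode", short_empty_elements=False))  # <br></br>
```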

Answer 1:
1 2012-03-08 13:32:27
