Empty element error with Beautiful Soup
I am parsing an XML file using Beautiful Soup but have found inconsistent behaviour when parsing empty elements, i.e.:
from BeautifulSoup import BeautifulSoup
s1 = "<c><a /><b /></c>"
s2 = "<c><a></a><b></b></c>"
soup1 = BeautifulSoup(s1)
soup2 = BeautifulSoup(s2)
print soup1
# <c><a><b></b></a></c>
print soup2
# <c><a></a><b></b></c>
Note that the <b> tag is nested inside the <a> tag in the first case, but not in the second. I thought the XML spec meant that s1 and s2 were equivalent?
Any thoughts as to how I can deal with this?
The anchor and bold elements (<a>, <b>) are not empty elements and cannot be self-closed, so this is invalid XHTML.
On top of that, the XHTML spec's compatibility guidelines say a space should precede the slash:
Include a space before the trailing / and > of empty elements, e.g. <br />, <hr /> and <img src="karen.jpg" alt="Karen" />. Also, use the minimized tag syntax for empty elements, e.g. <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents.
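For what it's worth, a strict XML parser does treat the two spellings as the same document. Here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree instead of Beautiful Soup. The shape helper is a name made up for this illustration, not part of either library:

```python
import xml.etree.ElementTree as ET

s1 = "<c><a /><b /></c>"
s2 = "<c><a></a><b></b></c>"


def shape(element):
    """Return a nested (tag, children) tuple describing the element tree."""
    # Hypothetical helper: reduces each element to its tag and its children.
    return (element.tag, tuple(shape(child) for child in element))


tree1 = shape(ET.fromstring(s1))
tree2 = shape(ET.fromstring(s2))

print(tree1)           # ('c', (('a', ()), ('b', ())))
print(tree1 == tree2)  # True
```

Both strings give <a> and <b> as siblings under <c>. This suggests the nesting in the question comes from the input being parsed as HTML, where <a> and <b> are not empty elements, not from any difference at the XML level.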