How to get BeautifulSoup 4 to respect a self-closing tag?
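This question is specific to BeautifulSoup 4, which makes it different from the earlier question:
selfClosingTags in BeautifulSoup
Since BeautifulStoneSoup (the old XML parser) is gone, how can I get bs4 to respect a new self-closing tag? For example: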
import bs4
S = '''<foo> <bar a="3"/> </foo>'''
soup = bs4.BeautifulSoup(S, selfClosingTags=['bar'])
print(soup.prettify())
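does not self-close the bar tag, but it does emit a warning. What is the tree builder that bs4 refers to, and how do I get it to self-close the tag?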
/usr/local/lib/python2.7/dist-packages/bs4/__init__.py:112: UserWarning: BS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.
"BS4 does not respect the selfClosingTags argument to the "
<html>
 <body>
  <foo>
   <bar a="3">
   </bar>
  </foo>
 </body>
</html>
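As the warning says, bs4 hands parsing to a tree builder, which is selected by the second ("features") argument of the BeautifulSoup constructor. HTML builders only treat HTML void elements (br, img, and so on) as self-closing, which is why bar gets an explicit end tag above. The sketch below assumes a recent bs4 4.x; it shows how to inspect the builder in use and how a builder subclass could treat bar as an empty-element tag in HTML mode. BarAwareBuilder is a hypothetical name, and overriding the empty_element_tags class attribute relies on bs4 internals that may change between versions.

```python
from bs4 import BeautifulSoup
from bs4.builder import HTMLParserTreeBuilder

S = '<foo> <bar a="3"/> </foo>'

# The tree builder is chosen by the second ("features") argument.
soup = BeautifulSoup(S, "html.parser")
print(type(soup.builder).__name__)   # which builder parsed the document

# HTML builders only allow void elements (br, img, ...) to be self-closing,
# so <bar> is serialized with an explicit end tag.
print(soup.bar.can_be_empty_element)
print(soup)

# Hypothetical workaround (relies on bs4 internals): a builder subclass whose
# set of empty-element tags also includes "bar".
class BarAwareBuilder(HTMLParserTreeBuilder):
    empty_element_tags = HTMLParserTreeBuilder.empty_element_tags | {"bar"}

soup2 = BeautifulSoup(S, builder=BarAwareBuilder())
print(soup2.bar)   # an empty <bar> is now written in self-closing form
```

For real XML data, switching to the XML builder, as the answer below does, is the simpler route; the subclass only makes sense if the input must be parsed as HTML.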
To parse XML, pass "xml" as the second argument to the BeautifulSoup constructor (this requires lxml to be installed):
soup = bs4.BeautifulSoup(S, 'xml')
You no longer need to pass selfClosingTags:
In [1]: import bs4
In [2]: S = '''<foo> <bar a="3"/> </foo>'''
In [3]: soup = bs4.BeautifulSoup(S, 'xml')
In [4]: print(soup.prettify())
<?xml version="1.0" encoding="utf-8"?>
<foo>
 <bar a="3"/>
</foo>
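The effect of choosing the XML builder can be checked directly. A minimal sketch, assuming bs4 with lxml installed: the XML builder does not restrict which tags may be empty, so any element without content is written out in self-closing form, whether or not the source used the /> syntax, and it gets an explicit end tag again once it has content.

```python
from bs4 import BeautifulSoup

S = '<foo> <bar a="3"/> </foo>'

# "xml" selects the lxml-based XML tree builder.
soup = BeautifulSoup(S, "xml")
print(soup.is_xml)                 # True for the XML builder
print(soup.bar.is_empty_element)   # True: <bar> has no children
print(soup.bar)                    # serialized as a self-closing tag

# An element written with an explicit end tag is also emitted self-closing ...
other = BeautifulSoup("<foo><bar></bar></foo>", "xml")
before = str(other.foo)
print(before)

# ... and stops being self-closing as soon as it gains content.
other.bar.string = "x"
print(other.foo)
```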