This question is specific to BeautifulSoup4, which makes it different from the previous questions:
Why is BeautifulSoup modifying my self-closing elements?
selfClosingTags in BeautifulSoup
Since BeautifulStoneSoup (the previous XML parser) is gone, how can I get bs4 to respect a new self-closing tag? For example:
import bs4
S = '''<foo> <bar a="3"/> </foo>'''
soup = bs4.BeautifulSoup(S, selfClosingTags=['bar'])
print soup.prettify()
This does not self-close the bar tag, but the warning gives a hint. What is this tree builder that bs4 is referring to, and how do I self-close the tag?
/usr/local/lib/python2.7/dist-packages/bs4/__init__.py:112: UserWarning: BS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.
"BS4 does not respect the selfClosingTags argument to the "
<html>
<body>
<foo>
<bar a="3">
</bar>
</foo>
</body>
</html>
To parse XML you pass in “xml” as the second argument to the BeautifulSoup constructor.
soup = bs4.BeautifulSoup(S, 'xml')
You'll need to have lxml installed.
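If lxml is missing, bs4 cannot find a tree builder for the "xml" feature and raises bs4.FeatureNotFound. A minimal sketch of guarding against that, assuming the same markup as in the question (the message text is illustrative, and print() is used so the snippet also runs on Python 3):

```python
import bs4

S = '<foo> <bar a="3"/> </foo>'

try:
    # The 'xml' feature is provided by lxml's XML tree builder.
    soup = bs4.BeautifulSoup(S, 'xml')
except bs4.FeatureNotFound:
    # Raised when no installed parser offers the requested feature.
    soup = None
    print("No XML tree builder found; try: pip install lxml")
else:
    print(soup.bar)
```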
You don't need to pass selfClosingTags anymore:
In [1]: import bs4
In [2]: S = '''<foo> <bar a="3"/> </foo>'''
In [3]: soup = bs4.BeautifulSoup(S, 'xml')
In [4]: print soup.prettify()
<?xml version="1.0" encoding="utf-8"?>
<foo>
<bar a="3"/>
</foo>
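As for the "tree builder" in the warning: it is the parser backend that the second constructor argument selects, and it decides which tags may be empty elements. The sketch below compares two builders on the same markup. It assumes lxml is installed for the "xml" case and uses Python's built-in "html.parser" for the HTML case; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

S = '<foo> <bar a="3"/> </foo>'

# HTML builders only treat known void elements (br, img, ...) as self-closing,
# so an unknown tag such as <bar> is given an explicit closing tag.
html_soup = BeautifulSoup(S, 'html.parser')

# The XML builder lets any tag without contents be an empty-element tag.
xml_soup = BeautifulSoup(S, 'xml')

print(html_soup.bar)                   # rendered with a closing tag
print(xml_soup.bar)                    # rendered as a self-closing tag
print(xml_soup.bar.is_empty_element)   # True when the tag is empty and allowed to self-close
```

Because the choice lives in the builder, there is nothing to configure per tag when parsing as XML.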