When I prettify a soup, I am trying to get this:
<tag attr="val" />
Instead of this:
<tag attr="val"></tag>
I checked the bs4.formatter code and didn't find an option that covers this:
def __init__(
        self, language=None, entity_substitution=None,
        void_element_close_prefix='/', cdata_containing_tags=None,
        empty_attributes_are_booleans=False, indent=1,
):
How can I achieve this? Thanks
I tried the new_tag options and the bs4.formatter options.
I'm not sure why you'd want to do such a thing, since bs4 produces valid HTML and this would be messing with that, but you could use this function:
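from bs4 import BeautifulSoup

def closeVoidElements(html, voidEls=None, parser='html.parser', pFormatter='minimal'):
    if not isinstance(voidEls, list):
        voidEls = [
            'area', 'base', 'br', 'col', 'command', 'embed', 'wbr', 'img',
            'input', 'keygen', 'link', 'meta', 'param', 'source', 'track', 'hr'
        ]  # void elements from https://www.w3.org/TR/2011/WD-html-markup-20110113/syntax.html#syntax-elements
    html = BeautifulSoup(str(html), parser)
    # keep only the void elements that actually occur in the document
    if voidEls:
        voidEls = set(t.name for t in html.find_all(voidEls))
    html = html.prettify()
    # rename the void elements so bs4 no longer treats them as self-closing
    # (caveat: plain str.replace also hits longer names that start with ve, e.g. 'col' in 'colgroup')
    for ve in voidEls:
        html = html.replace(f'<{ve}', f'<{ve}_x').replace(f'{ve}>', f'{ve}_x>')
    # 'minimal' is prettify's own default; formatter=None would switch off entity escaping
    html = BeautifulSoup(html, parser).prettify(formatter=pFormatter)
    # restore the original tag names
    for ve in voidEls:
        html = html.replace(f'<{ve}_x', f'<{ve}').replace(f'{ve}_x>', f'{ve}>')
    return html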
and call it like closeVoidElements(soup) instead of soup.prettify(). (It's basically changing the tag names of self-closing tags so bs4 doesn't recognize them as such, then parsing and prettifying before changing them back.)
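The rename-and-restore step can be sketched on its own with just the standard library. This version uses a regular expression so that only whole tag names are matched. A plain substring replace on "<col" would also touch "<colgroup". The "_x" suffix follows the function above; rename_tags and restore_tags are hypothetical helper names, not part of bs4.

```python
import re


def _name_pattern(names, suffix=""):
    # Match "<name" or "</name" only when the name ends right there
    # (next character is whitespace, "/" or ">", or the string ends).
    alternatives = "|".join(re.escape(n) for n in sorted(names))
    return re.compile(rf"(</?)({alternatives}){re.escape(suffix)}(?=[\s/>]|$)")


def rename_tags(markup, names, suffix="_x"):
    """Append suffix to every opening/closing tag whose name is in names."""
    if not names:
        return markup
    return _name_pattern(names).sub(
        lambda m: m.group(1) + m.group(2) + suffix, markup
    )


def restore_tags(markup, names, suffix="_x"):
    """Undo rename_tags by stripping the suffix again."""
    if not names:
        return markup
    return _name_pattern(names, suffix).sub(
        lambda m: m.group(1) + m.group(2), markup
    )


sample = '<colgroup><col span="2"/></colgroup><br/>'
renamed = rename_tags(sample, {"col", "br"})
print(renamed)
print(restore_tags(renamed, {"col", "br"}))
```

Between the two calls you would parse and prettify the renamed markup as in the function above. The sketch still works on raw text, so tag-like strings inside script or style content could be renamed as well.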
There used to be a selfClosingTags argument for XML, but it has been discontinued.
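If the goal is the `<tag attr="val" />` form from the question, another possible route is to mark empty tags as empty-element tags yourself and let the formatter write the closing slash. The sketch below is a rough illustration, not a documented recipe: it assumes that setting Tag.can_be_empty_element to True on a tag without children makes bs4 serialize it as a void element, and that a void_element_close_prefix of ' /' is written just before the closing '>'. prettify_self_closing is a hypothetical helper name.

```python
from bs4 import BeautifulSoup
from bs4.dammit import EntitySubstitution
from bs4.formatter import HTMLFormatter


def prettify_self_closing(markup, parser="html.parser"):
    """Prettify markup, writing tags without children as <tag ... />."""
    soup = BeautifulSoup(str(markup), parser)
    for tag in soup.find_all(True):
        # Assumption: a childless tag with this flag set is serialized
        # like a void element (no separate closing tag).
        if not tag.contents:
            tag.can_be_empty_element = True
    formatter = HTMLFormatter(
        # Same escaping as the default "minimal" formatter.
        entity_substitution=EntitySubstitution.substitute_xml,
        # The leading space gives "<tag />" rather than "<tag/>".
        void_element_close_prefix=" /",
    )
    return soup.prettify(formatter=formatter)


print(prettify_self_closing('<tag attr="val"></tag><p>text</p>'))
```

Note that browsers do not treat the `/>` form as closing a non-void HTML element such as div or script, so this is mainly suitable when the output is consumed as XML-like markup.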