Prettify using BeautifulSoup without adding line breaks

Say I have an HTML file like this:

<html>
<body>
<p>Some post</p>
<p>Another post</p>
</body>
</html>

In Python I can use soup.prettify() to adjust line indentation. However, prettify() also adds line breaks: every tag and every text node ends up on its own line. The output looks like this:

<html>
 <body>
  <p>
   Some post
  </p>
  <p>
   Another post
  </p>
 </body>
</html>

I would like to add indentation only, without adding any line breaks (equivalent to the effect "Reindent" has in Sublime Text). That is, I would like the output to look like this:

<html>
<body>
    <p>Some post</p>
    <p>Another post</p>
</body>
</html>

Can this be done in Python?

You can disable additional line breaks for certain tags using the preserve_whitespace_tags keyword argument:

import bs4
soup = bs4.BeautifulSoup(my_html, "html.parser", preserve_whitespace_tags=["p"])

Documentation: bs4.builder.TreeBuilder.__init__

A list of tags to treat the way <pre> tags are treated in HTML. Tags in this list are immune from pretty-printing; their contents will always be output as-is.
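A fuller sketch on the markup from the question, assuming a reasonably recent bs4 in which the BeautifulSoup constructor forwards extra keyword arguments to the tree builder. Note that passing a list replaces the builder's default set (pre and textarea for the HTML builders), so those two are listed again here. The exact whitespace of the result may differ between bs4 versions; what matters is that each <p> stays on one line.

```python
import bs4

my_html = """<html>
<body>
<p>Some post</p>
<p>Another post</p>
</body>
</html>"""

# preserve_whitespace_tags is not consumed by BeautifulSoup itself; the
# constructor passes it on to the tree builder (TreeBuilder.__init__).
# The list replaces the default set, so "pre" and "textarea" are repeated.
soup = bs4.BeautifulSoup(
    my_html, "html.parser", preserve_whitespace_tags=["p", "pre", "textarea"]
)

# <html> and <body> are still expanded and indented (one space per level
# by default), but each <p> element is written out as-is on a single line.
pretty = soup.prettify()
print(pretty)
```

<html> and <body> still get their own lines, so this only suppresses the extra breaks inside the listed tags.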

However, there doesn't seem to be a "don't add any whitespace" option. The documentation even states:

Since it adds whitespace (in the form of newlines), prettify() changes the meaning of an HTML document and should not be used to reformat one.
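If the goal is only to re-indent the existing lines, as Sublime Text's "Reindent" does, one possible workaround outside BeautifulSoup is a small line-based reindenter. This is a hedged sketch, not a bs4 feature: it uses the standard library's html.parser.HTMLParser to count the elements open at the start of each line and rewrites only the leading whitespace, so no line breaks are added or removed. The names reindent, _DepthTracker and VOID_TAGS are hypothetical, and the no_indent_tags default of {"html"} is an assumption chosen so the result matches the desired output in the question. It does not special-case <pre> or <textarea> content, and it assumes a line never starts in the middle of a tag.

```python
from html.parser import HTMLParser

# Hypothetical helper: void elements have no end tag, so they never change depth.
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


class _DepthTracker(HTMLParser):
    """Tracks how many indent-worthy elements are open so far."""

    def __init__(self, no_indent_tags):
        super().__init__(convert_charrefs=True)
        self.depth = 0
        self.no_indent_tags = no_indent_tags

    def is_indenting(self, tag):
        # Void elements and tags listed in no_indent_tags don't add a level.
        return tag not in VOID_TAGS and tag not in self.no_indent_tags

    def handle_starttag(self, tag, attrs):
        if self.is_indenting(tag):
            self.depth += 1

    def handle_endtag(self, tag):
        if self.is_indenting(tag):
            self.depth = max(0, self.depth - 1)


def reindent(markup, indent="    ", no_indent_tags=frozenset({"html"})):
    """Replace each line's leading whitespace based on nesting depth.

    Line breaks are never added or removed; only indentation changes.
    """
    tracker = _DepthTracker(no_indent_tags)
    out = []
    for line in markup.splitlines():
        stripped = line.strip()
        level = tracker.depth
        # A line that starts with a closing tag sits at its parent's level.
        if stripped.startswith("</"):
            name = stripped[2:].split(">", 1)[0].strip().lower()
            if tracker.is_indenting(name):
                level = max(0, level - 1)
        out.append(indent * level + stripped if stripped else "")
        # Update the depth with every tag that appears on this line.
        tracker.feed(line + "\n")
    tracker.close()
    return "\n".join(out)


if __name__ == "__main__":
    my_html = "<html>\n<body>\n<p>Some post</p>\n<p>Another post</p>\n</body>\n</html>"
    print(reindent(my_html))
```

On the markup from the question, reindent(my_html) returns exactly the desired output: <html> and <body> at column 0 and both <p> lines indented by four spaces. Because it never parses the document into a tree and re-serializes it, inline content such as <p>Some post</p> is left untouched.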
