
How to shorten HTML Code using JSoup or HTMLCleaner

Good day everyone. I am trying to save HTML code in a database, and I am using SHEF (Swing HTML Editor Framework), but I have a huge problem. Usually, the generated HTML looks like this:

<div>
This is the first paragraph
</div>
<div>
This is the second paragraph.
</div>
<div>
This is the last paragraph.
</div>

I want to "clean" the HTML code and make the result look like this instead:

<div>
This is the first paragraph
<br>
This is the second paragraph.
<br>
This is the last paragraph.
</div>

I tried HTMLCleaner and JSoup, but I haven't managed it. All I can get JSoup to do is replace empty blocks, so that

<div>
This is the first paragraph
</div>
<div>

</div>
<div>
This is the last paragraph.
</div>

becomes

<div>
This is the first paragraph
</div>
<br>
<div>
This is the last paragraph.
</div>

This is the JSoup code that I use:

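import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Tag;

static String replaceEmptyBlocks(String sourceString) {
    Document source = Jsoup.parse(sourceString);

    // Visit every element inside <body>. Selecting "*" on the whole document
    // would also match the empty <head>, which jsoup treats as a block tag.
    for (Element el : source.body().children().select("*")) {
        // Replace empty block elements (no child elements, only whitespace) with a <br>
        if (el.children().isEmpty() && !el.hasText() && el.isBlock()) {
            el.replaceWith(new Element(Tag.valueOf("br"), ""));
        }
    }
    return source.body().html();
}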

Is there any way to make the generated HTML code shorter? Thanks!
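For comparison, the transformation the question literally asks for (one div with a br between the former paragraphs) might be sketched with jsoup roughly as follows. MergeDivs and mergeTopLevelDivs are hypothetical names, not part of jsoup or SHEF. The sketch assumes the SHEF output is a flat sequence of top-level divs holding only inline content; it skips empty or whitespace-only divs and drops anything that is not a top-level div. The exact whitespace of the result depends on jsoup's pretty-printer.

```java
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

public class MergeDivs {

    // Hypothetical helper: merges the top-level <div>s of an HTML fragment into a
    // single <div>, separating their contents with <br>. Assumes each top-level div
    // holds only inline content; anything that is not a top-level div is dropped.
    public static String mergeTopLevelDivs(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        Element body = doc.body();
        Element merged = doc.createElement("div");

        // children() returns a new list, so moving nodes out of the divs is safe here
        for (Element div : body.children()) {
            if (!div.tagName().equals("div") || !div.hasText()) {
                continue; // skip non-div elements and empty/whitespace-only divs
            }
            if (!merged.childNodes().isEmpty()) {
                merged.appendElement("br"); // separator between former paragraphs
            }
            // appendChild re-parents each node, so iterate over a copy of the list
            for (Node child : new ArrayList<>(div.childNodes())) {
                merged.appendChild(child);
            }
        }
        body.empty().appendChild(merged); // replace the old divs with the merged one
        return body.html();
    }

    public static void main(String[] args) {
        String shef = "<div>\nThis is the first paragraph\n</div>\n"
                + "<div>\nThis is the second paragraph.\n</div>\n"
                + "<div>\nThis is the last paragraph.\n</div>";
        System.out.println(mergeTopLevelDivs(shef));
    }
}
```

Note that this only works because SHEF's output is so regular; nested blocks, lists or tables would need more careful handling.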

I would suggest that, rather than fiddling about with the HTML and trying to minimize it, you just gzip-compress it and save the compressed bytes to your DB instead (and inflate them on the way out).

The CPU overhead is minimal, and the space savings will be much larger. Your code will also be simpler and more general. gzip typically shrinks HTML by 75-80%, whilst removing a few tags is going to save you, what, 10%?

Here's an example of how to compress / decompress.
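A minimal sketch of that round trip, using only java.util.zip from the JDK (this is not the example the answer originally linked to). The class name HtmlGzip and its methods are illustrative, and readAllBytes() assumes Java 9 or later. The resulting byte[] would go into a binary column (for example BLOB or bytea) rather than a text column.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class HtmlGzip {

    // Compress the HTML string into gzip bytes before writing it to the database
    public static byte[] compress(String html) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // try-with-resources closes the gzip stream, which flushes the gzip trailer
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(html.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Inflate gzip bytes read back from the database into the original HTML
    public static String decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build some repetitive SHEF-style markup to compress
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("<div>\nThis is paragraph ").append(i).append(".\n</div>\n");
        }
        String html = sb.toString();

        byte[] packed = compress(html);
        System.out.println(html.length() + " chars -> " + packed.length + " bytes");
        System.out.println(decompress(packed).equals(html)); // prints true
    }
}
```

The actual ratio depends on the markup; highly repetitive output like SHEF's tends to compress well.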
