
How can I create a basic human-readable plain text representation of XHTML using Java?

Given some simple XHTML, I'd like to create a human-readable plain text version of it. This would involve removing all HTML tags while adding or preserving some whitespace.

For example, this input:

<div>
<p>This is some text, some is <b>bold</b>.</p>
<ul>
  <li>Point one</li>
  <li>Point two</li>
</ul>
</div>

would become:

"This is some text, some is bold. Point one Point two"

(commas between the LIs would be ideal... :)

Use the Jericho HTML Parser. You can either strip all the tags or use its Renderer class, which tries to mimic the look of the rendered page (e.g. your bulleted lists would be indented).
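If adding a library is not an option and the input really is well-formed XHTML, the tag-stripping approach can also be sketched with the XML parser that ships with the JDK. This is not Jericho's API, just a rough stand-in for the same idea. The class name XhtmlToText, the method toPlainText, and the choice of which tags count as block-level are all assumptions made for illustration. It will likely fail on input that uses HTML entities such as &nbsp; without a DTD, or on markup that is not well-formed.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
public class XhtmlToText {

    // Assumed set of block-level tags that should be followed by whitespace.
    private static final Set<String> BLOCK_TAGS = new HashSet<>(Arrays.asList(
            "div", "p", "ul", "ol", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6"));

    /** Converts well-formed XHTML into a single line of plain text. */
    public static String toPlainText(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        StringBuilder out = new StringBuilder();
        collectText(doc.getDocumentElement(), out);
        // Collapse the whitespace left over from indentation and line breaks.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // Depth-first walk: keep text nodes, drop the tags themselves.
    private static void collectText(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
        String name = node.getNodeName();
        // Put a comma between consecutive list items, as the question asks.
        if ("li".equals(name) && hasFollowingListItem(node)) {
            out.append(',');
        }
        // Keep adjacent blocks such as <p>a</p><p>b</p> from running together.
        if (BLOCK_TAGS.contains(name)) {
            out.append(' ');
        }
    }

    // True if the next element sibling of the given node is another <li>.
    private static boolean hasFollowingListItem(Node li) {
        for (Node n = li.getNextSibling(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return "li".equals(n.getNodeName());
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<div>\n"
                + "<p>This is some text, some is <b>bold</b>.</p>\n"
                + "<ul>\n  <li>Point one</li>\n  <li>Point two</li>\n</ul>\n"
                + "</div>";
        // prints: This is some text, some is bold. Point one, Point two
        System.out.println(toPlainText(xhtml));
    }
}
```

For real-world HTML that is not strictly well-formed, a tolerant parser such as Jericho is the safer choice, since the JDK XML parser rejects unclosed tags outright.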

