
HTML special character parsing

I'm looking for a Java class that parses all HTML special characters (entities). I guess it's a common problem, but I cannot find a quick solution right now.

What I want to get is:

input: th&egrave; --> output: thè
input: »
input: &lraquo;
...

Do you know anything useful for me?

Have you searched for it? The first result for "java HTML markup entity parser" points to an HTML text extractor.

It seems to be what you need.

Also, you may want to examine the renderers of javax.swing.JLabel (and of the other Swing text components).

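Try the StringEscapeUtils utility class from Apache Commons Lang. Check the docs for the StringEscapeUtils.unescapeHtml() method, which converts entities in a string back to the characters they stand for.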

Docs here:

http://commons.apache.org/lang/api-release/org/apache/commons/lang/StringEscapeUtils.html

Download here:

http://commons.apache.org/lang/
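
With Commons Lang on the classpath, the call itself is a one-liner: StringEscapeUtils.unescapeHtml("th&egrave;"). To show roughly what such an unescape step does, here is a minimal pure-Java sketch that needs no extra jar. It is not the library's implementation: the class name EntityDecoder and its seven-entry entity table are made up for illustration, and a real decoder knows the full HTML entity list.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Commons Lang.
final class EntityDecoder {
    // Tiny, assumed subset of the named-entity table.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("egrave", "\u00e8"); // e with grave accent
        NAMED.put("laquo", "\u00ab");  // left-pointing double angle quote
        NAMED.put("raquo", "\u00bb");  // right-pointing double angle quote
    }

    // Matches &name;, decimal &#233; and hexadecimal &#xE9; references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = decode(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // unknown entity: keep the text as-is
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or null if the reference is not recognised.
    private static String decode(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        try {
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1), 10);
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : null;
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("th&egrave; &raquo; &#233; &lraquo;"));
    }
}
```

The sketch leaves anything it does not recognise (such as &lraquo;, which is not in its table) unchanged rather than guessing, and it decodes in a single pass, so "&amp;lt;" becomes "&lt;" and not "<".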

