HTML special character parsing

Question

I'm looking for a Java class to parse all HTML special characters. I guess it's a common problem, but I cannot find a quick solution right now.

What I want to get is:

input: th&egrave; --> output: thè
input: &#187;
input: &lraquo;
...

Do you know anything useful for me?

Answer 1

Have you searched for it? The first result for "java HTML markup entity parser" points to an HTML text extractor.

It seems to be what you need.

Also, you may want to look at the renderers used by javax.swing.JLabel (and the other Swing text components).

Answer 2

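Try the StringEscapeUtils utility class from Apache Commons Lang. Its StringEscapeUtils.unescapeHtml() method turns entities such as &egrave; back into the corresponding characters; check the docs for details.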

Docs here:

Download here:
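
If you just want to see what such an unescape step does, here is a minimal sketch in plain Java with no library on the classpath. The class name HtmlEntityDecoder, the decode method and the tiny entity table are assumptions made up for this example; they are not how StringEscapeUtils is implemented, and a real library covers the full set of named entities.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any library.
class HtmlEntityDecoder {

    // Deliberately tiny table of named entities (assumption: extend as needed).
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("egrave", "\u00E8"); // e with grave accent
        NAMED.put("laquo", "\u00AB");  // left-pointing double angle quote
        NAMED.put("raquo", "\u00BB");  // right-pointing double angle quote
    }

    // Matches &name;, decimal &#187; and hexadecimal &#xBB; references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = resolve(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // unknown entity: keep the original text
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or null if the reference is not recognised.
    private static String resolve(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        try {
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1));
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : null;
        } catch (NumberFormatException e) {
            return null; // number does not fit in an int
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("th&egrave;"));
        System.out.println(decode("&#187;"));
        System.out.println(decode("&lraquo;")); // not in the table, left unchanged
    }
}
```

With Commons Lang 2.x on the classpath, the equivalent call would be StringEscapeUtils.unescapeHtml(input); in Commons Lang 3.x the method is named unescapeHtml4.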