Why can't I use org.apache.commons.lang.StringEscapeUtils to convert this String containing entities such as &apos; and &egrave;?

Question

I am experimenting with the org.apache.commons.lang.StringEscapeUtils class, but I am running into some difficulties.

I have the following situation in my code:

String notNormalized = "c&apos;&egrave;";

System.out.println("NOT NORMALIZED: " + notNormalized);
System.out.println("NORMALIZED: " + StringEscapeUtils.escapeJava(notNormalized));

So first I declare notNormalized, which (at least in my head) represents a non-normalized string containing an apostrophe represented by &apos; and an accented vowel represented by &egrave; (which should be the è character).

Then I print it without normalization, expecting the c&apos;&egrave; string, and then I print its normalized version, expecting the normalized/converted string c'è.

But the problem is that I still get the same output. In fact, this is what is printed to the console:

NOT NORMALIZED: c&apos;&egrave;
NORMALIZED: c&apos;&egrave;

Why? What am I missing? What is wrong? How can I perform this test and correctly convert a string that contains entities such as &apos;?

Answer 1

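What you're looking for is unescapeHtml4. escapeJava only applies Java string-literal escaping rules (double quotes, backslashes, control and non-ASCII characters), so it leaves HTML entities such as &egrave; untouched. Note that unescapeHtml4 belongs to the Commons Lang 3 StringEscapeUtils (org.apache.commons.lang3); the 2.x class in org.apache.commons.lang names it unescapeHtml.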

So

System.out.println("NORMALIZED: " + StringEscapeUtils.unescapeHtml4(notNormalized));

which prints

NORMALIZED: c&apos;è

Unfortunately, &apos; is not an HTML 4 entity and therefore can't be unescaped with this method. You can use unescapeXml for &apos; but not for &egrave;. You'll have to mix and match.
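To illustrate what "mix and match" has to achieve, here is a minimal, library-free sketch that decodes both kinds of entity in a single pass. It is not how StringEscapeUtils is implemented: the class name EntityDecoder, the decode method, and the tiny entity table are all hypothetical and exist only for this example. It assumes Java 9 or later (for Map.of and the StringBuilder overload of appendReplacement).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecoder {
    // Hypothetical, deliberately tiny table mixing one XML-only entity
    // with a few HTML 4 ones. A real decoder needs the full entity lists.
    private static final Map<String, String> ENTITIES = Map.of(
            "apos", "'",         // defined by XML, not by HTML 4
            "egrave", "\u00E8",  // HTML 4 entity for è
            "amp", "&",
            "lt", "<",
            "gt", ">",
            "quot", "\"");

    // Matches named entities of the form &name;
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z]+);");

    public static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = ENTITIES.get(m.group(1));
            // Entities missing from the table are copied through unchanged.
            m.appendReplacement(out, Matcher.quoteReplacement(
                    replacement != null ? replacement : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String notNormalized = "c&apos;&egrave;";
        System.out.println("NOT NORMALIZED: " + notNormalized);
        System.out.println("NORMALIZED: " + decode(notNormalized));
    }
}
```

With Commons Lang 3 on the classpath, the analogous approach would be to chain the two library calls, for example StringEscapeUtils.unescapeXml(StringEscapeUtils.unescapeHtml4(notNormalized)). One caveat with chaining is that the input gets unescaped twice, so a string such as &amp;apos; ends up as ' rather than &apos;. The single-pass sketch above avoids that because each position in the input is decoded at most once.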

solution1
0 2015-03-17 17:23:55
