Why can't I use org.apache.commons.lang.StringEscapeUtils to convert a String containing entities such as &apos; and &egrave;?

I am experimenting with the org.apache.commons.lang.StringEscapeUtils class, but I am running into some difficulties.

I have the following situation in my code:

String notNormalized = "c&apos;&egrave;";

System.out.println("NOT NORMALIZED: " + notNormalized);
System.out.println("NORMALIZED: " + StringEscapeUtils.escapeJava(notNormalized));

So first I declare the notNormalized variable, which (at least in my head) represents a non-normalized string that contains an apostrophe written as &apos; and an accented vowel written as &egrave; (which should be the è character).

Then I print it without normalization, expecting the string c&apos;&egrave;, and then I print its normalized version, expecting the converted string c'è.

The problem is that I get the same output both times. This is what the console shows:

NOT NORMALIZED: c&apos;&egrave;
NORMALIZED: c&apos;&egrave;

Why? What am I missing? How can I run this test and correctly convert a string that contains entities such as &apos;?

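What you're looking for is unescapeHtml4. escapeJava only escapes characters that are special inside a Java string literal (double quotes, backslashes, control characters, non-ASCII characters), so it leaves HTML entities such as &egrave; untouched. Note that unescapeHtml4 lives in org.apache.commons.lang3.StringEscapeUtils; the older org.apache.commons.lang class offers unescapeHtml instead.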

So

System.out.println("NORMALIZED: " + StringEscapeUtils.unescapeHtml4(notNormalized));

which prints

NORMALIZED: c&apos;è

Unfortunately, &apos; is not an HTML 4 entity and therefore can't be unescaped with this method. You can use unescapeXml for &apos; but not for &egrave;. You'll have to mix and match.
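
With commons-lang3 on the classpath, "mix and match" would presumably mean chaining the two calls, e.g. StringEscapeUtils.unescapeXml(StringEscapeUtils.unescapeHtml4(notNormalized)), keeping in mind that two passes will also decode doubly escaped input such as &amp;apos;. If you only need a handful of entities, a rough stdlib-only alternative is a small lookup table. The sketch below is not part of Commons Lang: the class EntityDemo, the method unescapeKnownEntities, and the two-entry table are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDemo {
    // Hypothetical lookup table: only the two entities from the question.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("apos", "'");        // XML entity, not defined in HTML 4
        ENTITIES.put("egrave", "\u00E8"); // HTML 4 entity for è
    }

    // Matches named entities of the form &name;
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z]+);");

    // Replaces known entities in a single pass; unknown ones are left as-is.
    static String unescapeKnownEntities(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = ENTITIES.getOrDefault(m.group(1), m.group());
            // quoteReplacement guards against '$' and '\' in the replacement.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String notNormalized = "c&apos;&egrave;";
        System.out.println("NOT NORMALIZED: " + notNormalized);
        System.out.println("NORMALIZED: " + unescapeKnownEntities(notNormalized));
    }
}
```

Because the replacement happens in a single pass, already-decoded text is never scanned again, which avoids the double-unescaping caveat of chaining two library calls. The trade-off is that every entity you care about has to be listed by hand.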
