Convert a string with Unicode characters in decimal format (HTML-encoded) to a regular string

I have one variable of type Map.

if (sourceMap.containsKey(currentRow)) {
    //Remove the row from Map
} else {
    //Mismatch
}

where sourceMap is a HashMap that contains many strings, such as:

Period Name
Person Last Name
Person First Name
Order Code
Ship_to_Customer_Name
Sub_Profit_Center
Commission Amount
Credit Amount
Rate Amount
Apr-09
Morgan
Martin
1022334852
Carl Zeiss de M&#195;&#189;xico, S.A. de C.V.

and currentRow contains the following string:

Carl Zeiss de Mýxico, S.A. de C.V.

which is the same as the last row. The two should match, but currently they do not. What conversion do I have to apply so that they match?

These strings come from different files. The first one was downloaded as CSV, so it contains no Unicode characters. The second one (currentRow) was downloaded in Unicode text format and converted to CSV using dos2unix.

Carl Zeiss de M&#195;&#189;xico, S.A. de C.V.

This is a string with HTML-encoded characters in it. You can HTML-unescape it using a utility function such as unescapeHtml4.
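To make the unescaping step concrete without pulling in a library, here is a minimal sketch that decodes only decimal numeric references of the form &#NNN;. The class and method names are hypothetical and exist only for this illustration. It is not how unescapeHtml4 is implemented, and it ignores named entities such as &amp; and hexadecimal references.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class NumericEntityDecoder {
    // Matches decimal numeric character references such as &#195;
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d{1,7});");

    static String decodeDecimalEntities(String input) {
        Matcher m = DECIMAL_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(input, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Leave references to invalid code points untouched.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = "Carl Zeiss de M&#195;&#189;xico, S.A. de C.V.";
        System.out.println(decodeDecimalEntities(encoded));
    }
}
```

For the string in the question, the two references decode to the two characters U+00C3 and U+00BD, so the decoded text contains two characters where the original name presumably had one.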

Generally you want to keep your strings in raw form rather than with HTML escapes in them. Look at wherever you got your sourceMap from: if you control that code and can fix it to avoid the gratuitous escaping, then this would have just worked. Note also that Mýxico, whether HTML-encoded or not, looks like evidence of mishandled Unicode characters somewhere else in the stack.
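One common form of such mishandling is UTF-8 bytes being read as ISO-8859-1 (Latin-1). The sketch below assumes that is what happened here, which the question does not confirm. The class and method names are hypothetical. It re-encodes the string as Latin-1 bytes and decodes those bytes as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; only valid if the text really is
// UTF-8 that was mistakenly decoded as ISO-8859-1.
class MojibakeRepair {
    static String reinterpretLatin1AsUtf8(String s) {
        // Recover the original bytes, then decode them with the intended charset.
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00C3 followed by U+00BD is what the two references in the question decode to.
        String decoded = "Carl Zeiss de M\u00C3\u00BDxico, S.A. de C.V.";
        System.out.println(reinterpretLatin1AsUtf8(decoded));
    }
}
```

The bytes 0xC3 0xBD are the UTF-8 encoding of U+00FD (ý), so under this assumption the two characters collapse into one. Apply this only to text known to be mis-decoded: running it on a correctly decoded string that contains non-ASCII characters will corrupt it.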

First, download the commons-lang3 jar file, for example from http://www.java2s.com/Code/Jar/c/Downloadcommonlang3jar.htm

Then add the import statement as follows:

import static org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4;

Now call the method wherever you need to unescape the HTML-encoded string. For example:

// The string literal must be quoted and the statement terminated.
String s = "Carl Zeiss de M&#195;&#189;xico, S.A. de C.V.";
System.out.println("Before: " + s);
s = unescapeHtml4(s);
System.out.println("After: " + s);

The output will be as follows:

Before: Carl Zeiss de M&#195;&#189;xico, S.A. de C.V.
After: Carl Zeiss de Mýxico, S.A. de C.V.
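To tie this back to the containsKey check in the question, one option is to normalize every key once when the map is built and to normalize currentRow the same way before the lookup. The sketch below is a rough illustration under that assumption. NormalizedLookup and normalizeKeys are hypothetical names, and the normalizer in main is a stand-in that handles only the two references from the question; in real code you would pass a full unescaping function such as unescapeHtml4 instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper for illustration; not part of any library.
class NormalizedLookup {
    // Rebuilds the map so that every key is stored in normalized form.
    // If two keys normalize to the same string, the later one wins.
    static <V> Map<String, V> normalizeKeys(Map<String, V> source,
                                            UnaryOperator<String> normalizer) {
        Map<String, V> result = new HashMap<>();
        for (Map.Entry<String, V> e : source.entrySet()) {
            result.put(normalizer.apply(e.getKey()), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in normalizer: decodes only the two references from the question.
        UnaryOperator<String> normalizer =
                s -> s.replace("&#195;", "\u00C3").replace("&#189;", "\u00BD");

        Map<String, Integer> sourceMap = new HashMap<>();
        sourceMap.put("Carl Zeiss de M&#195;&#189;xico, S.A. de C.V.", 1);
        Map<String, Integer> normalized = normalizeKeys(sourceMap, normalizer);

        // Assumed here to hold the already-decoded form of the same name.
        String currentRow = normalizer.apply("Carl Zeiss de M\u00C3\u00BDxico, S.A. de C.V.");
        if (normalized.containsKey(currentRow)) {
            normalized.remove(currentRow); // remove the matched row from the map
            System.out.println("Match");
        } else {
            System.out.println("Mismatch");
        }
    }
}
```

Passing the normalizer as a parameter keeps the lookup logic independent of how the strings are cleaned up, so the same code works whether the fix is HTML unescaping, an encoding repair, or both.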
