
How to replace UTF-8 to similar Latin letters in a String?

I have a String:

s = M\\c3\\a4nager

I want to replace \\c3\\a4 with its equivalent Latin character ä, so the String should be:

s = Mänager

I have searched a lot for how to do this in Java without success. I want to handle all such UTF-8 escaped characters in my code, not just this one.

To unescape the LDAP string, you could use the following snippet:

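import javax.naming.ldap.Rdn;

// In Java source "\\" is a single backslash, so this literal is M\c3\a4nager.
String escapedValue = "M\\c3\\a4nager";
// Decodes the hex pairs \c3\a4 as the UTF-8 bytes of ä; the returned Object is a String here.
Object unescapedValue = Rdn.unescapeValue(escapedValue);
System.out.println("escapedValue   = " + escapedValue);
System.out.println("unescapedValue = " + unescapedValue);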

output

escapedValue   = M\c3\a4nager
unescapedValue = Mänager
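If javax.naming is not available, the same idea can be sketched by hand: collect each \XX hex pair as one byte, then decode the whole byte sequence as UTF-8. The class LdapHexUnescape and its methods are hypothetical names made up for this illustration, and the sketch only covers hex pairs; other LDAP escapes such as \, are not handled, and a backslash that is not followed by two hex digits is copied through unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class LdapHexUnescape {

    // Hypothetical helper (not a JDK API): true for 0-9, a-f, A-F.
    static boolean isHex(char ch) {
        return (ch >= '0' && ch <= '9')
            || (ch >= 'a' && ch <= 'f')
            || (ch >= 'A' && ch <= 'F');
    }

    // Hypothetical helper (not a JDK API): turns \XX pairs into bytes
    // and decodes the resulting byte sequence as UTF-8.
    static String unescape(String escaped) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < escaped.length()) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 2 < escaped.length()
                    && isHex(escaped.charAt(i + 1))
                    && isHex(escaped.charAt(i + 2))) {
                // A backslash followed by two hex digits is one raw byte.
                bytes.write(Integer.parseInt(escaped.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                // Anything else is copied through as its own UTF-8 bytes.
                int codePoint = escaped.codePointAt(i);
                byte[] raw = new String(Character.toChars(codePoint))
                        .getBytes(StandardCharsets.UTF_8);
                bytes.write(raw, 0, raw.length);
                i += Character.charCount(codePoint);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\\" in Java source is one backslash: the input is M\c3\a4nager.
        System.out.println(unescape("M\\c3\\a4nager"));
    }
}
```

Collecting bytes first and decoding once at the end matters because a single character such as ä spans two escaped bytes (c3 and a4); decoding each pair on its own would not produce the right character.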

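unescapedValue is an ordinary Java String: the escaped bytes \c3\a4 were decoded as UTF-8 into the single character ä. An encoding only comes into play when you convert the String to bytes (for example when writing it to a file or a socket), so choose the charset explicitly at that point.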

A simple example showing how the bytes differ between encodings:

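import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Same String, two encodings: ä is one byte in ISO-8859-1 and two bytes in UTF-8.
byte[] latinBytes = ((String)unescapedValue).getBytes(StandardCharsets.ISO_8859_1);
byte[] utf8Bytes = ((String)unescapedValue).getBytes(StandardCharsets.UTF_8);

System.out.println("latin1: " + Arrays.toString(latinBytes));
System.out.println("utf8  : " + Arrays.toString(utf8Bytes));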

output

latin1: [77, -28, 110, 97, 103, 101, 114]
utf8  : [77, -61, -92, 110, 97, 103, 101, 114]
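The byte arrays above also explain what goes wrong when the reader and the writer disagree on the charset. The following sketch encodes the String with one charset and decodes it with another; the class CharsetRoundTrip and the method recode are hypothetical names used only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Hypothetical helper (not a JDK API): encode with one charset,
    // then decode the resulting bytes with another.
    static String recode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] raw = text.getBytes(encodeWith);
        return new String(raw, decodeWith);
    }

    public static void main(String[] args) {
        String value = "M\u00e4nager"; // Mänager, written with an escape to avoid source-encoding issues

        // Matching charsets: the round trip returns the original String.
        String same = recode(value, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(value));

        // Mismatched charsets: the two UTF-8 bytes c3 a4 are read as two
        // separate ISO-8859-1 characters (U+00C3 and U+00A4).
        String garbled = recode(value, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length() + " characters instead of " + value.length());
    }
}
```

Passing the charset explicitly on both sides, instead of relying on the platform default, is what keeps the round trip lossless.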

