Removing hexadecimal UTF-8 characters in Java

I know this question has been asked before, but none of the solutions seemed to work for this particular problem. My Java application receives a username from another server. The username sometimes contains the hexadecimal representation of UTF-8 characters.

For example: "Féçon" comes in as F\\C3\\A9\\C3\\A7on.

None of the examples I found on this site (most of them use "getBytes") worked. No idea why.

So my question is: if you have defined a String with these characters, how can you remove them so it looks right again? You can try it yourself by using the following:

String test = "F\\C3\\A9\\C3\\A7on";

thanks! Mike

It's not the most performant solution, but at least the code is short. You're basically URL decoding, where \\ marks an encoded byte instead of %. So the following code works:

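// Needs: import java.net.URLDecoder;
// decode(String, String) declares the checked UnsupportedEncodingException.
String s = "F\\C3\\A9\\C3\\A7on";
s = s.replace('\\', '%'); // turn \C3 into %C3 so URLDecoder recognises it
System.out.println(URLDecoder.decode(s, "UTF-8")); // prints Féçon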
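One caveat with the replacement trick above: URLDecoder also gives special meaning to + (it becomes a space) and to any % already in the name. The following is a minimal sketch that guards against both, assuming the upstream server sends those two characters unescaped. The class name BackslashHexDecoder and its decode helper are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class BackslashHexDecoder {

    // Hypothetical helper: treats every \XX as one UTF-8 byte written in hex.
    static String decode(String raw) {
        String prepared = raw
                .replace("%", "%25")  // protect literal percent signs first
                .replace("+", "%2B")  // URLDecoder would otherwise turn '+' into a space
                .replace('\\', '%');  // finally map \XX onto %XX
        try {
            return URLDecoder.decode(prepared, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("F\\C3\\A9\\C3\\A7on"));
        System.out.println(decode("100%+\\C3\\A9"));
    }
}
```

The order of the replacements matters: % has to be escaped before + and \ are rewritten, otherwise the freshly inserted % signs would be escaped again. A backslash that is not followed by two hex digits makes URLDecoder throw an IllegalArgumentException.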

In this case getBytes won't work because it sounds like your Java string doesn't contain any non-ASCII characters; it just contains fifteen regular ASCII characters that spell out the escape sequences for the Unicode characters. Whatever your upstream component is, it's most likely the one doing the escaping.
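To make that concrete, here is a small sketch comparing the string as received with the string that was intended. The class and helper names are hypothetical; the only assumption is that the received value is exactly the example from the question.

```java
import java.nio.charset.StandardCharsets;

class EscapedStringInspection {

    // Hypothetical helper: true if every char is in the 7-bit ASCII range.
    static boolean isPureAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        String received = "F\\C3\\A9\\C3\\A7on"; // the 15 characters F\C3\A9\C3\A7on
        String intended = "F\u00e9\u00e7on";     // the 5 characters of the real name

        System.out.println(received.length());                                // 15
        System.out.println(isPureAscii(received));                            // true
        System.out.println(received.getBytes(StandardCharsets.UTF_8).length); // 15
        System.out.println(intended.length());                                // 5
        System.out.println(intended.getBytes(StandardCharsets.UTF_8).length); // 7
    }
}
```

Because the received string is pure ASCII, getBytes returns the same fifteen values in every common charset, so re-encoding it cannot recover the two accented letters.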

So the easiest way to address this is to see if the "other end" can be persuaded to speak Unicode. If so, you'll get the characters directly in Java and Bob's your uncle.

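Otherwise, you'll need to find some way of decoding these Strings. Converting each escape straight to a char won't do, because \C3\A9 is two UTF-8 bytes that together encode a single character. The simplest way I can think of is to iterate through, manually converting each escape to a byte, and then decode the collected bytes as UTF-8, something like this:

// Needs java.io.ByteArrayOutputStream and java.nio.charset.StandardCharsets
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
char[] input = inputStr.toCharArray();
for (int i = 0; i < input.length; i++)
{
   switch (input[i])
   {
      case '\\':
         // Get the next two characters and turn them into one byte value.
         // Assumes every backslash is followed by two hex digits.
         String escapeCodeStr = "" + input[i+1] + input[i+2];
         bytes.write(Integer.parseInt(escapeCodeStr, 16));
         i += 2; // Move pointer to account for two extra characters read
         break;

      default:
         // Assumes unescaped characters are plain ASCII
         bytes.write(input[i]);
   }
}

return new String(bytes.toByteArray(), StandardCharsets.UTF_8);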

This hasn't been tested, but it illustrates the principle of turning the escape codes into literal characters.
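For something that can be compiled and run as-is, here is a minimal sketch of the same loop. The class name HexEscapeDecoder and its methods are hypothetical. It assumes each \XX is one byte of a UTF-8 sequence, and it leaves a backslash that is not followed by two hex digits untouched, which is a guess at sensible behaviour rather than anything the upstream server is known to guarantee.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class HexEscapeDecoder {

    // Hypothetical helper: decodes \XX escapes as UTF-8 bytes.
    static String decode(String input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 2 < input.length()
                    && isHex(input.charAt(i + 1)) && isHex(input.charAt(i + 2))) {
                // \XX: one raw byte written as two hex digits
                bytes.write(Integer.parseInt(input.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                // Anything else is copied through, encoded as UTF-8 per code point
                int codePoint = input.codePointAt(i);
                byte[] encoded = new String(Character.toChars(codePoint))
                        .getBytes(StandardCharsets.UTF_8);
                bytes.write(encoded, 0, encoded.length);
                i += Character.charCount(codePoint);
            }
        }
        // Decode all collected bytes in one go so multi-byte sequences stay together
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    static boolean isHex(char c) {
        return Character.digit(c, 16) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(decode("F\\C3\\A9\\C3\\A7on"));
    }
}
```

Collecting bytes first and decoding once at the end is what keeps \C3\A9 together as a single character; casting each byte to char individually would produce two Latin-1 characters instead.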
