
Convert non-escaped unicode string to unicode

I have a text string from a MySQL database, which is

var str = "u0393u03a5u039du0391u0399u039au0391".

I want to replace the Unicode escapes so that they show as the characters they represent, "ΓΥΝΑΙΚΑ". If I manually write each u as \u in a .NET string literal, the conversion is done automatically.

I found the following function:

byte[] unicodeBytes = Encoding.Unicode.GetBytes(str);

// Perform the conversion from one encoding to the other.
byte[] ascibytes = Encoding.Convert(Encoding.Unicode, Encoding.ASCII, unicodeBytes);

// Convert the new byte[] into a char[] and then into a string.
char[] asciiChars = new char[Encoding.ASCII.GetCharCount(ascibytes, 0, ascibytes.Length)];

Encoding.ASCII.GetChars(ascibytes, 0, ascibytes.Length, asciiChars, 0);
return new string(asciiChars);

but since each u has to be escaped first, I do

str = str.Replace("u", @"\u");

but with no luck. How can I convert this?

These are essentially UTF-16 code units, so the following will do (it is not very efficient, but I assume optimization isn't the main goal):

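// Requires: using System.Globalization; using System.Text.RegularExpressions;
Regex.Replace(
    "u0393u03a5u039du0391u0399u039au0391",
    "u[0-9a-f]{4}",
    // Drop the leading 'u', parse the four hex digits, and emit that code unit.
    m => "" + (char) int.Parse(m.Value.Substring(1), NumberStyles.AllowHexSpecifier)
)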

This can't resolve the ambiguity of unescaped "regular" characters in the string: dufface would effectively be turned into d + \uffac + e, which is probably not right. It does handle surrogates correctly, though (ud83dudc96 becomes 💖).
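To make both behaviours concrete, here is a minimal sketch that wraps the same Regex.Replace call in a helper. The names UnicodeDecoder and Decode are made up for illustration, and the pattern is widened to accept upper-case hex digits, which is an assumption about the input rather than something the question states.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeDecoder
{
    // Hypothetical helper: replaces every "uXXXX" run with the UTF-16 code unit it names.
    // Assumption: upper-case hex digits should be accepted as well as lower-case.
    public static string Decode(string input)
    {
        return Regex.Replace(
            input,
            "u[0-9a-fA-F]{4}",
            m => ((char) int.Parse(m.Value.Substring(1), NumberStyles.AllowHexSpecifier)).ToString());
    }

    public static void Main()
    {
        // The string from the question decodes to the Greek word.
        Console.WriteLine(Decode("u0393u03a5u039du0391u0399u039au0391") == "ΓΥΝΑΙΚΑ"); // True

        // Two consecutive code units form one surrogate pair.
        string heart = Decode("ud83dudc96");
        Console.WriteLine(heart.Length);                   // 2
        Console.WriteLine(char.IsSurrogatePair(heart, 0)); // True

        // The ambiguity: "uffac" inside an ordinary word is decoded too.
        Console.WriteLine(Decode("dufface") == "d\ufface"); // True
    }
}
```

If the input can contain ordinary text with the letter u in it, there is no way to tell the two cases apart after the fact; the cleaner fix is to store the backslashes in the first place.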

Another option is to re-insert the backslashes and let Regex.Unescape do the decoding:

Regex.Unescape(@"u0393u03a5u039du0391u0399u039au0391".Replace(@"\", @"\\").Replace("u", @"\u"))

The extra \\ escaping is there just in case the string should contain any backslashes already, which could be wrongly interpreted as escape sequences.
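A small sketch of why that backslash guard matters, assuming an input that happens to contain a literal backslash followed by n. The class and method names are hypothetical; only Regex.Unescape and string.Replace come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // The approach from the answer: protect existing backslashes, then add one before each 'u'.
    public static string Decode(string input)
    {
        return Regex.Unescape(input.Replace(@"\", @"\\").Replace("u", @"\u"));
    }

    // The same call without the guard, for comparison only.
    public static string DecodeWithoutGuard(string input)
    {
        return Regex.Unescape(input.Replace("u", @"\u"));
    }

    public static void Main()
    {
        // Verbatim string: u0393, a backslash, the letter n, u03a5.
        string input = @"u0393\nu03a5";

        // With the guard, the backslash and the n survive as two literal characters.
        Console.WriteLine(Decode(input) == "\u0393\\n\u03a5");           // True

        // Without it, "\n" is read as an escape and becomes a line feed.
        Console.WriteLine(DecodeWithoutGuard(input) == "\u0393\n\u03a5"); // True
    }
}
```

Note that this approach puts a backslash in front of every u, so it shares the ambiguity described above for strings that contain the letter u in ordinary text.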

Yet another way:

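// Requires: using System; using System.Linq;
var str = "u0393u03a5u039du0391u0399u039au0391";

// Drop the leading 'u' so that Split does not yield an empty first element.
if (str.Length > 0 && str[0] == 'u')
    str = str.Substring(1);

// Each remaining piece is a four-digit hex code unit.
string chars = string.Concat(str.Split('u').Select(s =>
    Convert.ToChar(Convert.ToInt32(s, 16))));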
