Convert non-escaped unicode string to unicode
I have a text string from a MySQL database:
var str = "u0393u03a5u039du0391u0399u039au0391";
I want to replace the Unicode escape sequences with the characters they represent, so the string displays as "ΓΥΝΑΙΚΑ". If I manually escape each u as \u in a .NET string literal, the conversion is done automatically.
I found the following function:
byte[] unicodeBytes = Encoding.Unicode.GetBytes(str);
// Perform the conversion from one encoding to the other.
byte[] ascibytes = Encoding.Convert(Encoding.Unicode, Encoding.ASCII, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
char[] asciiChars = new char[Encoding.ASCII.GetCharCount(ascibytes, 0, ascibytes.Length)];
Encoding.ASCII.GetChars(ascibytes, 0, ascibytes.Length, asciiChars, 0);
return new string(asciiChars);
but since it has to be escaped, I tried
str = str.Replace("u", @"\u");
but with no luck. How can I convert this?
These are essentially UTF-16 code units, so this would do (this approach is not very efficient, but I assume optimization isn't the main goal):
Regex.Replace(
"u0393u03a5u039du0391u0399u039au0391",
"u[0-9a-f]{4}",
m => "" + (char) int.Parse(m.Value.Substring(1), NumberStyles.AllowHexSpecifier)
)
This can't deal with the ambiguity of unescaped "regular" characters in the string: dufface would effectively get turned into d + \uffac + e, which is probably not right. It will correctly handle surrogates, though (ud83dudc96 is 💖), because each half of the pair is emitted as its own UTF-16 code unit.
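For reference, here is a minimal self-contained sketch of the regex approach wrapped in a reusable method. The class name UnicodeDecoder and method name Decode are hypothetical, chosen only for illustration. Accepting uppercase hex digits and parsing with the invariant culture are my additions, not part of the snippet above.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class UnicodeDecoder
{
    // Matches a literal 'u' followed by exactly four hex digits.
    // Assumption: uppercase digits are accepted too (the original pattern is lowercase-only).
    private static readonly Regex Escape = new Regex("u([0-9a-fA-F]{4})");

    public static string Decode(string input)
    {
        // Each match becomes one UTF-16 code unit, so a surrogate pair
        // written as two consecutive escapes is reassembled naturally.
        return Escape.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value,
                             NumberStyles.AllowHexSpecifier,
                             CultureInfo.InvariantCulture)).ToString());
    }
}
```

Calling UnicodeDecoder.Decode("u0393u03a5u039du0391u0399u039au0391") should give "ΓΥΝΑΙΚΑ". The ambiguity described above still applies: any u that happens to be followed by four hex digits is treated as an escape.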
Using the technique in this answer is another option:
Regex.Unescape(@"u0393u03a5u039du0391u0399u039au0391".Replace(@"\", @"\\").Replace("u", @"\u"))
The extra \\ escaping is there just in case the string already contains backslashes, which could otherwise be wrongly interpreted as escape sequences.
Yet another way:
var str = "u0393u03a5u039du0391u0399u039au0391";
if (str.Length > 0 && str[0] == 'u')
str = str.Substring(1, str.Length - 1);
string chars = string.Concat(str.Split('u').Select(s =>
Convert.ToChar(Convert.ToInt32("0x" + s,16))));