Unicode to ASCII conversion/mapping
I need some sort of conversion/mapping like the one performed by, for example, the CLCL clipboard manager.
It works like this:
I copy the following Unicode text: ūī
And CLCL converts it to: ui
Is there any technique for doing such a conversion? Or are there mapping tables that could be used, so that, say, the symbol ū is mapped to u?
UPDATE
Thanks to all for the help. Here is what I came up with (a hybrid of two solutions), one posted by Erik Schierboom and one taken from http://blogs.infosupport.com/normalizing-unicode-strings-in-c/#comment-8984
// Requires: using System.Globalization; using System.Linq; using System.Text;
public static string ConvertUnicodeToAscii(string unicodeStr, bool skipNonConvertibleChars = false)
{
    if (string.IsNullOrWhiteSpace(unicodeStr))
    {
        return unicodeStr;
    }

    // FormD splits each accented letter into a base letter plus combining marks.
    var normalizedStr = unicodeStr.Normalize(NormalizationForm.FormD);

    if (skipNonConvertibleChars)
    {
        // Keep only ASCII; this drops the combining marks and anything else above 127.
        return new string(normalizedStr.Where(c => c <= 127).ToArray());
    }

    // Strip only the combining marks; other non-ASCII characters are kept as-is.
    return new string(
        normalizedStr.Where(
            c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
        .ToArray());
}
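To see why both branches above work, it can help to look at what FormD normalization actually produces. The following is only an illustrative sketch; the class name `DecompositionDemo` and the method `DescribeDecomposed` are made up for this example and do not come from either of the referenced solutions.

```csharp
using System;
using System.Linq;
using System.Text;

public static class DecompositionDemo
{
    // Normalizes the input to FormD and lists the resulting UTF-16 code units
    // as "U+XXXX" values separated by spaces.
    public static string DescribeDecomposed(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        return string.Join(
            " ",
            s.Normalize(NormalizationForm.FormD).Select(c => $"U+{(int)c:X4}"));
    }
}
```

Calling `DecompositionDemo.DescribeDecomposed("\u016B\u012B")` (that is, "ūī") yields `U+0075 U+0304 U+0069 U+0304`: each letter becomes a plain `u` or `i` followed by U+0304 (combining macron), which is a non-spacing mark above 127, so either filter removes it.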
I have used the following code for some time:
// Requires: using System; using System.Linq; using System.Text;
private static string NormalizeDiacriticalCharacters(string value)
{
    if (value == null)
    {
        throw new ArgumentNullException(nameof(value));
    }

    // Decompose accented letters, then keep only the ASCII code units.
    var normalised = value.Normalize(NormalizationForm.FormD);
    return new string(normalised.Where(c => c <= 127).ToArray());
}
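One caveat with this approach: letters that have no canonical decomposition, such as ø, ß or ł, are still above 127 after normalization, so the filter drops them entirely ("Søren" becomes "Sren"). A possible way around that, along the lines of the mapping tables the question asks about, is a small fallback table. This is a rough sketch only; the class `AsciiFolding`, the method `Fold` and the contents of the table are assumptions made for illustration, and the table is far from complete.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class AsciiFolding
{
    // Hypothetical, deliberately tiny table for letters that FormD leaves untouched.
    private static readonly Dictionary<char, string> Fallbacks = new Dictionary<char, string>
    {
        ['\u00F8'] = "o",  // ø
        ['\u00D8'] = "O",  // Ø
        ['\u00DF'] = "ss", // ß
        ['\u00E6'] = "ae", // æ
        ['\u00C6'] = "AE", // Æ
        ['\u0142'] = "l",  // ł
        ['\u0141'] = "L",  // Ł
    };

    public static string Fold(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        var sb = new StringBuilder(value.Length);
        foreach (char c in value.Normalize(NormalizationForm.FormD))
        {
            if (c <= 127)
            {
                sb.Append(c); // already ASCII
            }
            else if (Fallbacks.TryGetValue(c, out var replacement))
            {
                sb.Append(replacement); // mapped by the table
            }
            // Anything else (combining marks, unmapped characters) is dropped.
        }
        return sb.ToString();
    }
}
```

With this table, `AsciiFolding.Fold("Søren Straße ūī")` gives `Soren Strasse ui`. Which replacements are appropriate depends on the language and the use case, so the table would need to be adapted rather than taken as given.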
In general, it is not possible to convert arbitrary Unicode text to ASCII without loss, because ASCII is only a small subset of Unicode (its first 128 code points).
That being said, it is possible to convert characters within the ASCII subset of Unicode to ASCII.
In C#, there is generally no need to do the conversion, since all strings are Unicode by default anyway and all components are Unicode-aware, but if you must do the conversion, use the following:
string myString = "SomeString";
byte[] asciiString = System.Text.Encoding.ASCII.GetBytes(myString);
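Note that with the default settings `Encoding.ASCII` does not fail on characters outside the ASCII range; it replaces each of them with `?`. The sketch below shows that behaviour and, as one possible alternative, a strict variant that reports non-ASCII input instead of silently replacing it. The class and method names (`AsciiEncodingDemo`, `RoundTrip`, `TryEncodeStrict`) are invented for this example.

```csharp
using System;
using System.Text;

public static class AsciiEncodingDemo
{
    // Encodes with the default ASCII encoder and decodes again, so the
    // effect of the built-in "?" replacement is visible as a string.
    public static string RoundTrip(string s)
    {
        return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
    }

    // Encodes with exception fallbacks; returns false if the input
    // contains any character that ASCII cannot represent.
    public static bool TryEncodeStrict(string s, out byte[] bytes)
    {
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            bytes = strict.GetBytes(s);
            return true;
        }
        catch (EncoderFallbackException)
        {
            bytes = Array.Empty<byte>();
            return false;
        }
    }
}
```

For example, `AsciiEncodingDemo.RoundTrip("a\u016Bb")` returns `a?b`, while `TryEncodeStrict` returns `false` for the same input and `true` for `"SomeString"`.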