Unicode to ASCII conversion/mapping

I need some sort of conversion/mapping like the one done by, for example, the CLCL clipboard manager.

Here is what it does:

I copy the following Unicode text: ūī
And CLCL converts it to: ui

Is there a technique for doing such a conversion? Or are there mapping tables that can be used for it, where, say, the symbol ū is mapped to u?

UPDATE

Thanks to all for the help. Here is what I came up with. It is a hybrid of two solutions: one posted by Erik Schierboom and one taken from http://blogs.infosupport.com/normalizing-unicode-strings-in-c/#comment-8984

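using System.Globalization;
using System.Linq;
using System.Text;

// Wrapper class added only so the method compiles on its own; the name is arbitrary.
public static class UnicodeConversion
{
    public static string ConvertUnicodeToAscii(string unicodeStr, bool skipNonConvertibleChars = false)
    {
        // IsNullOrEmpty rather than IsNullOrWhiteSpace: non-ASCII whitespace such as
        // U+00A0 (no-break space) must still reach the filtering below.
        if (string.IsNullOrEmpty(unicodeStr))
        {
            return unicodeStr;
        }

        // FormD splits precomposed letters into base letter + combining mark,
        // e.g. "ū" (U+016B) becomes "u" + U+0304 COMBINING MACRON.
        var normalizedStr = unicodeStr.Normalize(NormalizationForm.FormD);

        if (skipNonConvertibleChars)
        {
            // Keep only ASCII code units: this drops the combining marks and
            // every character that has no ASCII base form.
            return new string(normalizedStr.Where(c => c <= 127).ToArray());
        }

        // Drop only the combining marks; other non-ASCII characters are kept unchanged.
        return new string(normalizedStr
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray());
    }
}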
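To see why filtering on UnicodeCategory.NonSpacingMark (or on c <= 127) works, it helps to look at what FormD actually produces. The sketch below is a small top-level C# program; DescribeDecomposition is a hypothetical helper name, not a framework API. It prints each UTF-16 code unit of the decomposed string together with its Unicode category:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper: one "U+XXXX Category" entry per UTF-16 code unit of the
// FormD (canonical decomposition) form of s.
static List<string> DescribeDecomposition(string s) =>
    s.Normalize(NormalizationForm.FormD)
     .Select(c => $"U+{(int)c:X4} {CharUnicodeInfo.GetUnicodeCategory(c)}")
     .ToList();

foreach (string line in DescribeDecomposition("ūī"))
{
    Console.WriteLine(line);
}
```

For the sample text from the question this prints U+0075 LowercaseLetter, U+0304 NonSpacingMark, U+0069 LowercaseLetter, U+0304 NonSpacingMark: each macron becomes a separate combining mark, which is exactly what both branches of ConvertUnicodeToAscii discard.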

I have used the following code for some time:

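using System;
using System.Linq;
using System.Text;

// Wrapper class added only so the method compiles on its own; the name is arbitrary.
internal static class DiacriticHelper
{
    private static string NormalizeDiacriticalCharacters(string value)
    {
        if (value == null)
        {
            throw new ArgumentNullException(nameof(value));
        }

        // Decompose accented letters into base letter + combining mark(s), then keep
        // only ASCII: this drops the marks and every other non-ASCII character.
        var normalised = value.Normalize(NormalizationForm.FormD);
        return new string(normalised.Where(c => c <= 127).ToArray());
    }
}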
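Decomposition only helps for letters that have a canonical decomposition. Letters such as ß, æ, ø or ł have none, so NormalizeDiacriticalCharacters drops them entirely. To cover the "mapping table" part of the question, one option is to combine FormD with a small hand-written table. The sketch below assumes such a table; its entries are illustrative and far from complete, and ToAscii is a hypothetical name, not a framework API:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, deliberately incomplete table for letters that have no canonical
// decomposition and would otherwise be dropped. Extend it for your own data.
var fallback = new Dictionary<char, string>
{
    ['ß'] = "ss", ['æ'] = "ae", ['Æ'] = "AE", ['ø'] = "o", ['Ø'] = "O",
    ['ł'] = "l", ['Ł'] = "L", ['đ'] = "d", ['Đ'] = "D",
};

// Hypothetical helper: decompose, keep ASCII, map table entries, drop the rest.
string ToAscii(string input)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input.Normalize(NormalizationForm.FormD))
    {
        if (c <= 127)
        {
            sb.Append(c);               // plain ASCII, including base letters split off by FormD
        }
        else if (fallback.TryGetValue(c, out var replacement))
        {
            sb.Append(replacement);     // letter with no decomposition, mapped by the table
        }
        // Anything else (combining marks, unmapped symbols) is dropped.
    }
    return sb.ToString();
}

Console.WriteLine(ToAscii("Straße, Øresund, Łódź, ūī"));
```

With this table the sample line prints "Strasse, Oresund, Lodz, ui". Checking the table before dropping a character keeps the answer's drop-everything-else rule for characters the table does not cover.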

In general, it is not possible to convert Unicode to ASCII without loss, because ASCII is a subset of Unicode: it covers only the 128 code points U+0000 to U+007F.

That said, characters within the ASCII subset of Unicode can be converted to ASCII.
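Whether a character belongs to the ASCII subset is a simple range check on its code point. A minimal sketch, with IsAscii as a hypothetical helper name: a .NET string is ASCII-only exactly when every UTF-16 code unit is at most 0x7F. Characters outside the Basic Multilingual Plane are stored as surrogate pairs whose code units are all above 0x7F, so they correctly fail the check.

```csharp
using System;
using System.Linq;

// Hypothetical helper: true when every UTF-16 code unit lies in U+0000..U+007F,
// i.e. the string can be converted to ASCII without losing anything.
static bool IsAscii(string s) => s.All(c => c <= 0x7F);

Console.WriteLine(IsAscii("SomeString")); // True
Console.WriteLine(IsAscii("ūī"));         // False
```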

In C#, there is generally no need to do the conversion, since all strings are Unicode (UTF-16) and the framework's components are Unicode-aware. If you must do the conversion, use the following:

using System.Text;
string myString = "SomeString";
// Characters outside ASCII are replaced with '?', e.g. "ū" becomes "?".
byte[] asciiBytes = Encoding.ASCII.GetBytes(myString);
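Because Encoding.ASCII substitutes '?' for anything it cannot encode, it turns "ūī" into "??" rather than "ui". A variation, assuming you want transliteration rather than question marks, is to decompose with FormD first and then use an ASCII encoding whose encoder fallback replaces unencodable characters with an empty string. The sketch below uses Encoding.GetEncoding with an EncoderReplacementFallback; ToAsciiViaEncoding is a hypothetical helper name:

```csharp
using System;
using System.Text;

// Hypothetical helper: transliterate by decomposing (FormD) and then encoding to
// ASCII with a fallback that replaces unencodable characters with "" (drops them).
static string ToAsciiViaEncoding(string input)
{
    Encoding ascii = Encoding.GetEncoding(
        "us-ascii",
        new EncoderReplacementFallback(string.Empty),
        new DecoderReplacementFallback(string.Empty));

    byte[] bytes = ascii.GetBytes(input.Normalize(NormalizationForm.FormD));
    return ascii.GetString(bytes);
}

string sample = "\u016B\u012B"; // "ūī" as two precomposed characters

Console.WriteLine(Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(sample))); // ??
Console.WriteLine(ToAsciiViaEncoding(sample));                                // ui
```

Creating the Encoding on every call keeps the sketch short; in real code you would build it once and reuse it. This behaves like the c <= 127 filter: letters without a decomposition, such as ß, are dropped rather than transliterated.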
