Replacing unicode characters in string in C#

I have a string, for example:

string str = "ĄĆŹ - ćwrą";

How can I replace ĄĆŹ - ćą with their Unicode escape sequences? The result for that example string should be:

str = "\u0104\u0106\u0179 \u2013 \u0107wr\u0105"

Is there a fast way to do the replacement? I don't want to call .Replace for each character.

Converting to a JSON string like that is more cumbersome than it should be, mainly because you need to work with Unicode code points, which in practice means calling char.ConvertToUtf32. To do that, you need to handle surrogate pairs somehow; System.Globalization.StringInfo can help with that.
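To make the surrogate-pair point concrete, here is a minimal sketch of how one code point can occupy either one or two char values in a C# string. The class and method names (CodePointDemo, UnitsAt, HexAt) are hypothetical and only for illustration; char.IsSurrogatePair and char.ConvertToUtf32 are the framework calls doing the work.

```csharp
using System;

public static class CodePointDemo
{
    // Hypothetical helper: how many UTF-16 code units (chars) the code point
    // starting at index i occupies: 2 for a surrogate pair, otherwise 1.
    public static int UnitsAt(string s, int i) =>
        char.IsSurrogatePair(s, i) ? 2 : 1;

    // Hypothetical helper: the code point starting at index i, as lowercase hex.
    public static string HexAt(string s, int i) =>
        char.ConvertToUtf32(s, i).ToString("x");
}
```

For "\u0104" (Ą), UnitsAt gives 1 and HexAt gives 104. For "\U0001F600", a code point above U+FFFF, UnitsAt gives 2 and HexAt gives 1f600, even though the string's Length is 2.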

Here's a function that uses these building blocks to perform the conversion:

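using System.Globalization;
using System.Text;

public static string ToJsonString(string s)
{
    var enumerator = StringInfo.GetTextElementEnumerator(s);
    var sb = new StringBuilder();

    while (enumerator.MoveNext())
    {
        // A text element can hold several code points (for example a base
        // letter plus combining marks), so walk through all of them.
        var element = enumerator.GetTextElement();
        for (var i = 0; i < element.Length; i += char.IsSurrogatePair(element, i) ? 2 : 1)
        {
            var codePoint = char.ConvertToUtf32(element, i);
            if (codePoint < 0x80) {
                // ASCII is copied through unchanged.
                sb.Append((char)codePoint);
            }
            else if (codePoint <= 0xffff) {
                // BMP code point: a single \uXXXX escape.
                sb.Append("\\u").Append(codePoint.ToString("x4"));
            }
            else {
                // Supplementary code point: emit the UTF-16 surrogate pair,
                // high surrogate first, then low surrogate.
                var offset = codePoint - 0x10000;
                sb.Append("\\u").Append((0xd800 + (offset >> 10)).ToString("x4"));
                sb.Append("\\u").Append((0xdc00 + (offset & 0x3ff)).ToString("x4"));
            }
        }
    }

    return sb.ToString();
}

// Usage:
// string str = "ĄĆŹ - ćwrą";
// string json = ToJsonString(str);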
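
As an alternative sketch (not part of the answer above): a JSON \u escape encodes one UTF-16 code unit, and a C# string is already a sequence of UTF-16 code units, so the same kind of output can be produced by escaping each char directly, without decoding code points. The names UnicodeEscaper and EscapeNonAscii are hypothetical. Like the function above, this only escapes non-ASCII characters; it does not escape quotes, backslashes, or control characters.

```csharp
using System.Text;

public static class UnicodeEscaper
{
    // Hypothetical helper: copies ASCII through and writes every other
    // UTF-16 code unit as a \uXXXX escape. Surrogate pairs come out as two
    // consecutive escapes, which is how JSON represents them.
    public static string EscapeNonAscii(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c < 0x80)
                sb.Append(c);
            else
                sb.Append("\\u").Append(((int)c).ToString("x4"));
        }
        return sb.ToString();
    }
}
```

For example, UnicodeEscaper.EscapeNonAscii("ĄĆŹ") returns the text \u0104\u0106\u0179, and ASCII input such as "wr" is returned unchanged.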

