Convert a Unicode string to an escaped ASCII string

Question

How can I convert this string:

This string contains the Unicode character Pi(π)

into an escaped ASCII string:

This string contains the Unicode character Pi(\u03c0)

and vice versa?

The encodings currently available in C# convert the π character to "?". I need to preserve that character.

Answer 1

This converts back and forth between the original string and the \uXXXX format.

using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

class Program {
    static void Main( string[] args ) {
        string unicodeString = "This function contains a unicode character pi (\u03a0)";

        Console.WriteLine( unicodeString );

        string encoded = EncodeNonAsciiCharacters(unicodeString);
        Console.WriteLine( encoded );

        string decoded = DecodeEncodedNonAsciiCharacters( encoded );
        Console.WriteLine( decoded );
    }

    // Replaces every char above 127 with a \uXXXX escape of its UTF-16 code unit.
    static string EncodeNonAsciiCharacters( string value ) {
        StringBuilder sb = new StringBuilder();
        foreach( char c in value ) {
            if( c > 127 ) {
                // This character is too big for ASCII
                string encodedValue = "\\u" + ((int) c).ToString( "x4" );
                sb.Append( encodedValue );
            }
            else {
                sb.Append( c );
            }
        }
        return sb.ToString();
    }

    // Replaces every \uXXXX escape with the char it denotes.
    // Only hex digits are matched, so text such as "\user" is left alone
    // instead of causing a FormatException in int.Parse.
    static string DecodeEncodedNonAsciiCharacters( string value ) {
        return Regex.Replace(
            value,
            @"\\u(?<Value>[a-fA-F0-9]{4})",
            m => {
                return ((char) int.Parse( m.Groups["Value"].Value, NumberStyles.HexNumber )).ToString();
            } );
    }
}

Outputs:

This function contains a unicode character pi (Π)

This function contains a unicode character pi (\u03a0)

This function contains a unicode character pi (Π)

Answer 2

To unescape, you can simply use one of these functions:

System.Text.RegularExpressions.Regex.Unescape(string)

System.Uri.UnescapeDataString(string)

I suggest using this method (it works better with UTF-8):

UnescapeDataString(string)
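The two functions decode different escape formats, so it is worth checking which one matches the input. The sketch below is only an illustration (the class name, the helper names and the sample inputs are my own, not from the answer): Regex.Unescape reads \uXXXX sequences, while Uri.UnescapeDataString reads percent-encoded UTF-8 bytes.

```csharp
using System;
using System.Text.RegularExpressions;

static class UnescapeDemo
{
    // Decodes \uXXXX sequences. It also interprets other backslash
    // escapes (such as \n or \\), so literal backslashes need care.
    public static string FromBackslashU(string s) => Regex.Unescape(s);

    // Decodes %XX sequences, reading the decoded bytes as UTF-8.
    public static string FromPercent(string s) => Uri.UnescapeDataString(s);

    static void Main()
    {
        // Both calls should yield the same string containing U+03C0 (π).
        string a = FromBackslashU(@"Pi(\u03c0)");
        string b = FromPercent("Pi(%CF%80)");
        Console.WriteLine(a == b);
    }
}
```

For the \uXXXX format asked about in the question, Regex.Unescape appears to be the closer fit; Uri.UnescapeDataString only helps if the string was percent-encoded in the first place.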

Answer 3

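// Requires: using System; using System.Linq;

// Maps every char through proc and concatenates the results.
string StringFold(string input, Func<char, string> proc)
{
  return string.Concat(input.Select(proc).ToArray());
}

// Returns ASCII chars unchanged and escapes everything else as \uXXXX.
string FoldProc(char input)
{
  if (input >= 128)
  {
    return string.Format(@"\u{0:x4}", (int)input);
  }
  return input.ToString();
}

string EscapeToAscii(string input)
{
  return StringFold(input, FoldProc);
}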

Answer 4

As a one-liner:

var result = Regex.Replace(input, @"[^\x00-\x7F]", c => 
    string.Format(@"\u{0:x4}", (int)c.Value[0]));

Answer 5

using System;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        string originalString = "This string contains the unicode character Pi(π)";
        StringBuilder asAscii = new StringBuilder(); // final ASCII string with \uXXXX escapes
        foreach (char c in originalString)
        {
            // Keep ASCII chars as-is; escape everything else as its UTF-16 code unit
            int cint = Convert.ToInt32(c);
            if (cint <= 127)
                asAscii.Append(c);
            else
                asAscii.AppendFormat("\\u{0:x4}", cint);
        }
        Console.WriteLine("Final string: {0}", asAscii);
        Console.ReadKey();
    }
}

All non-ASCII chars are converted to their \uXXXX representation and appended to the final string. Each C# char is a UTF-16 code unit, so a character outside the Basic Multilingual Plane produces two escapes (one per surrogate).

Answer 6

Here is my current implementation:

public static class UnicodeStringExtensions
{
    public static string EncodeNonAsciiCharacters(this string value) {
        var bytes = Encoding.Unicode.GetBytes(value);
        var sb = StringBuilderCache.Acquire(value.Length);
        bool encodedsomething = false;
        for (int i = 0; i < bytes.Length; i += 2) {
            var c = BitConverter.ToUInt16(bytes, i);
            if ((c >= 0x20 && c <= 0x7f) || c == 0x0A || c == 0x0D) {
                sb.Append((char) c);
            } else {
                sb.Append($"\\u{c:x4}");
                encodedsomething = true;
            }
        }
        if (!encodedsomething) {
            StringBuilderCache.Release(sb);
            return value;
        }
        return StringBuilderCache.GetStringAndRelease(sb);
    }


    public static string DecodeEncodedNonAsciiCharacters(this string value)
      => Regex.Replace(value,/*language=regexp*/@"(?:\\u[a-fA-F0-9]{4})+", Decode);

    static readonly string[] Splitsequence = new [] { "\\u" };
    private static string Decode(Match m) {
        var bytes = m.Value.Split(Splitsequence, StringSplitOptions.RemoveEmptyEntries)
                .Select(s => ushort.Parse(s, NumberStyles.HexNumber)).SelectMany(BitConverter.GetBytes).ToArray();
        return Encoding.Unicode.GetString(bytes);
    }
}

This passes a test:

public void TestBigUnicode() {
    var s = "\U00020000";
    var encoded = s.EncodeNonAsciiCharacters();
    var decoded = encoded.DecodeEncodedNonAsciiCharacters();
    Assert.AreEqual(s, decoded);
}

with the encoded value: "\ud840\udc00"

This implementation makes use of a StringBuilderCache (reference source link).

Answer 7

A small patch to @Adam Sills's answer that fixes the FormatException thrown when the input string looks like "c:\u00ab\\otherdirectory\\ ". In addition, RegexOptions.Compiled makes repeated matching with the Regex faster:

    private static Regex DECODING_REGEX = new Regex(@"\\u(?<Value>[a-fA-F0-9]{4})", RegexOptions.Compiled);
    private const string PLACEHOLDER = @"#!#";
    public static string DecodeEncodedNonAsciiCharacters(this string value)
    {
        return DECODING_REGEX.Replace(
            value.Replace(@"\\", PLACEHOLDER),
            m => { 
                return ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString(); })
            .Replace(PLACEHOLDER, @"\\");
    }

Answer 8

To store actual Unicode code points, you first have to decode the string's UTF-16 code units to UTF-32 code units (which are currently the same as Unicode code points). Use System.Text.Encoding.UTF32.GetBytes() for that, and then write the resulting bytes to the StringBuilder as needed, i.e.:

static void Main(string[] args) 
{ 
    String originalString = "This string contains the unicode character Pi(π)"; 
    Byte[] bytes = Encoding.UTF32.GetBytes(originalString);
    StringBuilder asAscii = new StringBuilder();
    for (int idx = 0; idx < bytes.Length; idx += 4)
    { 
        uint codepoint = BitConverter.ToUInt32(bytes, idx);
        if (codepoint <= 127) 
            asAscii.Append(Convert.ToChar(codepoint)); 
        else 
            asAscii.AppendFormat("\\u{0:x4}", codepoint); 
    } 
    Console.WriteLine("Final string: {0}", asAscii); 
    Console.ReadKey(); 
}
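One caveat with the loop above: for code points above U+FFFF, the x4 format emits five or more hex digits after \u, which a decoder expecting exactly four digits will not read back correctly. The sketch below shows one possible alternative without the byte-array round trip. The class name CodePointEscaper and the choice of C#'s eight-digit \UXXXXXXXX literal style for supplementary code points are my own assumptions; whether the consumer of the string accepts that form depends on the target format.

```csharp
using System;
using System.Text;

static class CodePointEscaper
{
    // Escapes every non-ASCII code point: \uXXXX for the Basic Multilingual
    // Plane, \UXXXXXXXX (C# literal style, an assumption) for the rest.
    public static string Escape(string value)
    {
        var sb = new StringBuilder(value.Length);
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            if (c <= 127)
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < value.Length
                     && char.IsLowSurrogate(value[i + 1]))
            {
                // Combine the surrogate pair into a single code point.
                int codePoint = char.ConvertToUtf32(c, value[i + 1]);
                sb.Append("\\U").Append(codePoint.ToString("x8"));
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                // BMP character (or a lone surrogate, escaped as-is).
                sb.Append("\\u").Append(((int)c).ToString("x4"));
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Escape("Pi(\u03c0) and \U00020000"));
    }
}
```

If the target format only understands four-digit escapes (as JSON does), emitting the two surrogates separately as \uXXXX\uXXXX, the way the char-based answers do, is the safer choice.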

Answer 9

You need to use the Convert() method in the Encoding class:

Create an Encoding object that represents ASCII encoding
Create an Encoding object that represents Unicode encoding
Call Encoding.Convert() with the source encoding, the destination encoding, and the bytes of the string to be converted

There is an example here:

using System;
using System.Text;

namespace ConvertExample
{
   class ConvertExampleClass
   {
      static void Main()
      {
         string unicodeString = "This string contains the unicode character Pi(\u03a0)";

         // Create two different encodings.
         Encoding ascii = Encoding.ASCII;
         Encoding unicode = Encoding.Unicode;

         // Convert the string into a byte[].
         byte[] unicodeBytes = unicode.GetBytes(unicodeString);

         // Perform the conversion from one encoding to the other.
         byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);

         // Convert the new byte[] into a char[] and then into a string.
         // This is a slightly different approach to converting to illustrate
         // the use of GetCharCount/GetChars.
         char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
         ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
         string asciiString = new string(asciiChars);

         // Display the strings created before and after the conversion.
         Console.WriteLine("Original string: {0}", unicodeString);
         Console.WriteLine("Ascii converted string: {0}", asciiString);
      }
   }
}

9 answers

Answer 1: 140, accepted, 2009-10-23 20:59:01
Answer 2: 23, 2015-07-11 21:53:36
Answer 3: 11, 2009-10-23 20:54:09
Answer 4: 4, 2014-08-17 14:03:48
Answer 5: 2, 2009-10-23 21:28:55
Answer 6: 2, 2016-09-06 15:52:51
Answer 7: 1, 2012-09-24 10:50:49
Answer 8: 1, 2009-10-23 22:08:21
Answer 9: 0, 2009-10-23 20:20:57
