
How to get the UTF-8 codepoints of a C# string?

I have a German string in C#:

string s = "Menü";

I would like to get UTF-8 codepoints:

Expected result:

\x4D\x65\x6E\xC3\xBC

The expected result has been verified with an online UTF-8 encoder/decoder and with Unicode code converter v8.1.

I tried a lot of conversion methods, but I cannot get the expected result.

UPDATE:

Funny: the problem was not in the source code but in the wrong encoding of the input file :-) These answers helped me a lot.

There's no such thing as "UTF-8 codepoints": there are UTF-8 code units, and there are Unicode code points.

In the string Menü, there are 4 code points:

  • U+004D
  • U+0065
  • U+006E
  • U+00FC

For BMP characters (i.e. those in the range U+0000 to U+FFFF) it's as simple as iterating over the char values in a string. For non-BMP characters it's slightly trickier. StringInfo looks helpful here, but it includes combining characters when iterating over text elements. It's not terribly hard to spot surrogate pairs in a string, but I don't think there's a very simple way of iterating over all the code points in a string.
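As a rough illustration of spotting surrogate pairs by hand, here is a minimal sketch. The class CodePointDemo and the helper CodePoints are hypothetical names made up for this example; only char.ConvertToUtf32 and char.IsHighSurrogate are framework calls. The string is written with a \u escape so the result does not depend on how the source file is saved. On newer runtimes (.NET Core 3.0 and later), string.EnumerateRunes() should cover the same ground.

```csharp
using System;
using System.Collections.Generic;

static class CodePointDemo
{
    // Hypothetical helper (not a framework API): yields each Unicode code point of a string.
    public static IEnumerable<int> CodePoints(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            // ConvertToUtf32 returns the char itself for BMP characters and combines
            // a valid surrogate pair starting at index i into one code point.
            // It throws ArgumentException for an unpaired surrogate.
            yield return char.ConvertToUtf32(text, i);

            if (char.IsHighSurrogate(text[i]))
            {
                i++; // the low surrogate has already been consumed
            }
        }
    }

    static void Main()
    {
        // "Men\u00FC" is "Menü" with a precomposed ü.
        foreach (int codePoint in CodePoints("Men\u00FC"))
        {
            Console.WriteLine("U+{0:X4}", codePoint);
        }
    }
}
```

For this input the loop prints U+004D, U+0065, U+006E and U+00FC, one per line. A non-BMP character such as "\U0001F600" comes back as the single value 0x1F600 rather than as two surrogate chars.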

Finding the UTF-8 code units (i.e. the UTF-8-encoded representation of a string as bytes) is simple:

byte[] bytes = Encoding.UTF8.GetBytes(text);

That will give you the five bytes you listed in your question: 0x4d, 0x65, 0x6e, 0xc3, 0xbc.
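To get from those bytes to the exact \xNN notation shown in the question, one possible sketch is below. Utf8EscapeDemo and ToHexEscapes are hypothetical names for this example, and the \xNN output format is simply taken from the question, not from any framework convention.

```csharp
using System;
using System.Linq;
using System.Text;

static class Utf8EscapeDemo
{
    // Hypothetical helper: renders every UTF-8 byte of the text as \xNN (uppercase hex).
    public static string ToHexEscapes(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        // "X2" pads each byte to two hex digits, so 0x0A becomes "0A" rather than "A".
        return string.Concat(bytes.Select(b => "\\x" + b.ToString("X2")));
    }

    static void Main()
    {
        // "Men\u00FC" is "Menü"; the escape avoids any source-file encoding problems.
        Console.WriteLine(ToHexEscapes("Men\u00FC")); // prints \x4D\x65\x6E\xC3\xBC
    }
}
```

If the string comes from a file or another external source, the result is only as good as the decoding step that produced the string, which is worth checking first when the output does not match.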

Use Encoding.UTF8, as in the example below.

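        using System;
        using System.Text;

        string menu = "Menü";
        Console.WriteLine(menu);

        // Encode the string to its UTF-8 bytes (code units).
        var utf8 = Encoding.UTF8;
        byte[] utfBytes = utf8.GetBytes(menu);
        foreach (byte b in utfBytes)
        {
            // X2 pads each byte to two hex digits, e.g. 0x0A prints as "0A".
            Console.WriteLine("Hex: {0:X2}", b);
        }

        // Decode the bytes back to a string to confirm the round trip.
        string menu2 = utf8.GetString(utfBytes, 0, utfBytes.Length);
        Console.WriteLine(menu2);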

You need to explicitly convert:

var utf8 = Encoding.UTF8.GetBytes("Menü");

After this, utf8 contains the bytes 0x4d, 0x65, 0x6e, 0xc3, 0xbc.


 