Converting hex value to utf-8 character

I'm using an IMAP class to read emails. When my mail body contains Ö, IMAP returns the hex value =C3=96. How do I convert it to a UTF-8 Ö?

I'm thinking of something like:

Encoding enc = Encoding.GetEncoding("UTF-8");
System.Byte[] ch = new System.Byte[1];

ch[0] = System.Convert.ToByte([hex value of Ö], 16);
var decodedItem = enc.GetString(ch);

where the expected value of decodedItem is Ö. But I don't really know why Ö translates to =C3=96 in IMAP, and I can't pass that to ToByte() because =C3=96 isn't a true hex value.

I've also tried doing this:

Encoding enc = Encoding.GetEncoding("UTF-8");
System.Byte[] ch = new System.Byte[1];

ch[0] = 214;
var decodedItem = enc.GetString(ch);

But the value in decodedItem is =

That symbol is actually two bytes in UTF-8 (0xC3, 0x96), but you're only assigning one, and a different one at that (214 = 0xD6)...

Encoding enc = Encoding.GetEncoding("UTF-8");
System.Byte[] ch = { 0xC3, 0x96 };

var decodedItem = enc.GetString(ch);

To clarify a bit further, 0xD6 (214) is the Unicode code point of Ö (U+00D6), not its UTF-8 encoding. You could use it by switching to the encoding that .NET calls "Unicode" (UTF-16, little-endian) and changing the byte values to match:

Encoding enc = Encoding.GetEncoding("Unicode");
System.Byte[] ch = { 0xD6, 0x00 };

var decodedItem = enc.GetString(ch);
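To see the two byte sequences side by side, one can go the other way and ask each encoding for the bytes of Ö. This is only a small sketch; the class and method names (EncodingDemo, HexBytes) are made up for illustration.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: returns the bytes that `enc` produces for `text`,
    // formatted as dash-separated hex (the format of BitConverter.ToString).
    public static string HexBytes(Encoding enc, string text)
    {
        return BitConverter.ToString(enc.GetBytes(text));
    }
}
```

With this, EncodingDemo.HexBytes(Encoding.UTF8, "\u00D6") gives "C3-96", while EncodingDemo.HexBytes(Encoding.Unicode, "\u00D6") gives "D6-00".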

From http://www.utf8-chartable.de/: U+00D6 Ö c3 96 LATIN CAPITAL LETTER O WITH DIAERESIS

This means you have to take away the '=' signs and then decode the remaining hex bytes (C3 96) as UTF-8.
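A minimal sketch of that idea, assuming the input consists of nothing but =XX escapes such as "=C3=96" (no plain text and no line breaks in between). HexEscapes and DecodeUtf8Escapes are hypothetical names, not part of any IMAP library.

```csharp
using System;
using System.Linq;
using System.Text;

public static class HexEscapes
{
    // Hypothetical helper: handles only a run of "=XX" escapes like "=C3=96".
    public static string DecodeUtf8Escapes(string escaped)
    {
        byte[] bytes = escaped
            // Take away the '=' signs, leaving the two-digit hex values.
            .Split(new[] { '=' }, StringSplitOptions.RemoveEmptyEntries)
            // Parse each hex pair into one byte (base 16).
            .Select(hex => Convert.ToByte(hex, 16))
            .ToArray();

        // Interpret the collected bytes as one UTF-8 sequence.
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Calling HexEscapes.DecodeUtf8Escapes("=C3=96") should return "Ö". The bytes have to be collected first and decoded together, because a single character can span several escapes.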

I hope this helps.

Greetings, Alex

There's no Unicode in most of today's e-mails. In order to arrive at a Unicode text, you have to do the following operations:

  • Find a textual part of the message. There could be many of them. See the BODYSTRUCTURE in RFC 3501.
  • Inspect the MIME headers (or the BODYSTRUCTURE response) to find out the Content-Transfer-Encoding of the part that you're looking at. The most common encodings are quoted-printable and base64. Look at RFC 2045, 2046, 2047 and 2048 for details.
  • Undo the Content-Transfer-Encoding so that you arrive at a stream of raw bytes.
  • Look at the charset parameter of the Content-Type header.
  • Decode the stream of bytes using the codec/charset you found in the previous step.
  • Congratulations, you now have your Unicode string.
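The last four steps could be sketched roughly as follows. This assumes you have already picked one text part and extracted its raw body, its Content-Transfer-Encoding and its charset as strings; MailPartDecoder and its methods are hypothetical names, and the quoted-printable handling is simplified (it covers =XX escapes and soft line breaks only).

```csharp
using System;
using System.IO;
using System.Text;

public static class MailPartDecoder
{
    // Undo quoted-printable: "=XX" becomes one byte, and "=" directly
    // before a line break is a soft break that is dropped entirely.
    public static byte[] DecodeQuotedPrintable(string input)
    {
        using (var output = new MemoryStream())
        {
            for (int i = 0; i < input.Length; i++)
            {
                char c = input[i];
                if (c != '=')
                {
                    // Literal character; a quoted-printable body is expected to be ASCII.
                    output.WriteByte((byte)c);
                }
                else if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
                {
                    i += 2; // soft line break "=\r\n"
                }
                else if (i + 1 < input.Length && input[i + 1] == '\n')
                {
                    i += 1; // soft line break with a bare "\n"
                }
                else if (i + 2 < input.Length
                         && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
                {
                    output.WriteByte(Convert.ToByte(input.Substring(i + 1, 2), 16));
                    i += 2; // skip the two hex digits
                }
                else
                {
                    output.WriteByte((byte)'='); // malformed escape: keep it as-is
                }
            }
            return output.ToArray();
        }
    }

    // Undo the transfer encoding, then decode the bytes with the part's charset.
    public static string DecodePart(string body, string transferEncoding, string charset)
    {
        byte[] bytes;
        switch (transferEncoding.ToLowerInvariant())
        {
            case "quoted-printable": bytes = DecodeQuotedPrintable(body); break;
            case "base64": bytes = Convert.FromBase64String(body); break;
            default: bytes = Encoding.ASCII.GetBytes(body); break; // assumed to be 7bit
        }
        // Some legacy charsets may need an extra encoding provider on newer .NET runtimes.
        return Encoding.GetEncoding(charset).GetString(bytes);
    }
}
```

For the mail in the question, MailPartDecoder.DecodePart("=C3=96", "quoted-printable", "utf-8") should yield "Ö".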

Alternatively, use a library which implements these functions in your favorite language/framework. There are plenty of them.
