
Why does emoji have two different UTF-8 codes? How to convert emoji from UTF-8 using NSString in iOS?

We have found that some emoji have two different UTF-8 encodings, for example:

emoji   Unicode    UTF-8                another UTF-8
😁      U+1F601    \xf0\x9f\x98\x81     \xed\xa0\xbd\xed\xb8\x81

But iOS cannot decode the second type of UTF-8, so I get an error when I decode the string from UTF-8.

[screenshot: iOS code]
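
The screenshot is not reproduced here; as a minimal illustration (my own sketch, assuming the bytes arrive in a raw C buffer and are decoded with NSString's initWithBytes:length:encoding:), the failure looks like this:

#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // First byte sequence from the table above: decodes fine.
        const char utf8Bytes[] = "\xF0\x9F\x98\x81";
        NSString *ok = [[NSString alloc] initWithBytes:utf8Bytes
                                                length:4
                                              encoding:NSUTF8StringEncoding];
        NSLog(@"first sequence:  %@", ok);   // prints the 😁 emoji

        // Second byte sequence from the table: NSString rejects it and returns nil.
        const char otherBytes[] = "\xED\xA0\xBD\xED\xB8\x81";
        NSString *bad = [[NSString alloc] initWithBytes:otherBytes
                                                 length:6
                                               encoding:NSUTF8StringEncoding];
        NSLog(@"second sequence: %@", bad);  // prints (null)
    }
    return 0;
}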


In all the documents I have found, there is only one type of UTF-8 code listed for each emoji; I cannot find the other one anywhere.

The documents I referenced include:

emoji code link

whole utf-8 code link

But in a web tool, bianma, both types of UTF-8 code are converted into the emoji correctly.

[screenshot: input code]

[screenshot: output]


So, my questions are:

  1. Why are there two types of UTF-8 codes for one emoji?

  2. Is there a document that lists both types of UTF-8 codes?

  3. How can I correctly convert a string from UTF-8 using NSString in iOS?

0xF0, 0x9F, 0x98, 0x81

Is the correct UTF-8 encoding for U+1F601 😁.
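
As a quick check (my own sketch, not part of the original answer): U+1F601 in binary is 000 011111 011000 000001, and filling those 21 bits into the four-byte UTF-8 pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx gives exactly 0xF0 0x9F 0x98 0x81. The same can be confirmed from Objective-C by encoding the character and dumping the bytes:

#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // U+1F601 as a string literal; -dataUsingEncoding: produces its UTF-8 bytes.
        NSData *utf8 = [@"\U0001F601" dataUsingEncoding:NSUTF8StringEncoding];
        const unsigned char *bytes = utf8.bytes;
        NSMutableString *hex = [NSMutableString string];
        for (NSUInteger i = 0; i < utf8.length; i++) {
            [hex appendFormat:@"%02X ", bytes[i]];
        }
        NSLog(@"%@", hex);   // F0 9F 98 81
    }
    return 0;
}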

0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x81

Is not a valid UTF-8 sequence(*). It should really be rejected; iOS is correct to do so.

This is a bug in the bianma tool: the convertUtf8BytesToUnicodeCodePoints function is more lenient about what input it accepts than the algorithm specified in e.g. RFC 3629.

This happens to return a working string only because the tool is written in JavaScript. Having decoded the above byte sequence to the bogus surrogate code point sequence U+D83D,U+DE01, it then converts that into a JavaScript string using a direct code-point-to-code-unit mapping, giving \uD83D\uDE01. As this is the correct way to encode 😁 in a UTF-16 string, it appears to have worked.
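
To see where that surrogate pair comes from on the iOS side (again my own sketch): NSString itself is UTF-16 based, so -length and -characterAtIndex: report exactly those two code units for U+1F601; the invalid byte sequence above is simply each of the two surrogates pushed through the three-byte UTF-8 pattern on its own.

#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSString *emoji = @"\U0001F601";
        // NSString stores UTF-16 code units, so the single emoji has length 2.
        NSLog(@"length = %lu", (unsigned long)emoji.length);   // 2
        NSLog(@"units  = %04X %04X",
              [emoji characterAtIndex:0],    // D83D (high surrogate)
              [emoji characterAtIndex:1]);   // DE01 (low surrogate)
    }
    return 0;
}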

(*: It is a valid CESU-8 sequence, but that encoding is just “bogus broken encoding for compatibility with badly-written historical tools” and should generally be avoided.)

You should not usually encounter a sequence like this; it is typically not worth catering for unless you have a specific source of this kind of malformed data which you don't have the power to get fixed.
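
If you really are stuck with data like this, one option is to rewrite the surrogate-pair byte triples into proper four-byte UTF-8 before handing the buffer to NSString. A rough sketch (my own code, with only minimal validation; the helper name is made up):

#import <Foundation/Foundation.h>

// Hypothetical helper: rewrites CESU-8 style surrogate pairs
// (ED A0..AF xx followed by ED B0..BF xx) as proper 4-byte UTF-8.
static NSData *DataByFixingCESU8(NSData *input) {
    const unsigned char *b = input.bytes;
    NSUInteger n = input.length;
    NSMutableData *out = [NSMutableData dataWithCapacity:n];
    NSUInteger i = 0;
    while (i < n) {
        // A 3-byte high surrogate immediately followed by a 3-byte low surrogate?
        if (i + 6 <= n && b[i] == 0xED && (b[i+1] & 0xF0) == 0xA0
                       && b[i+3] == 0xED && (b[i+4] & 0xF0) == 0xB0) {
            unsigned hi = 0xD000 | ((b[i+1] & 0x3F) << 6) | (b[i+2] & 0x3F); // D800..DBFF
            unsigned lo = 0xD000 | ((b[i+4] & 0x3F) << 6) | (b[i+5] & 0x3F); // DC00..DFFF
            unsigned cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
            unsigned char utf8[4] = {
                (unsigned char)(0xF0 |  (cp >> 18)),
                (unsigned char)(0x80 | ((cp >> 12) & 0x3F)),
                (unsigned char)(0x80 | ((cp >>  6) & 0x3F)),
                (unsigned char)(0x80 |  (cp        & 0x3F)),
            };
            [out appendBytes:utf8 length:4];
            i += 6;
        } else {
            [out appendBytes:&b[i] length:1];  // copy everything else through untouched
            i += 1;
        }
    }
    return out;
}

int main(void) {
    @autoreleasepool {
        const char cesu[] = "\xED\xA0\xBD\xED\xB8\x81";
        NSData *fixed = DataByFixingCESU8([NSData dataWithBytes:cesu length:6]);
        NSString *s = [[NSString alloc] initWithData:fixed encoding:NSUTF8StringEncoding];
        NSLog(@"%@", s);  // now decodes to 😁
    }
    return 0;
}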

This worked for me in PHP, sending a message to a Telegram bot with an emoji:

$message_text = " \xf0\x9f\x98\x81 ";
