Why are some emojis not converted back into their representation?

I am working on an emoji detection module. For some emojis I am observing odd behavior: after encoding them to UTF-8 and decoding again, they are not displayed in their original form. I need to send the actual colored emoji in the API response rather than a Unicode-escaped string. Any leads?

In [1]: x = "example1: 🤭 and example2: 😁 and example3: 🥺"

In [2]: x.encode('utf8')
Out[2]: b'example1: \xf0\x9f\xa4\xad and example2: \xf0\x9f\x98\x81 and example3: \xf0\x9f\xa5\xba'

In [3]: x.encode('utf8').decode('utf8')
Out[3]: 'example1: \U0001f92d and example2: 😁 and example3: \U0001f97a'

In [4]: print(x.encode('utf8').decode('utf8'))
example1: 🤭 and example2: 😁 and example3: 🥺

Link: Emoji used in example

Update 1: This example should make the problem clearer. Here, two emojis are rendered when I send the Unicode escape string, but the third example fails to render as the exact emoji. What should I do in such a case?

[Images: API view code; API response using Postman]

'\U0001f92d' == '🤭' is True. It is an escape code but is still the same character: two ways of displaying or entering it. The former is the repr() of the string; printing calls str(). Example:

>>> s = '🤭'
>>> print(repr(s))
'\U0001f92d'
>>> print(str(s))
🤭
>>> s
'\U0001f92d'
>>> print(s)
🤭

When Python generates the repr(), it uses an escape code for any character it considers non-printable (str.isprintable() returns False), which includes code points that are missing from its Unicode database. The content of the string is still the same: the same Unicode code point.
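To illustrate that only the display differs, here is a minimal sketch. It checks that the UTF-8 round trip returns an identical string. It also assumes the API response is JSON built with the standard json module (an assumption, since the question does not say how the response is serialized) and shows that the escaped and unescaped payloads decode to the same emoji.

```python
import json

s = "\U0001f92d"  # same character as the literal emoji, written as an escape

# Encoding to UTF-8 and decoding again yields an identical string.
round_tripped = s.encode("utf8").decode("utf8")
print(round_tripped == s)      # True
print(len(s), hex(ord(s)))     # 1 0x1f92d

# Assumption: the response body is JSON. By default json.dumps escapes
# non-ASCII characters; ensure_ascii=False keeps the character itself.
escaped_body = json.dumps({"text": s})
literal_body = json.dumps({"text": s}, ensure_ascii=False)

# Both bodies decode to the same one-character string.
print(json.loads(escaped_body)["text"] == json.loads(literal_body)["text"] == s)  # True
```

Whether the client then draws a colored glyph most likely depends on the fonts available on that client, not on how the string was escaped in transit.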

It's a debugging feature. For example, is the white space made of spaces or tabs? The repr() of the string makes it clear by using \t as an escape code.

>>> s = 'a\tb'
>>> print(s)
a       b
>>> s
'a\tb'

As to why an escape code is used for one emoji and not another, it depends on the version of Unicode supported by the version of Python used.

Python 3.6 uses Unicode 9.0.0, and two of your emoji (🤭 and 🥺) were added in later Unicode versions, so they aren't defined at that version level:

>>> import unicodedata as ud
>>> ud.unidata_version
'9.0.0'
>>> ud.name('😁')
'GRINNING FACE WITH SMILING EYES'
>>> ud.name('🤭')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name
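The same check can be run over a whole string to see which characters the running interpreter will escape. This is only a sketch; `describe_non_ascii` is a hypothetical helper name, not part of any library.

```python
import unicodedata as ud

def describe_non_ascii(text):
    """Return (code point, name or None, printable?) for each non-ASCII character."""
    rows = []
    for ch in text:
        if ord(ch) < 128:
            continue  # ASCII is not of interest here
        # ud.name() accepts a default, so unknown characters give None
        # instead of raising ValueError.
        rows.append((f"U+{ord(ch):04X}", ud.name(ch, None), ch.isprintable()))
    return rows

print("Unicode database version:", ud.unidata_version)
for row in describe_non_ascii("example1: \U0001f92d and example2: \U0001f601"):
    print(row)
```

Characters reported as not printable are the ones repr() shows as escape codes on that interpreter; the result changes with the Python version because the bundled Unicode database changes.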

