Why are some emojis not converted back into their original representation?
I am working on an emoji detection module. For some emojis I am observing weird behavior: after converting them to UTF-8 encoding, they are not converted back to their original representation.
I need their exact colored representation to be sent as the API response instead of a Unicode-escaped string. Any leads?
In [1]: x = "example1: 🤭 and example2: 😁 and example3: 🥺"
In [2]: x.encode('utf8')
Out[2]: b'example1: \xf0\x9f\xa4\xad and example2: \xf0\x9f\x98\x81 and example3: \xf0\x9f\xa5\xba'
In [3]: x.encode('utf8').decode('utf8')
Out[3]: 'example1: \U0001f92d and example2: 😁 and example3: \U0001f97a'
In [4]: print( x.encode('utf8').decode('utf8') )
example1: 🤭 and example2: 😁 and example3: 🥺
Link: emojis used in the example
Update 1: This example should make it much clearer. Here, two of the emojis are rendered when I send the Unicode-escaped string, but the third example fails to render as the exact emoji. What should I do in such a case?
'\U0001f92d' == '🤭' is True. It is an escape code, but it is still the same character; there are just two ways of displaying or entering it. The former is the repr() of the string; printing calls str(). Example:
>>> s = '🤭'
>>> print(repr(s))
'\U0001f92d'
>>> print(str(s))
🤭
>>> s
'\U0001f92d'
>>> print(s)
🤭
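A minimal sketch (plain Python 3, built-ins only) confirming that the escape sequence and the emoji are the same one-code-point string, and that a UTF-8 encode/decode round trip leaves it unchanged:

```python
# The escape sequence and the literal emoji build the same string.
escaped = "\U0001f92d"
literal = "🤭"
assert escaped == literal

# It is a single code point, U+1F92D.
print(len(literal), hex(ord(literal)))  # 1 0x1f92d

# UTF-8 uses four bytes for it; decoding gives back the identical string.
encoded = literal.encode("utf-8")
print(encoded)  # b'\xf0\x9f\xa4\xad'
assert encoded.decode("utf-8") == literal
```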
When Python generates the repr() it uses an escape code for any character it does not consider printable (the same test str.isprintable() uses). The content of the string is still the same: the same Unicode code point.
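To check that rule directly, here is a small sketch comparing repr() with str.isprintable() for the three emoji from the question. The helper name shows_escaped is hypothetical, and which emoji come out escaped depends on the Unicode version of the interpreter running it:

```python
def shows_escaped(ch: str) -> bool:
    """Hypothetical helper: True if repr() escapes this single character."""
    return repr(ch) != "'" + ch + "'"


for ch in ["🤭", "😁", "🥺"]:
    # For characters other than quotes and backslashes, repr() escapes
    # exactly the ones that str.isprintable() reports as non-printable.
    print(hex(ord(ch)), ch.isprintable(), shows_escaped(ch))
```

On an interpreter whose Unicode database predates an emoji, that emoji is unassigned there, so it is reported as non-printable and repr() escapes it.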
It's a debugging feature. For example, is the whitespace made of spaces or a tab? The repr() of the string makes that clear by using \t as an escape code.
>>> s = 'a\tb'
>>> print(s)
a       b
>>> s
'a\tb'
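A small illustration of that debugging use: two strings that can look identical when printed (assuming a terminal with 8-column tab stops) are told apart immediately by repr():

```python
# Two strings that can look alike when printed...
with_tab = "a\tb"
with_spaces = "a       b"  # seven spaces
print(with_tab)
print(with_spaces)

# ...but repr() shows which whitespace is actually there.
print(repr(with_tab))     # 'a\tb'
print(repr(with_spaces))  # 'a       b'
assert with_tab != with_spaces
```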
As to why an escape code is used for one emoji and not another: it depends on the version of Unicode supported by the version of Python used. Python 3.6 uses Unicode 9.0, and two of your emoji are not defined at that version level (🤭 U+1F92D was added in Unicode 10.0 and 🥺 U+1F97A in Unicode 11.0):
>>> import unicodedata as ud
>>> ud.unidata_version
'9.0.0'
>>> ud.name('😁')
'GRINNING FACE WITH SMILING EYES'
>>> ud.name('🤭')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: no such name
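A sketch for checking the emoji against whatever interpreter you are actually running. It uses only unicodedata.name() with a default (so unassigned code points return None instead of raising ValueError) and unicodedata.unidata_version; the helper name unicode_status is hypothetical:

```python
import unicodedata


def unicode_status(ch: str) -> str:
    """Hypothetical helper: describe a character using this interpreter's Unicode database."""
    # With a default, name() returns it instead of raising ValueError
    # for code points that have no name in this database version.
    name = unicodedata.name(ch, None)
    if name is None:
        return f"U+{ord(ch):04X}: not in Unicode {unicodedata.unidata_version}"
    return f"U+{ord(ch):04X}: {name}"


print("Unicode database version:", unicodedata.unidata_version)
for ch in ["🤭", "😁", "🥺"]:
    print(unicode_status(ch))
```

On an interpreter whose database is Unicode 11.0 or later (Python 3.7 and newer), all three names should resolve, and repr() should then show all three emoji unescaped.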