Converting emojis to Unicode and vice versa in Python 3

Question

I am trying to convert an emoji into its Unicode code point in Python 3. For example, given the emoji 😀, I would like to get the corresponding code point 'U+1F600'. Similarly, I would like to convert 'U+1F600' back to 😀. I have read the documentation and tried several options, but Python's behaviour confuses me here.

>>> x = '😀'
>>> y = x.encode('utf-8')
>>> y
b'\xf0\x9f\x98\x80'

The emoji is converted to a bytes object.

>>> z = y.decode('utf-8')
>>> z
'😀'

This converts the bytes object back to the emoji. So far so good.

Now, taking the Unicode escape for the emoji:

>>> c = '\U0001F600'
>>> d = c.encode('utf-8')
>>> d
b'\xf0\x9f\x98\x80'

This prints out the byte encoding again.

>>> d.decode('utf-8')
'😀'

This prints the emoji again. I really can't figure out how to convert directly between the Unicode code point and the emoji.

Answer 1

'😀' is already a Unicode object. UTF-8 is not Unicode; it is a byte encoding for Unicode. To get the code point number of a Unicode character, you can use the ord function. To print it in the form you want, format it as hex, like this:

s = '😀'
print('U+{:X}'.format(ord(s)))

Output

U+1F600

If you have Python 3.6+, you can make it even shorter (and more efficient) by using an f-string:

s = '😀'
print(f'U+{ord(s):X}')
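
The question also asks for the reverse direction. Here is a minimal sketch, assuming the input is a string of the form 'U+1F600'; the helper names to_codepoint and from_codepoint are made up for this illustration. The idea is to strip the 'U+' prefix, parse the rest as hexadecimal, and pass the number to the built-in chr. The :04X format pads short code points to four hex digits, which is the usual way U+ notation is written.

```python
def to_codepoint(char):
    """Return the 'U+XXXX' notation for a single character (hypothetical helper)."""
    return f'U+{ord(char):04X}'


def from_codepoint(notation):
    """Return the character for a string such as 'U+1F600' (hypothetical helper)."""
    if not notation.upper().startswith('U+'):
        raise ValueError(f'expected U+XXXX notation, got {notation!r}')
    # int(..., 16) parses the hex digits; chr maps the number to a character.
    return chr(int(notation[2:], 16))


print(to_codepoint('😀'))
print(from_codepoint('U+1F600'))
```

Note that ord accepts exactly one code point. Some emoji (flags, skin-tone variants) are sequences of several code points, so for those you would apply the helper to each character in turn, for example [to_codepoint(c) for c in text].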

BTW, if you want to create a Unicode escape sequence like '\\U0001F600', there's the 'unicode-escape' codec. However, it returns a bytes string, and you may wish to convert that back to text. You could use the 'UTF-8' codec for that, but you might as well just use the 'ASCII' codec, since the result is guaranteed to contain only valid ASCII.

s = '😀'
print(s.encode('unicode-escape'))
print(s.encode('unicode-escape').decode('ASCII'))

Output

b'\\U0001f600'
\U0001f600
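
Going the other way can be sketched like this, assuming you hold the escape sequence as ordinary text (for example, read from a file) rather than as a literal in source code: encode it to ASCII bytes first, then decode those bytes with the same 'unicode-escape' codec. The variable names are only illustrative.

```python
# A raw string, so this is 10 separate characters: backslash, U, eight hex digits.
escaped = r'\U0001f600'

# str -> ASCII bytes -> str, with the codec interpreting the escape sequence.
char = escaped.encode('ascii').decode('unicode-escape')

print(len(escaped), len(char))
print(char)
```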

I suggest you take a look at this short article by Stack Overflow co-founder Joel Spolsky: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

Answer 2

sentence = "Head-Up Displays (HUD)💻 for #automotive🚗 sector\n \nThe #UK-based #startup🚀 Envisics got €42 million #funding💰 from l… "
print("normal sentence - ", sentence)

uc_sentence = sentence.encode('unicode-escape')
print("\n\nunicode represented sentence - ", uc_sentence)

decoded_sentence = uc_sentence.decode('unicode-escape')
print("\n\ndecoded sentence - ", decoded_sentence)

Output

normal sentence -  Head-Up Displays (HUD)💻 for #automotive🚗 sector
 
The #UK-based #startup🚀 Envisics got €42 million #funding💰 from l… 


unicode represented sentence -  b'Head-Up Displays (HUD)\\U0001f4bb for #automotive\\U0001f697 sector\\n \\nThe #UK-based #startup\\U0001f680 Envisics got \\u20ac42 million #funding\\U0001f4b0 from l\\u2026 '


decoded sentence -  Head-Up Displays (HUD)💻 for #automotive🚗 sector
 
The #UK-based #startup🚀 Envisics got €42 million #funding💰 from l…
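
One caveat about the round trip above, shown as a small sketch with a made-up sample string: decoding with 'unicode-escape' restores the original text when the bytes were produced by the same codec, because those bytes are pure ASCII. Feeding it UTF-8 bytes instead does not work, since the codec reads any non-ASCII byte as Latin-1.

```python
text = 'funding💰 €42'  # illustrative sample, not from the original post

# Safe: bytes produced by 'unicode-escape' contain only ASCII.
escaped = text.encode('unicode-escape')
restored = escaped.decode('unicode-escape')
print(restored == text)

# Not safe: UTF-8 bytes above 0x7F are interpreted as Latin-1 characters.
mangled = text.encode('utf-8').decode('unicode-escape')
print(mangled == text)
```

If the bytes are UTF-8 to begin with, decode them with 'utf-8' instead.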

Answer 1: 35 (accepted), 2017-12-08 14:27:48

Answer 2: 2, 2020-10-28 04:27:47
