
In Python 3 how to print unicode codepoint as u'\U…'

For whatever reason, I thought it would be neat to create a table of emoji I'm interested in. The first column would be the codepoint, the second the emoji, and the third the name. Something along the lines of this web page, but tailored to my use:

Full emoji data

Assuming I figure out how to iterate over the codepoints (there are other questions for that, or I can construct a list of the ones of interest), I will just cycle through code points such as

u_str = u'\U0001F001'
u_str = u'\U0001F002'

(generated programmatically, of course)

and print (in a loop):

print(u'\U0001F001', u_str, ' ', unicodedata.name(u_str))
print(u'\U0001F002', u_str, ' ', unicodedata.name(u_str))

If unicodedata had some attribute such as unicodedata.hex_representation, I would just use that, but if such an attribute exists, I don't understand the spec well enough to find it.

So in searching for an answer I found this question:

how-does-one-print-a-unicode-character-code-in-python

I attempted:

>>> print(u_str.encode('raw_unicode_escape'))
b'\\U0001f600'

What I'm looking for is what I put in:

u_str = u'\U0001F600'

Is this possible, or is there some other way to achieve the construction of the table?

Using Python 3.6+:

>>> for i in range(0x1f001,0x1f005):
...     print(f'U+{i:04X} \\U{i:08X} {chr(i)}')
...
U+1F001 \U0001F001 🀁
U+1F002 \U0001F002 🀂
U+1F003 \U0001F003 🀃
U+1F004 \U0001F004 🀄
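
To get the three-column table the question asks for, the same formatting can be combined with unicodedata.name. The following is only a minimal sketch: emoji_row is a hypothetical helper name, not something from the question or the answer, and the '<unnamed>' placeholder is an assumption for code points that have no name in the Unicode database.

```python
import unicodedata


def emoji_row(codepoint: int) -> str:
    """Build one table row: escape form, the character itself, its Unicode name."""
    ch = chr(codepoint)
    # Without a default, unicodedata.name raises ValueError for unnamed code points.
    name = unicodedata.name(ch, '<unnamed>')
    # {codepoint:08X} gives eight upper-case hex digits, matching the \UXXXXXXXX form.
    return f"\\U{codepoint:08X} {ch} {name}"


for cp in range(0x1F001, 0x1F005):
    print(emoji_row(cp))
```

Passing a default to unicodedata.name keeps the loop from stopping on unassigned code points, which matters when iterating over a whole range rather than a hand-picked list.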
  1. The original representation is gone forever. The case and formatting of the escape are specified by Python itself.

  2. You need to decode your bytes back to text. Try the ascii codec, since for emoji that's all raw_unicode_escape will generate.
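
The two points above can be sketched as follows. This is an illustration rather than the answerer's own code, and escape_form is a hypothetical helper name. It assumes the input is a single character above U+00FF, such as an emoji, so that the encoded bytes are pure ASCII.

```python
def escape_form(ch: str) -> str:
    """Return the escape sequence Python generates for ch, as text."""
    # Encoding yields bytes such as b'\\U0001f600'; Python picks lower-case hex.
    raw = ch.encode('raw_unicode_escape')
    # Decode the bytes back to str so that print() shows no b'...' wrapper.
    return raw.decode('ascii')


print(escape_form('\U0001F600'))  # \U0001f600
```

Because the hex digits come back in lower case, formatting the integer code point directly (as in the f-string answer) is the simpler route when the upper-case form is wanted.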
