How to correctly print a list of Unicode characters in Python?

Question

I am trying to search for emoticons in Python strings. So I have, for example,

em_test = ['\U0001f680']
print(em_test)
['🚀']
test = 'This is a test string 💰💰🚀'
if any(x in test for x in em_test):
    print("yes, the emoticon is there")
else:
    print("no, the emoticon is not there")

yes, the emoticon is there

So if I search for em_test in

'This is a test string 💰💰🚀'

I can actually find it.

So I have made a CSV file with all the emoticons I want, defined by their Unicode escape codes. The CSV looks like this:

\\U0001F600
\\U0001F601
\\U0001F602
\\U0001F923

and when I import it and print it, I actually don't get the emoticons, just their text representation:

['\\U0001F600',
 '\\U0001F601',
 '\\U0001F602',
 '\\U0001F923',
...
]

and hence I cannot use this to search for these emoticons in another string... I somehow know that the double backslash \\ is only the representation of a single backslash, but somehow the Unicode reader doesn't get it... I don't know what I'm missing.

Any suggestions?

Answer 1

You can decode those Unicode escape sequences with .decode('unicode-escape'). However, .decode is a bytes method, so if those sequences are text rather than bytes, you first need to encode them into bytes. Alternatively, you can (probably) open your CSV file in binary mode in order to read those sequences as bytes rather than as text strings.
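A quick check makes the mismatch from the question concrete: the value the CSV reader returns is the ten-character escape sequence itself, not the one-character emoji, and encoding to bytes before calling .decode('unicode-escape') bridges the two. A minimal sketch; the variable names are illustrative:

```python
# What the CSV reader hands back: a backslash, 'U', and eight hex digits
raw = '\\U0001F600'
# What a \U escape written in Python source produces: a single code point
emoji = '\U0001F600'

print(len(raw), len(emoji))  # 10 1
print(raw == emoji)          # False

# str has no .decode(), so encode to bytes first, then interpret the escape
decoded = raw.encode('ascii').decode('unicode-escape')
print(len(decoded), decoded == emoji)  # 1 True
```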

Just for fun, I'll also use unicodedata to get the names of those emojis.

import unicodedata as ud

emojis = [
    '\\U0001F600',
    '\\U0001F601',
    '\\U0001F602',
    '\\U0001F923',
]

for u in emojis:
    s = u.encode('ASCII').decode('unicode-escape')
    print(u, ud.name(s), s)

output

\U0001F600 GRINNING FACE 😀
\U0001F601 GRINNING FACE WITH SMILING EYES 😁
\U0001F602 FACE WITH TEARS OF JOY 😂
\U0001F923 ROLLING ON THE FLOOR LAUGHING 🤣

This should be much faster than using ast.literal_eval. And if you read the data in binary mode, it will be even faster, since that avoids the initial decoding step while reading the file and also lets you eliminate the .encode('ASCII') call.
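A minimal sketch of the binary-mode route, assuming the file holds exactly one escape sequence per line and nothing else (the function name load_emojis and the temporary demo file are hypothetical). Python 3's csv module needs a text-mode file, so if the real file has several columns, the .encode step above is probably the simpler option:

```python
import os
import tempfile


def load_emojis(path):
    """Read one escape sequence per line, with the file opened in binary mode."""
    emojis = []
    with open(path, 'rb') as f:  # lines come back as bytes, so no .encode() is needed
        for line in f:
            line = line.strip()  # drop the trailing newline and surrounding spaces
            if line:  # skip blank lines
                emojis.append(line.decode('unicode-escape'))
    return emojis


# Demo: write a small file in the question's format to a temporary location
with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
    tmp.write(b'\\U0001F600\n\\U0001F601\n\\U0001F923\n')
    path = tmp.name

try:
    emojis = load_emojis(path)
    print(emojis)
finally:
    os.remove(path)  # clean up the demo file
```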

You can make the decoding a little more robust by using

u.encode('Latin1').decode('unicode-escape')

but that shouldn't be necessary for your emoji data. And as I said earlier, it would be even better to open the file in binary mode so that you don't need to encode it at all.
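To illustrate what the Latin-1 variant buys you, here is a sketch with a made-up value that mixes an accented letter with an escape sequence. ASCII cannot encode 'é', while Latin-1 maps every code point up to U+00FF to a single byte, which 'unicode-escape' reads back as the same character. Characters above U+00FF (for example '€') would still fail to encode, so this only widens the safe range:

```python
# Hypothetical value mixing a plain non-ASCII letter with an escape sequence
value = 'café \\U0001F600'

try:
    value.encode('ASCII').decode('unicode-escape')
except UnicodeEncodeError as exc:
    print('ASCII failed:', exc.reason)  # 'é' has no ASCII encoding

# Latin-1 turns 'é' into one byte, and 'unicode-escape' decodes
# non-escape bytes as Latin-1, so 'é' survives the round trip
decoded = value.encode('Latin1').decode('unicode-escape')
print(decoded)  # café 😀
```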

Answer 2

1. Keeping your CSV as-is:

It's a bloated solution, but using ast.literal_eval works:

import ast

s = '\\U0001F600'

x = ast.literal_eval('"{}"'.format(s))
print(hex(ord(x)))
print(x)

I get 0x1f600 (which is the correct character code) and the emoticon character (😀). (Well, I had to copy/paste a strange character from my console into this answer's text field, but that's a console issue on my end; otherwise it works.)

Just surround the value with quotes so that ast parses the input as a string literal.
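Applied to the whole list from the question, the same trick could look like the sketch below (the list is copied from the question; the test string is made up). It assumes no value contains a double quote or ends with a backslash, since either would break the quoted literal:

```python
import ast

emojis = ['\\U0001F600', '\\U0001F601', '\\U0001F602', '\\U0001F923']

# Wrap each escape sequence in double quotes so literal_eval parses it as a
# Python string literal, which turns the \U escape into the real character
decoded = [ast.literal_eval('"{}"'.format(s)) for s in emojis]

test = 'This is a test string \U0001F602'
if any(x in test for x in decoded):
    print("yes, the emoticon is there")
else:
    print("no, the emoticon is not there")
```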

2. Using character codes directly:

Maybe you'd be better off storing the character codes themselves instead of the \\U format:

print(chr(0x1F600))

does exactly the same thing (so ast is slightly overkill).
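A small check of that equivalence, assuming every stored value is exactly \U followed by eight hex digits. The last lines show a variation not in the answer above: such values can also be turned into characters with chr alone, by slicing off the two-character prefix:

```python
import ast

code_point = 0x1F600

# Build the character straight from its number
from_chr = chr(code_point)
# Parse an escape sequence wrapped in quotes instead
from_ast = ast.literal_eval('"\\U0001F600"')
print(from_chr == from_ast)  # True

# Values already in the question's format can skip ast as well: drop the
# two-character backslash-U prefix and parse the eight remaining hex digits
escaped = '\\U0001F600'
from_slice = chr(int(escaped[2:], 16))
print(from_slice == from_chr)  # True
```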

Your CSV could contain:

0x1F600
0x1F601
0x1F602
0x1F923

Then chr(int(row[0], 16)) would do the trick when reading it. For example, with one code per row (in the first column):

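import csv

with open("codes.csv", newline="") as f:
    cr = csv.reader(f)
    # int(..., 16) accepts the "0x" prefix; chr() turns the code point into the emoji
    codes = [chr(int(row[0], 16)) for row in cr if row]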
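Putting it together with the search from the question, a sketch that uses io.StringIO as a stand-in for codes.csv (the CSV contents below are hypothetical, one hex code per row):

```python
import csv
import io

# Stand-in for codes.csv: one hex code per row
csv_text = "0x1F600\n0x1F601\n0x1F602\n0x1F923\n"

with io.StringIO(csv_text) as f:
    cr = csv.reader(f)
    # int(..., 16) accepts the '0x' prefix; chr() turns the number into the emoji
    emojis = [chr(int(row[0], 16)) for row in cr if row]

test = 'This is a test string \U0001F923'
print(any(x in test for x in emojis))  # True
```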

Solution 1: 3 votes, accepted, 2017-11-13 12:32:20

Solution 2: 1 vote, 2017-11-13 12:00:45