Unicode characters in Python strings

Question

I have a Pandas Series with a list of game names, for example:

【戦艦】Warship Saga ウ
⋆Spider Solitaire+
▻CHESS

I want to remove all Unicode characters that are "unprintable", so the desired outcome would look like this: Warship Saga, Spider Solitaire+, CHESS.

I tried data['Name'] = data['Name'].str.encode('ascii').str.decode('ascii'), but it didn't help. Decoding alone, data['Name'] = data['Name'].str.decode('ascii'), didn't help either. Thank you in advance!

Answer 1

This works for me in Python 3, by adding 'ignore' as the errors parameter:

string = '\u3010\u6226\u8266\u3011Warship Saga \u30a6'
string = string.encode('ascii', 'ignore').decode('ascii')
print(string)

Out:

Warship Saga
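
As a small illustration of why the 'ignore' argument matters: with the default strict error handling, `str.encode('ascii')` raises `UnicodeEncodeError` on the first non-ASCII character, whereas 'ignore' silently drops such characters. The sketch below also strips the space left behind at the end of the name. The helper name `to_ascii` is made up for this example and is not part of any library.

```python
def to_ascii(text):
    """Drop every non-ASCII character, then trim leftover whitespace."""
    return text.encode('ascii', 'ignore').decode('ascii').strip()


name = '\u3010\u6226\u8266\u3011Warship Saga \u30a6'

try:
    # The default error handler is 'strict', so this call raises.
    name.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(strict_failed)         # True
print(repr(to_ascii(name)))  # 'Warship Saga'
```

Without the final `.strip()`, the result would keep the trailing space that preceded the removed character.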

For the whole column:

data['Name'] = data['Name'].str.encode('ascii', 'ignore').str.decode('ascii')
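
A self-contained version of the column approach might look like the sketch below. The sample DataFrame is an assumption modelled on the three names in the question, and the extra `.str.strip()` is an addition to tidy up the whitespace left where characters were removed.

```python
import pandas as pd

# Sample data modelled on the names in the question (an assumption).
data = pd.DataFrame({'Name': [
    '\u3010\u6226\u8266\u3011Warship Saga \u30a6',
    '\u22c6Spider Solitaire+',
    '\u25bbCHESS',
]})

data['Name'] = (
    data['Name']
    .str.encode('ascii', 'ignore')  # drop non-ASCII characters
    .str.decode('ascii')            # turn the bytes back into str
    .str.strip()                    # remove leftover leading/trailing spaces
)

print(data['Name'].tolist())
# ['Warship Saga', 'Spider Solitaire+', 'CHESS']
```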

Answer 2

I tried this, let me know if it helps ;)

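s = "\u3010\u6226\u8266\u3011Warship Saga \u30a6"
kept = []
for ch in s:
    try:
        # Encoding to ASCII raises UnicodeEncodeError for non-ASCII characters.
        ch.encode('ascii')
        kept.append(ch)
    except UnicodeEncodeError:
        # Skip anything that cannot be represented in ASCII.
        pass
# Join the surviving characters back into a single string.
print(''.join(kept))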
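
The same character-by-character idea can be sketched without try/except. Note that the characters in the question are printable, just not ASCII, so this version filters on both properties: it keeps a character only if it is ASCII and printable, which also drops control characters such as tabs. The function name `keep_printable_ascii` is hypothetical, and `str.isascii()` assumes Python 3.7 or newer.

```python
def keep_printable_ascii(text):
    """Keep only characters that are both ASCII and printable."""
    kept = [ch for ch in text if ch.isascii() and ch.isprintable()]
    # Trim the whitespace left where characters were removed.
    return ''.join(kept).strip()


s = "\u3010\u6226\u8266\u3011Warship Saga \u30a6"
print(keep_printable_ascii(s))            # Warship Saga
print(keep_printable_ascii("tab\there"))  # tabhere
```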

2 answers

Answer 1 -1 2020-04-08 19:06:20

Answer 2 -1 2020-04-08 19:06:24
