
Unicode characters in a Python string

I have a Pandas Series with a list of game names, for example:

  • 【戦艦】Warship Saga ウ
  • ⋆Spider Solitaire+
  • ▻CHESS

I want to remove all the "unprintable" (non-ASCII) Unicode characters, so that the desired output looks like this: Warship Saga, Spider Solitaire+, CHESS.
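Strictly speaking, these characters are not unprintable in Python's sense: `str.isprintable()` returns True for CJK characters and symbols such as 【 and ▻, so filtering on printability would keep them. A quick check, assuming Python 3.7+ for `str.isascii()`, suggests the criterion that matches the desired output is "non-ASCII":

```python
# Sample names from the question (the first uses the escapes from the answer below).
names = ["\u3010\u6226\u8266\u3011Warship Saga \u30a6", "⋆Spider Solitaire+", "▻CHESS"]

# Collect every distinct non-ASCII character across the sample names.
non_ascii = sorted({c for name in names for c in name if not c.isascii()})

# Show each code point together with Python's printability verdict.
print([(hex(ord(c)), c.isprintable()) for c in non_ascii])

# Every one of them counts as printable, so isprintable() would not remove them.
print(all(c.isprintable() for c in non_ascii))  # True
```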

I tried this:

data['Name'] = data['Name'].str.encode('ascii').str.decode('ascii')

but it didn't help. Decoding alone didn't help either:

data['Name'] = data['Name'].str.decode('ascii')

Thank you in advance!
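With the default error handler (`errors='strict'`), encoding to ASCII raises `UnicodeEncodeError` as soon as it meets a non-ASCII character, which likely explains why the first attempt above did not help. A minimal stdlib check on one of the names:

```python
name = "\u3010\u6226\u8266\u3011Warship Saga \u30a6"

error = None
try:
    # 'strict' is the default, so the first non-ASCII character aborts the encode.
    name.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc
    # exc.start is the index of the first character that could not be encoded.
    print(exc)
```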

This works for me in Python 3 by passing 'ignore' as the errors argument:

string = '\u3010\u6226\u8266\u3011Warship Saga \u30a6'
string = string.encode('ascii', 'ignore').decode('ascii')
print(string)

Out:

Warship Saga 
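The output above still ends with a space (the one that preceded ウ), while the desired result is `Warship Saga`. A small sketch that also trims the ends; `to_ascii` is a hypothetical helper name, not part of any library. For comparison it also shows the 'replace' handler, which substitutes `?` instead of dropping the character:

```python
def to_ascii(name: str) -> str:
    """Drop non-ASCII characters, then trim the whitespace they leave at the ends."""
    return name.encode('ascii', 'ignore').decode('ascii').strip()


raw = "\u3010\u6226\u8266\u3011Warship Saga \u30a6"

print(repr(raw.encode('ascii', 'ignore').decode('ascii')))  # 'Warship Saga '
print(repr(to_ascii(raw)))                                   # 'Warship Saga'

# 'replace' keeps a '?' placeholder for every character it cannot encode.
print(raw.encode('ascii', 'replace').decode('ascii'))        # ????Warship Saga ?
```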

For the whole column:

data['Name'] = data['Name'].str.encode('ascii', 'ignore').str.decode('ascii')
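On a whole column, the same chain can end with `.str.strip()` to remove the trailing space seen in the single-string output. A minimal sketch, assuming the column is called `Name` as in the question and holds only strings; the sample DataFrame is built from the three example names:

```python
import pandas as pd

# Sample data built from the question's examples.
data = pd.DataFrame({'Name': ["\u3010\u6226\u8266\u3011Warship Saga \u30a6",
                              "⋆Spider Solitaire+",
                              "▻CHESS"]})

# Encode with 'ignore' to drop non-ASCII characters, decode back to str,
# then strip the whitespace left at either end.
data['Name'] = (data['Name']
                .str.encode('ascii', 'ignore')
                .str.decode('ascii')
                .str.strip())

print(data['Name'].tolist())  # ['Warship Saga', 'Spider Solitaire+', 'CHESS']
```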

Here is what I tried; let me know if it helps ;)

s = "\u3010\u6226\u8266\u3011Warship Saga \u30a6"
kept = []
for c in s:
    try:
        # Decoding the UTF-8 bytes as ASCII fails for any non-ASCII character
        kept.append(c.encode('utf-8').decode('ascii'))
    except UnicodeDecodeError:
        pass  # skip characters outside the ASCII range
print(''.join(kept))
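The try/except round trip works, but checking each character directly avoids relying on exceptions. A sketch assuming Python 3.7+ (for `str.isascii()`), with a regular-expression alternative for comparison:

```python
import re

s = "\u3010\u6226\u8266\u3011Warship Saga \u30a6"

# Keep a character only if it lies in the ASCII range (code points 0-127).
kept = ''.join(c for c in s if c.isascii())

# Equivalent regex version: delete everything outside \x00-\x7f.
kept_re = re.sub(r'[^\x00-\x7f]', '', s)

print(repr(kept))       # 'Warship Saga '
print(kept == kept_re)  # True
```

On a Series, the regex version can be applied with `data['Name'].str.replace(r'[^\x00-\x7f]', '', regex=True)`.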
