Special Unicode Characters are not removed in Python 3

I have a list, keys, that contains words. When I run this loop:

for key in keys:
  print(key)

I get normal output in the terminal.

[screenshot]

But when I print the entire list using print(keys), I get this output:

[screenshot]

I have tried key.replace("\u202c", ''), key.replace("\\\u202c", ''), and re.sub(u'\u202c', '', key), but none of them solved the problem. I also tried the solutions in the following questions, but none of them worked either:

Replacing a unicode character in a string in Python 3

Removing unicode \… like characters in a string in python2.7

Python removing extra special unicode characters

How can I remove non-ASCII characters but leave periods and spaces using Python?

I scraped this from Google Trends using Beautiful Soup and retrieved the text with get_text(). In the page source of the Google Trends page, the words are listed as follows:

[screenshot]

When I pasted the text here directly from the page source, it pasted without these unusual symbols.

You can strip the characters from both ends of each string using str.strip:

>>> keys=['\u202cABCD', '\u202cXYZ\u202c']
>>> for key in keys:
...     print(key)
... 
ABCD
XYZ‬
>>> newkeys=[key.strip('\u202c') for key in keys]
>>> print(keys)
['\u202cABCD', '\u202cXYZ\u202c']
>>> print(newkeys)
['ABCD', 'XYZ']
>>> 
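
As a minimal sketch of why the two print calls differ: print(keys) shows the repr() of each element, and repr() escapes non-printable characters such as U+202C, while print(key) writes the character itself, which most terminals render as nothing. The sample list below is made up for illustration. It also shows that strip only touches the ends of a string, whereas replace removes every occurrence.

```python
# Hypothetical sample data; the second entry has U+202C in the middle too.
keys = ['\u202cABCD', '\u202cXY\u202cZ\u202c']

# The character is really in the string, even if the terminal hides it.
print(len(keys[0]))    # 5, not 4
print(repr(keys[0]))   # repr() escapes the non-printable character

# strip() removes the character only at the start and end of each string.
stripped = [key.strip('\u202c') for key in keys]

# replace() removes every occurrence, wherever it appears.
replaced = [key.replace('\u202c', '') for key in keys]

print(stripped)
print(replaced)
```

If the character can appear inside a word and not only around it, replace is the safer choice of the two.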

I tried one of your methods, and it does work for me:

>>> keys
['\u202cABCD', '\u202cXYZ\u202c']
>>> newkeys=[]
>>> for key in keys:
...     newkeys += [key.replace('\u202c', '')]
... 
>>> newkeys
['ABCD', 'XYZ']
>>> 
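
Note that str.replace returns a new string and leaves the original untouched, so the result has to be stored, as the loop above does. One possible reason the attempts in the question appeared to fail is that the return value was discarded, though the question does not show enough code to confirm this. If other invisible characters may be present as well, a more general sketch is to drop every character in the Unicode "Cf" (format) category. The helper name remove_format_chars and the sample data are assumptions for illustration only.

```python
import unicodedata

def remove_format_chars(text):
    """Return text without Unicode 'Cf' (format) characters such as U+202C."""
    return ''.join(ch for ch in text if unicodedata.category(ch) != 'Cf')

# Hypothetical sample data: U+202C (pop directional formatting) and
# U+200E (left-to-right mark) are both in category Cf.
keys = ['\u202cABCD', '\u202cXYZ\u202c', '\u200eplain']

# Build a new list; the strings in keys themselves are never modified.
cleaned = [remove_format_chars(key) for key in keys]
print(cleaned)
```

Be aware that the Cf category also includes characters such as the zero-width joiner (U+200D), which is meaningful in some scripts and emoji sequences, so this filter may be too broad for text where those matter.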
