
Remove zero width space unicode character from Python string

I have a string in Python like this:

u'\u200cHealth & Fitness'

How can I remove the \u200c character (U+200C, ZERO WIDTH NON-JOINER) from the string?

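You can encode it to ASCII and ignore errors. Note that this drops every non-ASCII character, not only \u200c, and that in Python 3 encode returns bytes, so decode the result back to a string:

u'\u200cHealth & Fitness'.encode('ascii', 'ignore').decode('ascii')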

Output:

'Health & Fitness'
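Because the ASCII round trip is not targeted, it can silently lose legitimate characters. A small sketch contrasting it with a targeted replace; the second input string is made up for illustration:

```python
# Encoding to ASCII with errors='ignore' drops *every* non-ASCII character,
# not just the zero width non-joiner (U+200C).
original = '\u200cHealth & Fitness'
accented = '\u200cCaf\u00e9 & Bar'  # hypothetical input containing an é

ascii_only = original.encode('ascii', 'ignore').decode('ascii')
lossy = accented.encode('ascii', 'ignore').decode('ascii')

print(ascii_only)  # Health & Fitness
print(lossy)       # Caf & Bar  (the é is lost as well)

# A targeted replace removes only U+200C and keeps everything else.
targeted = accented.replace('\u200c', '')
print(targeted)    # Café & Bar
```

If the text may contain accented letters or other non-ASCII content you want to keep, prefer the targeted approach.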

If you have a string that contains a non-ASCII Unicode character, like

s = "Airports Council International \u2013 North America"

then you can try:

newString = s.encode('ascii', 'ignore').decode('utf-8')

and the output will be (the en dash is dropped, but the spaces on both sides of it remain):

Airports Council International  North America
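Since only the en dash (U+2013) is removed, the spaces around it survive and the result contains two consecutive spaces. A minimal sketch that collapses whitespace afterwards, assuming single spaces are what you want:

```python
s = "Airports Council International \u2013 North America"

# Dropping the en dash leaves the spaces on either side of it.
stripped = s.encode('ascii', 'ignore').decode('ascii')
print(repr(stripped))  # 'Airports Council International  North America'

# str.split() with no argument splits on runs of whitespace,
# so joining with a single space normalizes the gaps.
normalized = ' '.join(stripped.split())
print(normalized)  # Airports Council International North America
```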


I just use replace, since I don't need the character at all:

varstring.replace('\u200c', '')

Or in your case:

u'\u200cHealth & Fitness'.replace('\u200c', '')
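If several invisible characters may show up, chaining replace calls gets clumsy; str.translate can delete a whole set in one pass. A sketch, where the selection of code points and the names ZERO_WIDTH, TABLE and strip_zero_width are assumptions for illustration:

```python
# Hypothetical set of invisible "zero width" code points to delete; adjust as needed.
ZERO_WIDTH = {
    '\u200b': None,  # ZERO WIDTH SPACE
    '\u200c': None,  # ZERO WIDTH NON-JOINER
    '\u200d': None,  # ZERO WIDTH JOINER
    '\ufeff': None,  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}
# Mapping a character to None tells str.translate to delete it.
TABLE = str.maketrans(ZERO_WIDTH)


def strip_zero_width(text: str) -> str:
    """Remove the characters listed in ZERO_WIDTH from text."""
    return text.translate(TABLE)


print(strip_zero_width('\u200cHealth\u200b & Fitness\ufeff'))  # Health & Fitness
```

Be careful with U+200D in particular: it is used to join multi-part emoji, so deleting it splits those sequences.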

For me, the following works:

mystring.encode('ascii', 'ignore').decode('unicode_escape')
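One caveat: the unicode_escape codec also interprets any backslash sequences already present in the text. A small sketch using a hypothetical Windows-style path as input, compared with decoding as plain ASCII:

```python
# Hypothetical input: a literal backslash followed by "new", plus a trailing U+200C.
mystring = 'C:\\new\u200c'

via_escape = mystring.encode('ascii', 'ignore').decode('unicode_escape')
via_ascii = mystring.encode('ascii', 'ignore').decode('ascii')

# unicode_escape turned the two characters "\" and "n" into a single newline.
print('\n' in via_escape)             # True
print(len(via_escape), len(via_ascii))  # 5 6
```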

In the specific case in the question, where the string is prefixed with a single u'\u200c' character, the solution is as simple as taking a slice that does not include the first character.

original = u'\u200cHealth & Fitness'
fixed = original[1:]
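A bare [1:] also removes the first character when it is not \u200c. A guarded variant that works on Python versions before 3.9; the helper name drop_leading_zwnj is hypothetical:

```python
def drop_leading_zwnj(text: str) -> str:
    """Slice off the first character only if it is U+200C (ZERO WIDTH NON-JOINER)."""
    return text[1:] if text.startswith('\u200c') else text


original = '\u200cHealth & Fitness'
plain = 'Health & Fitness'

print(drop_leading_zwnj(original))  # Health & Fitness
print(drop_leading_zwnj(plain))     # Health & Fitness (a bare [1:] would give 'ealth & Fitness')
```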

If the leading character may or may not be present, str.lstrip can be used:

original = u'\u200cHealth & Fitness'
fixed = original.lstrip(u'\u200c')
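Note that lstrip treats its argument as a set of characters and removes every leading occurrence, but leaves any \u200c later in the string untouched. A quick check with a made-up input:

```python
# Hypothetical input: two leading U+200C characters and one in the middle.
text = '\u200c\u200cHealth\u200c & Fitness'

stripped = text.lstrip('\u200c')

# Both leading copies are gone; the interior one remains.
print(stripped == 'Health\u200c & Fitness')  # True
```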

The same solutions work in Python 3. From Python 3.9, str.removeprefix is also available:

original = u'\u200cHealth & Fitness'
fixed = original.removeprefix(u'\u200c')
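The difference between removeprefix and lstrip shows up when the prefix repeats: removeprefix removes at most one copy of the exact prefix string. A short comparison (requires Python 3.9+):

```python
text = '\u200c\u200cHealth & Fitness'

# removeprefix strips the given prefix string at most once.
once = text.removeprefix('\u200c')
# lstrip keeps stripping while the leading character is in the given set.
all_leading = text.lstrip('\u200c')

print(once == '\u200cHealth & Fitness')   # True
print(all_leading == 'Health & Fitness')  # True
```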
