
How to replace unicode Chinese characters in Python?

Say I have a string like this:

example = u"这是一段很蛋疼的中文"

I want to replace 蛋 with egg. How can I do this?

It seems example.replace() is useless, and when I tried a regex, re.match(u"蛋", "") returned None.

I searched a lot, and it seems I should use a method like .decode, but that still doesn't work. Even example.replace(u"\蛋", "egg") does nothing.

So is there a way to process Chinese characters?

In Python 3 you should get the following output:

>>> import re
>>> example = u"这是一段很蛋疼的中文"
>>> re.search(u'蛋',example)
<_sre.SRE_Match object; span=(5, 6), match='蛋'>

>>> example.replace('蛋','egg')
'这是一段很egg疼的中文'
>>> re.sub('蛋','egg',example)
'这是一段很egg疼的中文'

>>> example.replace(u"\u86CB", "egg")
'这是一段很egg疼的中文'
>>> re.match('.*蛋',example)
<_sre.SRE_Match object; span=(0, 6), match='这是一段很蛋'>

re.match will try to match the string from the beginning, so it will return None in your case.
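
To make the difference concrete, here is a minimal Python 3 sketch comparing re.match, re.search and re.sub on the same string. The variable names (anchored, found, replaced) are only illustrative and are not from the question.

```python
import re

example = "这是一段很蛋疼的中文"

# re.match only tries to match at position 0, so a bare '蛋' pattern fails.
anchored = re.match("蛋", example)

# re.search scans the whole string and finds the character at index 5.
found = re.search("蛋", example)

# re.sub replaces every occurrence, wherever it appears in the string.
replaced = re.sub("蛋", "egg", example)

print(anchored)       # None
print(found.span())   # (5, 6)
print(replaced)       # 这是一段很egg疼的中文
```

For a fixed substring like this one, str.replace is enough; a regex is only needed when the thing to replace is a pattern rather than a literal.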

In Python 2 you can do something like this:

Edit: Saving the source file in the correct encoding, adding a coding declaration at the top, and using unicode literals solves the issue.

#!/usr/local/bin/python
# -*- coding: utf-8 -*-

example = u"这是一段很蛋疼的中文"
print example.replace(u"这", u"egg")
# Within Python3
# print(example.replace("这", 'egg'))

Output:

egg是一段很蛋疼的中文
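
Since the question mentions .decode, here is a rough Python 3 sketch of when decoding matters. It assumes the text arrives as UTF-8 encoded bytes (for example from a file or a socket); the names raw, text, result and raw_result are made up for illustration.

```python
# Python 3: bytes and str are separate types and are never mixed implicitly.
# Assumption: the input is UTF-8 encoded bytes, simulated here with .encode().
raw = "这是一段很蛋疼的中文".encode("utf-8")

# Decode to str first, then do the text replacement.
text = raw.decode("utf-8")
result = text.replace("蛋", "egg")

# Replacing directly on bytes also works, but both arguments must be bytes.
raw_result = raw.replace("蛋".encode("utf-8"), b"egg")

print(result)
print(raw_result.decode("utf-8"))
```

If the string is already a str (as with a literal in a Python 3 source file), no decoding is needed and example.replace works directly.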
