How to replace unicode Chinese characters in Python?

Question

Say I have a string like this:

example = u"这是一段很蛋疼的中文"

I want to replace 蛋 with egg. How can I do this?

example.replace() seems to have no effect. I also tried a regex, but re.match(u"蛋", "") returns None.

I searched a lot, and it seems I should use a method like .decode, but that still doesn't work. Even example.replace(u"\蛋", "egg") has no effect.

So is there a way to process Chinese characters?

Answer 1

In Python 3 you should get the output below.

>>> import re
>>> example = u"这是一段很蛋疼的中文"
>>> re.search(u'蛋',example)
<_sre.SRE_Match object; span=(5, 6), match='蛋'>

>>> example.replace('蛋','egg')
'这是一段很egg疼的中文'
>>> re.sub('蛋','egg',example)
'这是一段很egg疼的中文'

>>> example.replace(u"\u86CB", "egg")
'这是一段很egg疼的中文'
>>> re.match('.*蛋',example)
<_sre.SRE_Match object; span=(0, 6), match='这是一段很蛋'>

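re.match only matches at the beginning of the string, so matching 蛋 against example with re.match returns None, because 蛋 is not the first character. Use re.search to find a match anywhere in the string.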
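The difference between re.match and re.search can be sketched as follows. This is a minimal Python 3 illustration built on the example string from the question; the variable names (at_start, anywhere, replaced, substituted) are made up for this sketch and are not from the original answer.

```python
import re

example = "这是一段很蛋疼的中文"

# re.match is anchored at position 0, so a character that
# appears later in the string is not found.
at_start = re.match("蛋", example)

# re.search scans the whole string and returns the first occurrence.
anywhere = re.search("蛋", example)

# Either str.replace or re.sub performs the actual substitution.
replaced = example.replace("蛋", "egg")
substituted = re.sub("蛋", "egg", example)

print(at_start)         # None
print(anywhere.span())  # (5, 6)
print(replaced)
```

For a fixed character like this one, str.replace is usually enough; re.sub is only needed when the thing to replace is a pattern.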

Answer 2

You can do something like this in Python 2:

Edit: Saving the source file with the correct encoding, adding a coding declaration, and using unicode literals will solve the issue.

#!/usr/local/bin/python
# -*- coding: utf-8 -*-

example = u"这是一段很蛋疼的中文"
print example.replace(u"这", u"egg")
# In Python 3, use print() and plain string literals instead:
# print(example.replace("这", "egg"))

Output:

egg是一段很蛋疼的中文
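For readers on Python 3, where every str is already Unicode, the same idea might look like the sketch below. It assumes the text arrives as UTF-8 encoded bytes (an assumption; the question does not say where the string comes from), which is the one case where .decode is actually needed. The names raw, text and result are illustrative only.

```python
# Assumption: the input arrives as UTF-8 bytes, e.g. read from a file or socket.
raw = "这是一段很蛋疼的中文".encode("utf-8")

# Decode to str first, then replace on text rather than on bytes.
text = raw.decode("utf-8")
result = text.replace("蛋", "egg")

print(result)
```

If the string is already a str (as with a literal in a Python 3 source file), the decode step is unnecessary and calling replace directly is enough.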

2 answers

solution1
2 ACCEPTED 2017-05-29 02:38:10

solution2
1 2017-05-29 02:35:06
