Substituting special characters corrupts Chinese characters in a Python string

I cannot seem to substitute a ')' or a '(' without causing errors in other strings. ')' and '(' are special characters. Here are two strings, "sample(志信达).mbox" and "sample#宋安兴.mbox". If I use re to substitute the special characters, one of the Chinese characters gets substituted too. Here is the Python code:

# -*- coding: utf-8 -*-
import re
source1='sample(志信达).mbox'
source2='sample#宋安兴.mbox'
newname1=re.sub(r'[\(\);)(]','-',source1)
newname2=re.sub(r'[\(\);)(]','-',source2)
print source1,newname1
print source2,newname2

Here is the result:

sample(志信达).mbox sample---志信达---.mbox
sample#宋安兴.mbox sample#宋?-兴.mbox

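Notice that in the second string the character 安 has been replaced with '?-', even though that string contains none of the characters I am trying to substitute.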

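You should use unicode literals for both the strings and the pattern (see https://docs.python.org/2/howto/unicode.html#unicode-literals-in-python-source-code ). In Python 2 a plain '...' literal is a byte string, so re works on the UTF-8 bytes rather than on whole characters, and a multi-byte character inside [...] matches each of its bytes separately. The three dashes per parenthesis in your output suggest that the parentheses and semicolon in your real code are full-width (multi-byte) characters; if so, one of their bytes also occurs in the encoding of 安, which would explain why that character gets broken. With unicode literals the match is done character by character: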

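# -*- coding: utf-8 -*-
import re

# The u prefix makes these unicode strings, so re matches whole characters, not bytes.
source1 = u'sample(志信达).mbox'
source2 = u'sample#宋安兴.mbox'
# The pattern and the replacement are unicode literals as well.
newname1 = re.sub(ur'[\(\);)(]', u'-', source1)
newname2 = re.sub(ur'[\(\);)(]', u'-', source2)
print source1, newname1
print source2, newname2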

Result:

sample(志信达).mbox sample-志信达-.mbox
sample#宋安兴.mbox sample#宋安兴.mbox

Also, make sure the .py file itself is saved as UTF-8, matching the coding declaration on its first line. Your IDE may do this automatically; with a plain text editor you may have to set the encoding manually.
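To see the difference between byte-level and character-level matching side by side, here is a small sketch that reproduces both outputs. It rests on one assumption that the post itself does not confirm: that the parentheses and semicolon in the real file names and pattern are the full-width forms （ ） ；. The names PATTERN, sanitize_bytes and sanitize_text are made up for this example.

```python
# -*- coding: utf-8 -*-
import re

# Assumption: the real pattern contained full-width punctuation (（ ） ；),
# each of which is three bytes long in UTF-8.
PATTERN = u'[()（）；]'


def sanitize_bytes(name):
    """Substitute on UTF-8 bytes, as Python 2 does with plain '...' literals.

    Inside [...] every byte of a multi-byte character becomes a separate
    alternative, so unrelated characters that share a byte are damaged.
    """
    return re.sub(PATTERN.encode('utf-8'), b'-', name.encode('utf-8'))


def sanitize_text(name):
    """Substitute on unicode text, one whole character at a time."""
    return re.sub(PATTERN, u'-', name)


# Byte-level: each full-width parenthesis turns into three dashes.
three_dashes = sanitize_bytes(u'sample（志信达）.mbox')

# Byte-level: the last byte of 安 (0x89) is also the last byte of ）,
# so it is replaced and the two bytes left behind are no longer valid UTF-8.
broken = sanitize_bytes(u'sample#宋安兴.mbox')

# Character-level: only the listed characters are replaced.
clean1 = sanitize_text(u'sample（志信达）.mbox')   # u'sample-志信达-.mbox'
clean2 = sanitize_text(u'sample#宋安兴.mbox')      # unchanged
```

Decoding broken with errors='replace' shows a replacement character where 安 used to be, which is the kind of '?-' damage seen in the question.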
