
Replacing Unicode character / Python / Django


Since I'm pretty much forced to replace some Unicode characters in a string returned by an OCR tool, the only way I have found to do it is to replace them one by one:

def recode(mystr):
    mystr = mystr.replace(r'\u0104', '\u0104')
    mystr = mystr.replace(r'\u017c', '\u017c')
    mystr = mystr.replace(r'\u0106', '\u0106')
    ...
    ...
    mystr = mystr.replace(r'\u017a', '\u017a')
    mystr = mystr.replace(r'\u017c', '\u017c')
    return mystr

So the reason why `foo` is not read as raw text is that the `r` prefix in front of a string only plays a role when the string is created. Afterwards it behaves like any normal string, for example when the `%` operator is applied. A single escape can therefore be converted by parsing its hex digits:

bar = r"\u0104"
mystr = mystr.replace(bar, chr(int(bar[2:], 16)))
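The same idea can be extended to every escape in the string at once instead of one `bar` at a time. The following is only a sketch: the helper name `decode_u_escapes` and the pattern `_ESCAPE` are made up for illustration, and it assumes each escape is a literal backslash, `u`, and exactly four hex digits (surrogate pairs are not combined).

```python
import re

# Matches a literal backslash, the letter 'u', and exactly four hex digits.
_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def decode_u_escapes(text):
    # Hypothetical helper: swap each matched escape for the character it names.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

sample = r'\u0104 and \u017c'      # raw string: holds literal backslashes
result = decode_u_escapes(sample)
print(result)                      # prints: Ą and ż
```

Text that contains no such escapes passes through unchanged, so the helper can be applied to OCR output without checking first.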

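This is an XY problem: the underlying task is decoding literal \uXXXX escape sequences, not replacing characters one by one. Both functions here return the same result for the sample string: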

def recode(mystr):
    mystr = mystr.replace(r'\u0104', '\u0104')
    mystr = mystr.replace(r'\u017c', '\u017c')
    mystr = mystr.replace(r'\u0106', '\u0106')
    mystr = mystr.replace(r'\u017a', '\u017a')
    mystr = mystr.replace(r'\u017c', '\u017c')
    return mystr

def recode2(s):
    return s.encode('latin1').decode('unicode_escape')

s = r'\u0104\u017c\u0106\u017a\u017c'
print(s)
print(recode(s))
print(recode2(s))
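One caveat with `recode2`: `encode('latin1')` raises `UnicodeEncodeError` if the string already contains a real character outside Latin-1, which can happen when OCR output mixes decoded characters and literal escapes. A possible workaround, shown here as a sketch with the made-up name `recode_mixed`, is to escape everything non-ASCII first with the `backslashreplace` error handler and then decode all escapes together. Like `recode2`, it also interprets any other backslash sequences in the text.

```python
def recode2(s):
    # Same approach as in the answer: only works if s is pure Latin-1.
    return s.encode('latin1').decode('unicode_escape')

def recode_mixed(s):
    # Hypothetical variant: turn non-ASCII characters into escapes first,
    # then decode every escape in one pass.
    return s.encode('ascii', 'backslashreplace').decode('unicode_escape')

# One real character (U+0104) followed by one literal escape sequence.
mixed = '\u0104' + r'\u017c'

try:
    recode2(mixed)
    failed = False
except UnicodeEncodeError:
    failed = True

print(failed)                 # prints: True
print(recode_mixed(mixed))    # prints: Ąż
```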


 