Since I'm pretty much forced to replace some Unicode characters in a string returned by an OCR tool, the only way I have found to do it is to replace them one by one.
def recode(mystr):
    mystr = mystr.replace(r'\u0104', '\u0104')
    mystr = mystr.replace(r'\u017c', '\u017c')
    mystr = mystr.replace(r'\u0106', '\u0106')
    ...
    ...
    mystr = mystr.replace(r'\u017a', '\u017a')
    mystr = mystr.replace(r'\u017c', '\u017c')
    return mystr
So the reason why foo is not read as raw text is that the r prefix in front of a string literal only plays a role when the string is created. Afterwards it behaves like a normal string, for example when the % operator is applied.
bar = r"\u0104"
mystr = mystr.replace(bar, chr(int(bar[2:], 16)))
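As a quick check of the point above, here is a minimal sketch showing that the raw literal holds six ordinary characters while the non-raw literal holds a single one. The helper name unescape_one is hypothetical and only wraps the replace call shown above.

```python
raw = r"\u0104"      # six characters: backslash, 'u', '0', '1', '0', '4'
cooked = "\u0104"    # one character: U+0104

print(len(raw), len(cooked))
print(raw == "\\u0104")  # the raw literal equals the escaped-backslash spelling


def unescape_one(text, escape):
    """Replace one literal \\uXXXX sequence in text with the character it names."""
    # escape[2:] drops the leading backslash and 'u', leaving the hex digits.
    return text.replace(escape, chr(int(escape[2:], 16)))


print(unescape_one(r"\u0104BC", raw))
```

Once the string exists, nothing about it remembers the r prefix. The only difference is which characters it contains.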
This is an XY problem: the underlying task is to decode the escape sequences in the string, not to replace them one by one.
def recode(mystr):
    mystr = mystr.replace(r'\u0104', '\u0104')
    mystr = mystr.replace(r'\u017c', '\u017c')
    mystr = mystr.replace(r'\u0106', '\u0106')
    mystr = mystr.replace(r'\u017a', '\u017a')
    mystr = mystr.replace(r'\u017c', '\u017c')
    return mystr

def recode2(s):
    return s.encode('latin1').decode('unicode_escape')

s = r'\u0104\u017c\u0106\u017a\u017c'
print(s)
print(recode(s))
print(recode2(s))
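One caveat with recode2: s.encode('latin1') raises UnicodeEncodeError if the string already contains a character outside Latin-1, which can happen when OCR output mixes real characters with literal escape sequences. The decode step also interprets other backslash escapes such as \n. A possible alternative, sketched here under the assumption that only four-digit \uXXXX sequences need decoding, uses re.sub. The name recode3 is hypothetical.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def recode3(s):
    """Decode literal \\uXXXX sequences and leave every other character untouched."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


# A real U+017C followed by an escaped U+0104: recode2 would fail on this input.
mixed = "\u017c" + r"\u0104"
print(recode3(mixed))
```

This sketch does not combine surrogate pairs, so characters outside the Basic Multilingual Plane written as two \uXXXX escapes would need extra handling.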