Replacing Unicode character / Python / Django

Question

Since I'm pretty much forced to replace some Unicode characters in a string returned by an OCR tool, the only way I have found to do it is to replace them one by one:

def recode(mystr):
    mystr = mystr.replace(r'\u0104', '\u0104')
    mystr = mystr.replace(r'\u017c', '\u017c')
    mystr = mystr.replace(r'\u0106', '\u0106')
    ...
    ...
    mystr = mystr.replace(r'\u017a', '\u017a')
    mystr = mystr.replace(r'\u017c', '\u017c')
    return mystr

Answer 1

The reason foo is not read as raw text is that the r prefix in front of a string literal only matters when the string is created. Afterwards it behaves like a normal string, for example when the % operator is applied.
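As a small illustration of that point (the variable names here are made up for the example), the r prefix changes which characters end up in the string, not the type of the resulting object:

```python
raw = r"\u0104"     # six characters: backslash, 'u', '0', '1', '0', '4'
cooked = "\u0104"   # one character: the code point U+0104

raw_len = len(raw)
cooked_len = len(cooked)
same = raw == cooked

# Both objects are plain str; only their contents differ.
print(type(raw) is type(cooked), raw_len, cooked_len, same)
```

Once created, neither string remembers how its literal was written, so any later conversion has to be done explicitly.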

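bar = r"\u0104"  # raw literal: a backslash, 'u', and four hex digits
# bar[2:] is "0104"; int(..., 16) gives the code point, chr() the character
mystr = mystr.replace(bar, chr(int(bar[2:], 16)))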
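The same chr(int(..., 16)) idea can be applied to every escape at once with a regular expression. This is only a sketch: the helper name unescape_u is hypothetical, and it assumes the OCR output contains only four-digit \uXXXX escapes (no surrogate pairs or \UXXXXXXXX forms).

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def unescape_u(text):
    """Replace each literal \\uXXXX sequence with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

sample = r"\u0104\u017c abc"
result = unescape_u(sample)
print(result)
```

Text that does not match the pattern, including characters that are already decoded, is left untouched.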

Answer 2

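This is an XY problem. Instead of replacing the escape sequences one by one (recode), decode them all in a single step (recode2):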

def recode(mystr):
    mystr = mystr.replace(r'\u0104', '\u0104')
    mystr = mystr.replace(r'\u017c', '\u017c')
    mystr = mystr.replace(r'\u0106', '\u0106')
    mystr = mystr.replace(r'\u017a', '\u017a')
    mystr = mystr.replace(r'\u017c', '\u017c')
    return mystr

def recode2(s):
    return s.encode('latin1').decode('unicode_escape')

s = r'\u0104\u017c\u0106\u017a\u017c'
print(s)
print(recode(s))
print(recode2(s))
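One caveat with the latin1 round trip is that it raises UnicodeEncodeError as soon as the input already contains a character above U+00FF. A possible variant, sketched here under the assumption that the text may mix decoded characters with literal escapes (recode_mixed is a hypothetical name), first escapes everything non-ASCII and then decodes all escapes together:

```python
def recode_mixed(s):
    # Turn every non-ASCII character into a \xNN or \uNNNN escape so the
    # bytes are pure ASCII, then decode all escapes in one pass.
    return s.encode("ascii", "backslashreplace").decode("unicode_escape")

# One character that is already decoded, followed by a literal escape.
mixed = "\u0104" + r"\u017c"

try:
    mixed.encode("latin1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

decoded = recode_mixed(mixed)
print(latin1_ok, decoded)
```

Like recode2, this treats every backslash in the input as the start of an escape, so it is only suitable when the text contains no other meaningful backslashes.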

2 answers

solution1
0 2022-06-08 13:16:48

solution2
0 2022-06-08 18:26:29
