I am working with a mobile operator that sends me notifications as some kind of UTF-16 encoded string. For example, '%u062a%u0633%u062a' is the equivalent of 'تست' in Persian. I'm not sure exactly what encoding these strings use. How can I convert them to their real form, like 'تست'?
An easy way to do it is to replace each % with a backslash (written '\\' in Python source), which turns the string into a Python literal with escaped Unicode characters, and then decode it with the unicode-escape codec.
s = b'%u062a%u0633%u062a'
print(s.replace(b'%', b'\\').decode('unicode-escape'))
You can split the string on %u to get each character's hex value, then look up the Unicode character with the built-in function chr.
def convert_to_unicode(text):
    return_str = ''
    for character in text.split('%u'):
        if character:
            chr_code = int(character, 16)
            return_str += chr(chr_code)
    return return_str
text = '%u062a%u0633%u062a'
print(convert_to_unicode(text))
Output:
تست
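The split-based version assumes the whole string consists of %u escapes; any plain text mixed in would either raise ValueError or be silently read as hex digits. As a minimal sketch, assuming the operator may mix plain ASCII text with %uXXXX escapes (that is an assumption, and decode_percent_u is a hypothetical name, not part of the answer above), a regular expression can replace only the escapes and leave everything else untouched:

```python
import re

# Matches one %uXXXX escape with exactly four hex digits.
_PERCENT_U = re.compile(r'%u([0-9a-fA-F]{4})')

def decode_percent_u(text: str) -> str:
    """Replace each %uXXXX escape with its character; keep other text as-is."""
    return _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), text)

print(decode_percent_u('%u062a%u0633%u062a'))
print(decode_percent_u('id=42 %u062a%u0633%u062a'))
```

The %uXXXX form looks like the output of JavaScript's legacy escape() function, which may explain where the operator's format comes from.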
Or you can use the unicode-escape codec, as in the other answer by blhsing.
def convert_to_unicode(text: str):
    # Replace each % with a backslash so %uXXXX becomes \uXXXX.
    text = text.replace('%', '\\')
    # Decode the \uXXXX escapes into the characters they represent.
    text = text.encode().decode('unicode-escape')
    return text
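If the values really are UTF-16 code units, as the question suspects, then characters outside the Basic Multilingual Plane (emoji, for instance) would arrive as two %uXXXX escapes forming a surrogate pair, and converting each escape separately would leave two lone surrogates instead of one character. Whether the operator ever sends such pairs is an assumption. A rough sketch that handles them, using the hypothetical helper name decode_utf16_percent_u, packs each run of escapes into bytes and decodes the run as big-endian UTF-16:

```python
import re

# Matches a run of one or more consecutive %uXXXX escapes.
_ESCAPE_RUN = re.compile(r'(?:%u[0-9a-fA-F]{4})+')

def decode_utf16_percent_u(text: str) -> str:
    """Decode runs of %uXXXX escapes as UTF-16 code units (big-endian)."""
    def _decode_run(match):
        # '%u062a%u0633' -> '062a0633' -> b'\x06\x2a\x06\x33' -> text
        hex_digits = match.group(0).replace('%u', '')
        return bytes.fromhex(hex_digits).decode('utf-16-be')
    return _ESCAPE_RUN.sub(_decode_run, text)

print(decode_utf16_percent_u('%u062a%u0633%u062a'))
print(decode_utf16_percent_u('%uD83D%uDE00'))  # one surrogate pair
```

With this approach an unpaired surrogate raises UnicodeDecodeError rather than passing through silently, which may or may not be what you want for malformed input.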