
How to decode a UTF-16 string with % as the delimiter to its original form in Python 3?

I am working with a mobile operator that sends me notifications as some kind of UTF-16-encoded string. For example, '%u062a%u0633%u062a' is the equivalent of 'تست' in Persian. I'm not sure exactly which encoding these strings use. How can I convert them to their real form, like 'تست'?

An easy way to do it is to replace each % with a backslash (written \\ in a Python literal), which turns every %uXXXX into a \uXXXX Unicode escape sequence, and then decode the result with the unicode-escape codec.

s = b'%u062a%u0633%u062a'
# b'%uXXXX' becomes b'\\uXXXX', which unicode-escape decodes to one character.
print(s.replace(b'%', b'\\').decode('unicode-escape'))

You can split the string on %u to get each character's hex code, then look up the corresponding Unicode character with the built-in function chr. This assumes the string consists only of %uXXXX sequences.

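def convert_to_unicode(text):
    return_str = ''
    # Splitting on '%u' leaves one hex code per piece; the piece before
    # the first '%u' is empty and is skipped.
    for character in text.split('%u'):
        if character:
            chr_code = int(character, 16)
            return_str += chr(chr_code)
    return return_str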


text = '%u062a%u0633%u062a'
print(convert_to_unicode(text))

Output:

تست
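The split-based function above raises ValueError as soon as the string contains anything other than %uXXXX sequences (for example a trailing '!' or a space). A minimal sketch of a more tolerant variant, assuming each escape is exactly %u followed by four hex digits; the name decode_percent_u is hypothetical and not from the original answers:

```python
import re

# One escape: a literal "%u" followed by exactly four hex digits (assumed format).
_PERCENT_U = re.compile(r'%u([0-9a-fA-F]{4})')


def decode_percent_u(text: str) -> str:
    """Replace each %uXXXX escape with its character; leave other text untouched."""
    return _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_percent_u('%u062a%u0633%u062a'))
print(decode_percent_u('code: %u062a%u0633%u062a!'))
```

Because only the matched escapes are rewritten, plain text around them passes through unchanged.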

Alternatively, you can use the unicode-escape codec, as in the other answer by blhsing.

def convert_to_unicode(text: str) -> str:
    # Turn each %uXXXX into a \uXXXX escape sequence.
    text = text.replace('%', '\\')
    # Decode the escape sequences into the characters they represent.
    text = text.encode().decode('unicode-escape')
    return text
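Since the question describes the strings as UTF-16, characters outside the Basic Multilingual Plane (emoji, for instance) would presumably arrive as two %uXXXX surrogate halves. Converting each escape on its own leaves those halves as separate lone surrogates. The sketch below assumes every %uXXXX is one big-endian UTF-16 code unit and decodes each run of escapes as UTF-16, so a surrogate pair becomes a single character. The function name decode_utf16_units is hypothetical, and whether the operator actually sends surrogate pairs is an assumption.

```python
import re

# A run of one or more consecutive %uXXXX escapes (assumed: four hex digits each).
_UNIT_RUN = re.compile(r'(?:%u[0-9a-fA-F]{4})+')


def decode_utf16_units(text: str) -> str:
    """Decode runs of %uXXXX escapes as UTF-16 code units; keep other text as-is."""

    def _decode_run(match: re.Match) -> str:
        # Drop the "%u" markers, leaving the hex digits of the code units.
        hex_digits = match.group(0).replace('%u', '')
        # Interpret the bytes as big-endian UTF-16; an unpaired surrogate
        # becomes U+FFFD instead of raising an error.
        return bytes.fromhex(hex_digits).decode('utf-16-be', errors='replace')

    return _UNIT_RUN.sub(_decode_run, text)


print(decode_utf16_units('%u062a%u0633%u062a'))
print(len(decode_utf16_units('%uD83D%uDE00')))  # surrogate pair -> 1 character
```

For strings that contain only Basic Multilingual Plane characters, such as the Persian example, this gives the same result as the functions above.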
