How to decode a UTF-16 string with % as a delimiter to its original form in Python 3?

Question

I am working with a mobile operator that sends me notifications as some kind of UTF-16 encoded string. For example, '%u062a%u0633%u062a' is the equivalent of 'تست' in Persian. I'm not sure exactly what encoding these strings use. How can I convert them to their real form, like 'تست'?

Answer 1

An easy way to do it is to replace each % with a backslash (written b'\\' in Python), which turns the string into Python-style \uXXXX escapes, and then decode it with the unicode-escape codec.

s = b'%u062a%u0633%u062a'
print(s.replace(b'%', b'\\').decode('unicode-escape'))
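
One caveat: the unicode-escape codec interprets every backslash sequence in the input, not only the ones created by the replacement, and it reads any other bytes as Latin-1. If the notification can contain other text, a narrower substitution may be safer. The following is a minimal sketch, not part of the original answer; the name decode_percent_u is hypothetical, and it assumes every escape is %u followed by exactly four hex digits.

```python
import re

# Matches one %uXXXX escape; exactly four hex digits is an assumption.
_PERCENT_U = re.compile(r'%u([0-9a-fA-F]{4})')


def decode_percent_u(text: str) -> str:
    """Replace each %uXXXX escape with its character; leave the rest as-is."""
    return _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_percent_u('%u062a%u0633%u062a'))  # تست
```

Text outside the escapes, including literal backslashes, passes through unchanged.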

Answer 2

You can split the string on %u to get each character's hex value, then look up the Unicode character with the built-in function chr.

def convert_to_unicode(text):
    return_str = ''
    for character in text.split('%u'):
        if character:
            chr_code = int(character, 16)
            return_str += chr(chr_code)
    return return_str


text = '%u062a%u0633%u062a'
print(convert_to_unicode(text))

Output:

تست
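
If the values really are UTF-16 code units, as the question suspects, a character outside the Basic Multilingual Plane (an emoji, for example) would arrive as two %uXXXX values forming a surrogate pair, and chr would return two separate surrogate code points. The sketch below is one way to join them, assuming that is how the operator encodes such characters; the name convert_utf16_units is hypothetical.

```python
def convert_utf16_units(text: str) -> str:
    # Build a string with one code point per %uXXXX value, as in the function above.
    units = ''.join(chr(int(h, 16)) for h in text.split('%u') if h)
    # Round-trip through UTF-16 so a high/low surrogate pair becomes one character.
    return units.encode('utf-16', 'surrogatepass').decode('utf-16')


print(convert_utf16_units('%u062a%u0633%u062a'))  # تست
print(len(convert_utf16_units('%ud83d%ude00')))   # 1
```

For input without surrogates, such as the Persian example, the result is the same as before.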

Alternatively, you can use the unicode-escape codec, as in the other answer by blhsing.

def convert_to_unicode(text: str) -> str:
    # Turn each %uXXXX into a \uXXXX escape sequence.
    text = text.replace('%', '\\')
    # Decode the escape sequences into the characters they name.
    text = text.encode().decode('unicode-escape')
    return text
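
To test either function, it can help to generate strings in the same format. The sketch below goes the other way, assuming each %uXXXX holds one UTF-16 code unit written as four lowercase hex digits, which matches the example in the question; the name to_percent_u is hypothetical.

```python
def to_percent_u(text: str) -> str:
    # Encode to big-endian UTF-16 so every two bytes form one code unit.
    data = text.encode('utf-16-be')
    # Format each 2-byte code unit as %u followed by four lowercase hex digits.
    return ''.join(
        f'%u{data[i]:02x}{data[i + 1]:02x}' for i in range(0, len(data), 2)
    )


print(to_percent_u('تست'))  # %u062a%u0633%u062a
```

That this reproduces the operator's example suggests the hex values are UTF-16 code units, which for these characters are the same as their Unicode code points.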

2 answers

solution1
2 ACCEPTED 2018-07-06 06:37:08

solution2
2 2018-07-06 06:37:15