Printing strings with UTF-8 encoded characters, e.g.: “\u00c5\u009b”

Question

I would like to print strings encoded like this one: "Cze\u00c5\u009b\u00c4\u0087", but I have no idea how. The example string should be printed as "Cześć".

What I have tried is:

# "text" is used instead of "str" to avoid shadowing the built-in type
text = "Cze\u00c5\u009b\u00c4\u0087"
print(text)
# gives: CzeÅÄ

str_bytes = text.encode("unicode_escape")
print(str_bytes)
# gives: b'Cze\\xc5\\x9b\\xc4\\x87'

text = str_bytes.decode("utf8")
print(text)
# gives: Cze\xc5\x9b\xc4\x87

By contrast,

print(b"Cze\xc5\x9b\xc4\x87".decode("utf8"))

gives "Cześć", but I don't know how to transform the "Cze\\xc5\\x9b\\xc4\\x87" string into the b"Cze\xc5\x9b\xc4\x87" bytes.

I also know that the problem is the additional backslashes in the byte representation after encoding the original string with the "unicode_escape" codec, but I don't know how to get rid of them: str_bytes.replace(b'\\\\\\\\', b'\\\\') doesn't work.

Answer 1

Use the raw_unicode_escape codec, which encodes each code point below 256 as a single byte:

text = 'Cze\u00c5\u009b\u00c4\u0087'
text_bytes = text.encode('raw_unicode_escape')
print(text_bytes.decode('utf8')) # outputs Cześć
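
A minimal sketch of why this works, wrapped in a helper for reuse. The function name fix_double_encoded is hypothetical, and the sketch assumes every character in the broken string has a code point below 256, as in the question. Under that assumption, latin-1 produces the same bytes as raw_unicode_escape.

```python
def fix_double_encoded(text: str) -> str:
    """Turn each code point below 256 back into one byte, then decode as UTF-8."""
    # raw_unicode_escape writes code points below 256 as single Latin-1 bytes,
    # so '\u00c5\u009b' becomes b'\xc5\x9b', which is the UTF-8 encoding of 'ś'.
    raw = text.encode('raw_unicode_escape')
    return raw.decode('utf8')


broken = 'Cze\u00c5\u009b\u00c4\u0087'
print(fix_double_encoded(broken))  # Cześć

# Assumption: all code points are below 256, so latin-1 gives identical bytes.
assert broken.encode('latin-1') == broken.encode('raw_unicode_escape')
```

If the string may also contain code points of 256 or above, the two codecs differ: latin-1 raises UnicodeEncodeError, while raw_unicode_escape writes them as literal \uXXXX escapes, so the result may need further handling.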

solution1
5 ACCEPTED 2018-07-11 20:14:03
