
Printing strings with UTF-8 encoded characters, e.g. "\u00c5\u009b"

I would like to print strings encoded like this one: "Cze\u00c5\u009b\u00c4\u0087", but I have no idea how. The example string should be printed as "Cześć".

What I have tried is:

str = "Cze\u00c5\u009b\u00c4\u0087"
print(str) 
#gives: CzeÅÄ

str_bytes = str.encode("unicode_escape")
print(str_bytes) 
#gives: b'Cze\\xc5\\x9b\\xc4\\x87'

str = str_bytes.decode("utf8")
print(str) 
#gives: Cze\xc5\x9b\xc4\x87

Whereas

print(b"Cze\xc5\x9b\xc4\x87".decode("utf8"))

gives "Cześć", but I don't know how to transform the "Cze\\xc5\\x9b\\xc4\\x87" string into the b"Cze\xc5\x9b\xc4\x87" bytes.
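If you really do start from a string containing literal backslash escapes, one way to get those bytes is sketched below. It assumes every escape has the \xNN form and the rest of the string is ASCII; the variable names are illustrative only. Step 1 merely undoes the earlier "unicode_escape" encoding and returns the original string, so for the original input you could go straight to step 2.

```python
# Hypothetical input: a str that literally contains backslash escapes,
# like the result of str_bytes.decode("utf8") in the attempt above
escaped_text = "Cze\\xc5\\x9b\\xc4\\x87"

# 1. Interpret the \xNN escapes: each one becomes the code point U+00NN
unescaped = escaped_text.encode("ascii").decode("unicode_escape")

# 2. Map code points U+0000..U+00FF back to single bytes 0x00..0xFF
raw_bytes = unescaped.encode("latin-1")
print(raw_bytes)  # b'Cze\xc5\x9b\xc4\x87'

# 3. Decode those bytes as UTF-8
print(raw_bytes.decode("utf8"))  # Cześć
```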

I also know that the problem is the extra backslashes in the byte representation after encoding the original string with the "unicode_escape" codec, but I don't know how to get rid of them: str_bytes.replace(b'\\\\', b'\\') doesn't work.
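The backslashes are not actually doubled in the bytes. When printing a bytes object, repr() displays each single backslash byte as \\, so there is only one backslash per escape and a replace call looking for two or more consecutive backslashes has nothing to match. A small check, reusing the example string:

```python
# Rebuild the intermediate bytes from the attempt above
text = "Cze\u00c5\u009b\u00c4\u0087"
escaped = text.encode("unicode_escape")

# repr() shows b'Cze\\xc5...', but each "\\" there is ONE backslash byte
single = escaped.count(b"\\")     # occurrences of one backslash byte
double = escaped.count(b"\\\\")   # occurrences of two consecutive backslash bytes

print(repr(escaped))
print(single, double)  # 4 0
```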

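Use the raw_unicode_escape codec. It encodes every code point below U+0100 as the single byte with the same value, so the garbled string is turned back into the original UTF-8 bytes, which can then be decoded: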

text = 'Cze\u00c5\u009b\u00c4\u0087'
text_bytes = text.encode('raw_unicode_escape')
print(text_bytes.decode('utf8')) # outputs Cześć
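Why this works, as a sketch: this particular string is exactly what you get when the UTF-8 bytes of "Cześć" are decoded as Latin-1, so every character is below U+0100. For such strings raw_unicode_escape produces the same bytes as the latin-1 codec. The helper name fix_mojibake is hypothetical.

```python
def fix_mojibake(text: str) -> str:
    """Hypothetical helper: undo UTF-8 bytes that were decoded as Latin-1."""
    # Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    # so this rebuilds the original UTF-8 byte sequence
    raw = text.encode("latin-1")
    return raw.decode("utf-8")


broken = "Cze\u00c5\u009b\u00c4\u0087"
print(fix_mojibake(broken))  # Cześć

# For code points below U+0100, raw_unicode_escape yields the same bytes
assert broken.encode("raw_unicode_escape") == broken.encode("latin-1")

# The broken string is what UTF-8 bytes look like when decoded as Latin-1
assert "Cześć".encode("utf-8").decode("latin-1") == broken
```

One difference between the two codecs: if the string also contains characters above U+00FF, encode("latin-1") raises UnicodeEncodeError, while raw_unicode_escape writes them as \uXXXX escape sequences, which then survive the UTF-8 decode as literal text.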

