Properly converting special chars in a Python byte string

I looked through a few similar threads, but I'm still confused:

I have a byte string containing some special characters (a right double quote, in my case), like the one below. What's the easiest way to convert it to a string so that the special characters are mapped correctly?

b = b'My groovy str\xe2\x80\x9d is now fixed'

Update, regarding decode('utf-8'): the decode itself works, but print then fails:

>>> b = b'My groovy str\xe2\x80\x9d is now fixed'
>>> b_converted = b.decode("utf-8") 
>>> b_converted
'My groovy str\u201d is now fixed'
>>> print(b_converted)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u201d' in position 13: ordinal not in range(128)

The following should work:

b_converted = b.decode("utf-8") 

Converted from:

b'My groovy str\xe2\x80\x9d is now fixed'

To:

My groovy str” is now fixed
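
The UnicodeEncodeError in the question's update is raised by print, not by decode: the decoded str is fine, but the output stream is using the ascii codec and cannot encode U+201D. A minimal sketch of that behavior, assuming an ASCII-only stdout can be simulated with io.TextIOWrapper (the variable names here are illustrative and not from the original post):

```python
import io

b = b'My groovy str\xe2\x80\x9d is now fixed'
text = b.decode("utf-8")  # decoding succeeds: bytes -> str

# Simulate a stdout that can only emit ASCII, like the one in the traceback.
ascii_out = io.TextIOWrapper(io.BytesIO(), encoding="ascii", newline="\n")
try:
    print(text, file=ascii_out)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True  # same kind of error as in the question

# The same print succeeds on a UTF-8 stream.
utf8_buf = io.BytesIO()
utf8_out = io.TextIOWrapper(utf8_buf, encoding="utf-8", newline="\n")
print(text, file=utf8_out)
utf8_out.flush()
written = utf8_buf.getvalue()  # the original bytes plus a newline
```

On a real terminal, setting the environment variable PYTHONIOENCODING=utf-8, or calling sys.stdout.reconfigure(encoding="utf-8") on Python 3.7+, is the usual way to get a UTF-8 stdout.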

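Use .decode(encoding) on a byte string to convert it to a Unicode string (str).

The encoding cannot always be determined from the bytes alone; it depends on the source. In this case it is clearly UTF-8: \xe2\x80\x9d is the UTF-8 encoding of U+201D, the right double quotation mark.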
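
To illustrate why the encoding has to be known rather than guessed, here is a small sketch that decodes the same bytes with three codecs. The variable names are made up for the example; only bytes.decode and its errors parameter come from the standard library:

```python
data = b'My groovy str\xe2\x80\x9d is now fixed'

right = data.decode("utf-8")    # the three bytes become one character, U+201D
wrong = data.decode("latin-1")  # never raises, but each byte becomes its own character

try:
    data.decode("ascii")        # bytes >= 0x80 are invalid in ASCII
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
lossy = data.decode("ascii", errors="replace")
```

A wrong codec that happens to accept every byte (latin-1 here) fails silently with garbled text, which is why it is safer to take the encoding from the source than to try codecs until one stops raising.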

Ideally, the API used to read the text lets you specify the encoding or, for web requests, detects it from the response headers, so you don't need to call .decode explicitly. For example:

with open('input.txt',encoding='utf8') as file:
    text = file.read()

or

import requests
response = requests.get('http://example.com')
print(response.encoding)  # encoding guessed from the response headers
print(response.text)  # body decoded to str using response.encoding
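
As a quick check that passing encoding to open gives the same result as an explicit .decode, here is a sketch that writes the question's bytes to a temporary file and reads them back as text. The file name and variable names are hypothetical:

```python
import os
import tempfile

raw = b'My groovy str\xe2\x80\x9d is now fixed'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.txt")  # hypothetical file name
    # Write the raw bytes, as some other program might have produced them.
    with open(path, "wb") as f:
        f.write(raw)
    # Read them back as text; open() decodes while reading.
    with open(path, encoding="utf-8") as f:
        text = f.read()

same_as_decode = (text == raw.decode("utf-8"))
```

Without the encoding argument, open falls back to a platform-dependent default, so the same file can decode differently on another machine.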
