Properly converting special chars in a Python byte string

Question

Tried to look through a few similar threads, but still confused:

I have a byte string containing a special character (a right double quotation mark, in my case), as shown below. What's the easiest way to convert it to a string so that the special character is mapped correctly?

b = b'My groovy str\xe2\x80\x9d is now fixed'

Update: decode('utf-8') returns the right string, but printing it fails:

>>> b = b'My groovy str\xe2\x80\x9d is now fixed'
>>> b_converted = b.decode("utf-8")
>>> b_converted
'My groovy str\u201d is now fixed'
>>> print(b_converted)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u201d' in position 13: ordinal not in range(128)

Answer 1

The following should work:

b_converted = b.decode("utf-8")

Converted from:

b'My groovy str\xe2\x80\x9d is now fixed'

To:

My groovy str” is now fixed
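Note that the UnicodeEncodeError in the question's update is raised by print, not by decode: the decoded string is correct, but the console's output encoding (ASCII, judging by the traceback) cannot represent U+201D. Below is a minimal sketch of one workaround, assuming the terminal's encoding cannot be changed. The helper name to_printable is made up for illustration.

```python
import sys

b = b'My groovy str\xe2\x80\x9d is now fixed'
text = b.decode("utf-8")  # correct str; contains U+201D at index 13


def to_printable(s, encoding):
    """Return s with characters the target encoding cannot represent
    replaced by backslash escapes (hypothetical helper)."""
    return s.encode(encoding, errors="backslashreplace").decode(encoding)


# Fall back to ASCII if stdout does not report an encoding.
stdout_encoding = getattr(sys.stdout, "encoding", None) or "ascii"
print(to_printable(text, stdout_encoding))
```

Alternatively, setting the PYTHONIOENCODING=utf-8 environment variable, or calling sys.stdout.reconfigure(encoding='utf-8') on Python 3.7+, should let print emit the character directly, provided the terminal can display it.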

Answer 2

Use .decode(encoding) on a byte string to convert it to a Unicode string (str).

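The encoding cannot always be determined from the bytes alone; it depends on the source. In this case it is clearly UTF-8: the bytes \xe2\x80\x9d are the UTF-8 encoding of U+201D, the right double quotation mark.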
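When the source does not state its encoding, one common approach is to try UTF-8 strictly and fall back to something else only if that fails. Here is a rough sketch under that assumption. The function name decode_best_effort and the choice of Latin-1 as the fallback are illustrative, not a general recommendation.

```python
def decode_best_effort(data, fallback="latin-1"):
    """Decode bytes as UTF-8, falling back to another encoding
    if the bytes are not valid UTF-8 (hypothetical helper)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises,
        # but the result may be wrong if the guess is wrong.
        return data.decode(fallback)


utf8_bytes = b'My groovy str\xe2\x80\x9d is now fixed'
latin1_bytes = b'caf\xe9'  # not valid UTF-8

print(repr(decode_best_effort(utf8_bytes)))
print(repr(decode_best_effort(latin1_bytes)))
```

A fallback like this only guesses. If the source can tell you the encoding, use that instead.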

Ideally, the API used to read the data lets you specify the encoding or, in the case of web requests, detects it from the response headers, so you don't need to call .decode explicitly. For example:

with open('input.txt', encoding='utf8') as file:
    text = file.read()

or

import requests
response = requests.get('http://example.com')
print(response.encoding)
print(response.text)  # str decoded from the raw bytes using response.encoding
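The file-based variant can be tried without an existing input.txt. The sketch below writes the question's bytes to a temporary file (a stand-in for a real input file, assumed here purely for illustration) and reads them back in text mode with an explicit encoding.

```python
import os
import tempfile

data = b'My groovy str\xe2\x80\x9d is now fixed'

# Write the raw bytes to a temporary file standing in for input.txt.
with tempfile.NamedTemporaryFile(delete=False, suffix='.txt') as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Text mode with an explicit encoding decodes while reading,
    # so no separate .decode() call is needed.
    with open(path, encoding='utf-8') as file:
        text = file.read()
finally:
    os.remove(path)

print(repr(text))
```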
