Python: Converting Mixed Decoded UTF-8 Characters to Text

Question

Using a RESTful service, I have a Python 3.x script that downloads text data from a vendor and writes it to a text file. The data contains text that includes UTF-8-encoded characters shown as byte escapes. Here's an example of the text I receive:

b'Sample data plus some Japanese characters \xe3\x81\xaa\xe3\x81\x9c\xe6\x97\xa5\xe9\x8a\x80\xe3\x81\xa0\xe3\x81\x91\xe9\x81\x95\xe3\x81\x86\xe3\x81\xae\xe3\x81\x8b\xef\xbc\x9f\xe2\x80\x94\x80\x94\x80\x94\x80\x94 and then more data'

Note that this is stored in a variable, say str_data. I'd like to convert those encoded characters before storing the text in a database. When I check type(str_data) I get <class 'str'>, even though the value looks like a <class 'bytes'> literal (e.g., b'stuff'). I have tried everything I can think of (encode(), decode(), etc.), but to no avail. The output I want is this:

Sample data plus some Japanese characters なぜ日銀だけ違うのか？— and then more data

Any help would be great. Thank you.

Update

If it will help, here's how I pulled down the data.

  resp = requests.get(get_url)
  f = open(self.export_file, "w")
  f.write(str(resp.content))
  f.close()

If I don't use str() on my write, like so...

  resp = requests.get(get_url)
  f = open(self.export_file, "w")
  f.write(resp.content)
  f.close()

I get the following...

TypeError: write() argument must be str, not bytes
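
The error above comes from the file being opened in text mode ("w") while resp.content is a bytes object, and wrapping it in str() only writes the b'...' representation instead of the characters. A minimal sketch of two ways around this, assuming the payload is meant to be UTF-8: the content value and the export_file path here are hypothetical stand-ins for resp.content and self.export_file. The requests library also exposes resp.text, which decodes the body using the encoding it detects, but that guess may not match the vendor's data.

```python
import os
import tempfile

# Stand-in for resp.content (hypothetical sample; the real value comes from requests.get).
content = b"Sample \xe3\x81\xaa\xe3\x81\x9c data"

# Hypothetical path standing in for self.export_file.
export_file = os.path.join(tempfile.mkdtemp(), "export.txt")

# Option 1: write the raw bytes unchanged by opening the file in binary mode.
with open(export_file, "wb") as f:
    f.write(content)

# Option 2: decode to str first, then write text with an explicit encoding.
# errors="replace" substitutes U+FFFD for any bytes that are not valid UTF-8.
with open(export_file, "w", encoding="utf-8") as f:
    f.write(content.decode("utf-8", errors="replace"))

# Reading the file back yields real text rather than a b'...' representation.
with open(export_file, encoding="utf-8") as f:
    restored = f.read()
print(restored)
```

Either option keeps the b'...' wrapper out of the file, so no repair step is needed afterwards.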

Answer 1

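Calling str() on resp.content wrote the textual representation of a bytes object, so str_data is a str that merely looks like b'...'. ast.literal_eval turns that text back into real bytes, which can then be decoded. Some of the bytes are not valid UTF-8 (the stray \x80\x94 pairs after the em dash), which is why a plain decode() fails. The Japanese characters are valid UTF-8, though.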

>>> import ast
>>> ast.literal_eval(str_data).decode('utf-8', errors='replace')
'Sample data plus some Japanese characters なぜ日銀だけ違うのか？—������ and then more data'
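
The same approach can be wrapped in a small helper. This is a sketch under assumptions, not a definitive implementation: recover_text is a hypothetical name, the sample string is a shortened version of the one in the question, and the input is assumed to be the text form of a bytes literal.

```python
import ast


def recover_text(repr_text: str, errors: str = "replace") -> str:
    """Turn the text "b'...'" (the repr of a bytes object) back into a str."""
    raw = ast.literal_eval(repr_text)  # safely parses the literal into real bytes
    if not isinstance(raw, bytes):
        raise TypeError(f"expected a bytes literal, got {type(raw).__name__}")
    # errors="replace" maps invalid bytes to U+FFFD; errors="ignore" drops them.
    return raw.decode("utf-8", errors=errors)


# Shortened sample: valid Japanese text, an em dash, then two stray bytes.
str_data = r"b'Japanese \xe3\x81\xaa\xe3\x81\x9c \xe2\x80\x94\x80\x94 end'"

print(recover_text(str_data))                   # stray bytes shown as U+FFFD
print(recover_text(str_data, errors="ignore"))  # stray bytes silently dropped
```

Passing errors="ignore" drops the invalid bytes, which matches the output requested in the question, at the cost of hiding the fact that the source data was malformed.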

solution1
0 2022-06-02 23:42:17
