Bytes object stored in “repr format” as b'foo' instead of decode()-ing to string: how to fix?

Question

Some hapless coworker saved some data into a file like this:

s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(str(s))

when they should have used

s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(s.decode())

Now foo.txt looks like

b'The em dash: \xe2\x80\x94'

Instead of

The em dash: —

I already read this file as a string:

with open('foo.txt') as f:
    bad_foo = f.read()

Now how can I convert bad_foo from the incorrectly-saved format to the correctly-saved string?

Answer 1

You can try ast.literal_eval, which parses the stored text back into a bytes object that you can then decode:

from ast import literal_eval
test = r"b'The em-dash: \xe2\x80\x94'"
print(test)
res = literal_eval(test)
print(res.decode())
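
To tie this back to the file in the question, here is a minimal sketch that reproduces the mistake in a temporary file and then repairs it. The helper name fix_repr_bytes, the temporary path, and the type check are illustrative assumptions, not part of the original answer, and UTF-8 is assumed because that is how the em dash bytes in the question are encoded.

```python
import os
import tempfile
from ast import literal_eval


def fix_repr_bytes(text, encoding="utf-8"):
    """Turn the text "b'...'" back into the string it was meant to be.

    Hypothetical helper name; assumes the bytes were UTF-8 encoded text.
    """
    # literal_eval only accepts Python literals, so this yields a bytes object
    value = literal_eval(text.strip())
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal, got %s" % type(value).__name__)
    return value.decode(encoding)


# Reproduce the mistake from the question in a temporary file.
s = b'The em dash: \xe2\x80\x94'
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
with open(path, 'w') as f:
    f.write(str(s))

# Read it back the same way the question does.
with open(path) as f:
    bad_foo = f.read()

good_foo = fix_repr_bytes(bad_foo)

# The repaired string matches what s.decode() would have written.
assert good_foo == s.decode('utf-8')
```

The type check is there because literal_eval will happily return a str, int, or list if the file holds some other literal; failing loudly seemed safer than calling .decode() on the wrong type.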

Answer 2

If you trust that the input is not malicious, you can use ast.literal_eval on the broken string.

import ast

# Create a sad broken string (raw, so the backslashes stay literal, as they are in the file)
s = r"b'The em-dash: \xe2\x80\x94'"

# Parse the string as a Python literal, creating a `bytes` object
s_bytes = ast.literal_eval(s)

# Now decode the `bytes` as normal
s_fixed = s_bytes.decode()

Otherwise you will have to manually parse and remove or replace the offending repr'ed escapes.
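
For the manual route, one possible sketch is below. It assumes the file holds exactly one bytes repr (b'...' or b"...") and that the original bytes were UTF-8 text. The function name unrepr_bytes_manually is hypothetical, and the approach relies on the standard unicode_escape and latin-1 codecs instead of evaluating the text as a literal.

```python
def unrepr_bytes_manually(text, encoding="utf-8"):
    """Undo str(bytes) without evaluating the text as a Python literal.

    Hypothetical helper; assumes `text` is a single bytes repr.
    """
    text = text.strip()
    # repr() of bytes uses b'...' or, when the data contains a single
    # quote but no double quote, b"...".
    if len(text) < 3 or text[0] != "b" or text[1] not in "'\"" or text[-1] != text[1]:
        raise ValueError("not a bytes repr: %r" % text[:20])
    body = text[2:-1]
    # The body is ASCII with backslash escapes such as \xe2, \n and \\.
    # unicode_escape turns each escape into the code point with the same
    # number, and latin-1 maps code points 0-255 back to single bytes.
    raw = body.encode("ascii").decode("unicode_escape").encode("latin-1")
    return raw.decode(encoding)


bad_foo = r"b'The em dash: \xe2\x80\x94'"
good_foo = unrepr_bytes_manually(bad_foo)

# The escapes are back to a real em dash (U+2014).
assert good_foo == 'The em dash: \u2014'
```

If the body contains anything other than ASCII, the encode("ascii") step raises UnicodeEncodeError, which is a reasonable signal that the text was not produced by str() on a bytes object.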

Answer 3

This code works correctly on my computer. If you still get an error, this may help:

with open('foo.txt', 'r', encoding="utf-8") as f:
    print(f.read())


Answer 1: score 3, posted 2018-12-11 18:34:28
Answer 2: score 1, accepted, posted 2018-12-11 18:38:22
Answer 3: score -2, posted 2018-12-11 18:32:50
