Bytes object stored in “repr format” as b'foo' instead of encode()-ing to string — how to fix?

Some hapless coworker saved some data into a file like this:

s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(str(s))

when they should have used

s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(s.decode())

Now foo.txt looks like

b'The em dash: \xe2\x80\x94'

Instead of

The em dash: —

I already read this file as a string:

with open('foo.txt') as f:
    bad_foo = f.read()

Now how can I convert bad_foo from the incorrectly-saved format to the correctly-saved string?

You can try ast.literal_eval, which parses the text as a Python bytes literal; then decode the resulting bytes:

from ast import literal_eval
test = r"b'The em-dash: \xe2\x80\x94'"
print(test)
res = literal_eval(test)
print(res.decode())
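Applied to the text read from the file, the same idea could be wrapped in a small helper. This is only a sketch: the name repair_bytes_repr is made up for illustration, UTF-8 is assumed as the original encoding, and the isinstance check is an extra guard in case the file holds some other kind of literal.

```python
import ast


def repair_bytes_repr(text: str, encoding: str = "utf-8") -> str:
    """Recover the intended text from the str() of a bytes object."""
    # Strip the trailing newline or spaces the file may contain.
    value = ast.literal_eval(text.strip())
    # literal_eval accepts any Python literal, so make sure we really got bytes.
    if not isinstance(value, bytes):
        raise ValueError(f"expected a bytes literal, got {type(value).__name__}")
    return value.decode(encoding)


# Stands in for the contents of foo.txt (raw string, so the backslashes are literal).
bad_foo = r"b'The em dash: \xe2\x80\x94'"
print(repair_bytes_repr(bad_foo))
```

If the file was written on a machine where the bytes were not UTF-8, pass the appropriate codec name as the second argument.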

If you trust that the input is not malicious, you can use ast.literal_eval on the broken string.

import ast

# Create a sad broken string
s = r"b'The em-dash: \xe2\x80\x94'"  # raw string: the file contains literal backslashes

# Parse and evaluate the string as a Python literal, creating a `bytes` object
s_bytes = ast.literal_eval(s)

# Now decode the `bytes` as normal
s_fixed = s_bytes.decode()

Otherwise you will have to manually parse and remove or replace the offending repr'ed escapes.
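One possible way to do that manual parsing, shown as a rough sketch rather than a hardened parser: strip the b'...' wrapper, let the unicode_escape codec expand the escapes, then map the resulting code points back to bytes through latin-1. The function name repair_without_eval is hypothetical, and UTF-8 is assumed as the original encoding.

```python
def repair_without_eval(text: str, encoding: str = "utf-8") -> str:
    """Undo str(bytes) by hand, without evaluating the text as Python."""
    text = text.strip()
    # repr() picks either quote style, so accept both b'...' and b"...".
    if len(text) < 3 or text[0] != "b" or text[1] not in "'\"" or text[-1] != text[1]:
        raise ValueError("text does not look like the repr of a bytes object")
    body = text[2:-1]
    # The repr of bytes is pure ASCII; unicode_escape expands \xNN, \n, \\ and
    # similar escapes into characters with code points 0-255.
    chars = body.encode("ascii").decode("unicode_escape")
    # latin-1 maps code points 0-255 one-to-one onto byte values.
    raw = chars.encode("latin-1")
    return raw.decode(encoding)


print(repair_without_eval(r"b'The em dash: \xe2\x80\x94'"))
```

Input that is not plain ASCII, or that contains escapes a bytes repr would never produce (such as \u2014), raises a UnicodeError subclass instead of being silently mangled.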

This code works correctly on my computer. If you still get an error, specifying the encoding explicitly when opening the file may help:

with open('foo.txt', 'r', encoding="utf-8") as f:
    print(f.read())


 