Some hapless coworker saved some data into a file like this:
s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(str(s))
when they should have used
s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(s.decode())
Now foo.txt looks like
b'The em dash: \xe2\x80\x94'
instead of
The em dash: —
I already read this file as a string:
with open('foo.txt') as f:
    bad_foo = f.read()
Now how can I convert bad_foo from the incorrectly saved format to the correctly saved string?
You can try ast.literal_eval:
from ast import literal_eval
test = r"b'The em-dash: \xe2\x80\x94'"
print(test)
res = literal_eval(test)
print(res.decode())
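To apply this to text that was already read from the file, the same idea can be wrapped in a small helper. This is only a minimal sketch: the name repair_bytes_repr is made up for illustration, and it assumes the original bytes were UTF-8 encoded.

```python
from ast import literal_eval


def repair_bytes_repr(text):
    """Hypothetical helper: recover the intended string from the text
    form of a bytes object, as written by f.write(str(s)).
    Assumes the underlying bytes are UTF-8."""
    # literal_eval only accepts Python literals, so this parses the
    # b'...' text back into a real bytes object.
    value = literal_eval(text.strip())
    if not isinstance(value, bytes):
        raise ValueError("expected the repr of a bytes object")
    return value.decode("utf-8")


# Reproduce the broken file content without touching the disk.
# repr(s) yields the same text that str(s) wrote to foo.txt.
bad_foo = repr(b'The em dash: \xe2\x80\x94')
fixed = repair_bytes_repr(bad_foo)
print(fixed)
```

The isinstance check is there because literal_eval will happily return a number, list, or plain string if the file holds something other than a bytes literal.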
If you trust that the input is not malicious, you can use ast.literal_eval on the broken string.
import ast
# Create a sad broken string (raw, so the backslashes stay literal, as in the file)
s = r"b'The em-dash: \xe2\x80\x94'"
# Parse and evaluate the string as raw Python source, creating a `bytes` object
s_bytes = ast.literal_eval(s)
# Now decode the `bytes` as normal
s_fixed = s_bytes.decode()
Otherwise you will have to manually parse and remove or replace the offending repr'ed escapes.
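One way to do that manual parsing, sketched under assumptions: strip the b'...' wrapper yourself and undo the escapes with the unicode_escape codec. The function name unescape_bytes_repr is hypothetical, and the sketch assumes the text is exactly the repr of a UTF-8 encoded bytes object, which is always pure ASCII.

```python
def unescape_bytes_repr(text):
    """Hypothetical helper: undo a bytes repr without evaluating it.
    Assumes the underlying bytes are UTF-8."""
    text = text.strip()
    # A bytes repr is wrapped in b'...' or, when the data contains a
    # single quote but no double quote, in b"...".
    if (len(text) < 3 or text[0] != "b"
            or text[1] not in "'\"" or text[-1] != text[1]):
        raise ValueError("not a bytes repr")
    inner = text[2:-1]
    # unicode_escape turns each \xNN escape into the code point U+00NN;
    # latin-1 then maps those code points back to the single bytes 0xNN.
    raw = inner.encode("ascii").decode("unicode_escape").encode("latin-1")
    return raw.decode("utf-8")


bad_foo = r"b'The em dash: \xe2\x80\x94'"
print(unescape_bytes_repr(bad_foo))
```

Non-ASCII input is rejected by the encode("ascii") step, since a genuine bytes repr never contains non-ASCII characters.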
This code works correctly on my computer. If you still get an error, opening the file with an explicit encoding may help:
with open('foo.txt', 'r', encoding="utf-8") as f:
    print(f.read())