
How to convert string with bytes value back to bytes?

I have a program that writes the output of Python's check_output to a file. I forgot to set the encoding to "utf-8", so all the output was bytes, and I wrote the string form of those bytes values to the file. What I have in my files now are strings like " b' math \\xf0\\x9d " that mix ASCII text with hex escapes. How can I keep the ASCII text and convert hex escapes such as \\xf0\\x9d back to their original characters?

To answer this, I need a way to convert a string holding a bytes value back to bytes. In the example below, opt is bytes and temp is a string. How can I convert temp back to opt?

More details: this is the code I originally wanted to run. The value I get in opt contains hex escapes. I was hoping that converting it to a string would get rid of them, but that does not work.

from subprocess import check_output

latex = "a+b"
opt = check_output(["latexmlmath", "--quiet", "--cmml=-", latex])  # check_output returns bytes by default
temp = str(opt)
# also tried
temp = str(opt).encode("utf-8")

The opt and temp values are:

b'<?xml version="1.0" encoding="UTF-8"?>\n<math xmlns="http://www.w3.org/1998/Math/MathML" alttext="a+b" display="block">\n  <apply>\n    <plus/>\n    <ci>\xf0\x9d\x91\x8e</ci>\n    <ci>\xf0\x9d\x91\x8f</ci>\n  </apply>\n</math>\n'
b'<?xml version="1.0" encoding="UTF-8"?>\n<math xmlns="http://www.w3.org/1998/Math/MathML" alttext="a+b" display="block">\n  <apply>\n    <plus/>\n    <ci>\xf0\x9d\x91\x8e</ci>\n    <ci>\xf0\x9d\x91\x8f</ci>\n  </apply>\n</math>\n'

You wanted opt.decode('utf-8'). Calling str on a bytes object without a second (encoding) argument just returns the repr of the bytes object. If you already have data produced by such a conversion, you can turn it back into the original bytes object with ast.literal_eval, then perform the intended decode on the result. Example:

import ast

baddata = 'b\'<?xml version="1.0" encoding="UTF-8"?>\\n<math xmlns="http://www.w3.org/1998/Math/MathML" alttext="a+b" display="block">\\n  <apply>\\n    <plus/>\\n    <ci>\\xf0\\x9d\\x91\\x8e</ci>\\n    <ci>\\xf0\\x9d\\x91\\x8f</ci>\\n  </apply>\\n</math>\\n\''
gooddata = ast.literal_eval(baddata).decode('utf-8')
print(gooddata)

outputs:

<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML" alttext="a+b" display="block">
  <apply>
    <plus/>
    <ci>𝑎</ci>
    <ci>𝑏</ci>
  </apply>
</math>
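
To tie the pieces together, here is a minimal sketch that reproduces the mistake, undoes it, and shows one way to avoid it in the first place. The helper name recover_text is hypothetical, and the child Python process is only a stand-in for latexmlmath (an assumption so the example runs without that tool installed). Passing encoding="utf-8" to check_output makes it return str directly, so no repair step is needed afterwards.

```python
import ast
import subprocess
import sys


def recover_text(saved: str, encoding: str = "utf-8") -> str:
    """Turn the text produced by str(some_bytes) back into decoded text.

    Hypothetical helper: `saved` is expected to look like "b'...'".
    """
    value = ast.literal_eval(saved.strip())  # safely parses the bytes literal
    if not isinstance(value, bytes):
        raise ValueError("expected the repr of a bytes object")
    return value.decode(encoding)


# Stand-in for the real command: a child Python process that writes
# "<ci>", U+1D44E (mathematical italic small a), "</ci>" as UTF-8 bytes.
child_code = (
    "import sys; "
    "sys.stdout.buffer.write('<ci>\\U0001d44e</ci>\\n'.encode('utf-8'))"
)
cmd = [sys.executable, "-c", child_code]

raw = subprocess.check_output(cmd)   # bytes, as in the question
saved = str(raw)                     # the mistake: this is the repr, not decoded text
recovered = recover_text(saved)      # undo the mistake after the fact

# Decoding up front avoids the problem entirely.
direct = subprocess.check_output(cmd, encoding="utf-8")

print(recovered == direct)
```

If each line of the saved file holds one such b'...' value, the same helper can be applied line by line. ast.literal_eval only evaluates Python literals, which makes it a safer choice than eval for text read back from disk.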


 