
Base64 encoding issue in Python

I need to save a params file in Python. The file contains some parameters that I don't want to leave in plain text, so I encode the entire file as Base64 (I know this isn't the most secure encoding in the world, but it works for the kind of data I need to store).

The encoding works well. I encode the content of my file (a simple text file with a custom extension) and save the result. The problem comes with decoding. I print the encoded text before saving the file and the encoded text read back from the saved file, and they are exactly the same. But for a reason I don't understand, decoding the text read from the saved file gives me this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 1: invalid start byte. Decoding the text before saving the file works fine.

Any idea how to resolve this issue?

This is my code. I have tried converting everything to bytes, to strings, and so on...

import base64

params = open('params.bpr','r').read()


paramsencoded = base64.b64encode(bytes(params,'utf-8'))

print(paramsencoded)

paramsdecoded = str(base64.b64decode(str(paramsencoded,'utf-8')),'utf-8')

newparams = open('paramsencoded.bpr','w+',encoding='utf-8')
newparams.write(str(paramsencoded))
newparams.close()

params2 = open('paramsencoded.bpr',encoding='utf-8').read()
print(params2)

paramsdecoded = str(base64.b64decode(str(paramsencoded,'utf-8')),'utf-8')

paramsdecoded = base64.b64decode(str(params2))

print(str(paramsdecoded,'utf-8'))

Your error lies in how you handle the bytes object returned by base64.b64encode(). You called str() on it:

newparams.write(str(paramsencoded))

That doesn't decode the bytes object:

>>> bytesvalue = b'abc='
>>> str(bytesvalue)
"b'abc='"

Note the b'...' notation. You produced the representation of the bytes object, which is a string containing Python syntax that can reproduce the value for debugging purposes (you can copy that string value and paste it into Python to re-create the same bytes value).
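As a small sketch of the difference between the representation and the decoded text (the variable names are illustrative, and ast.literal_eval is used only to show that the representation is valid Python syntax, not as a suggested fix):

```python
import ast

bytes_value = b'abc='

as_repr = str(bytes_value)             # representation, includes the b'...' wrapper
as_text = bytes_value.decode('ascii')  # the actual characters

print(as_repr)   # b'abc='
print(as_text)   # abc=

# The representation is Python syntax, so evaluating it re-creates the bytes value.
assert ast.literal_eval(as_repr) == bytes_value
```

Running Python with the -b command line option should make str() on a bytes object emit a BytesWarning, which can help catch this kind of mistake early.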

This may not be that easy to notice at first, as base64.b64encode() otherwise only produces output with printable ASCII bytes.

But your decoding problem originates there, because the value read back from the file includes the b' characters at the start. Those first two characters are passed to the Base64 decoder as well; the b is a valid Base64 character, and the ' is ignored by the parser:

>>> import base64
>>> bytesvalue = b'hello world'
>>> base64.b64encode(bytesvalue)
b'aGVsbG8gd29ybGQ='
>>> str(base64.b64encode(bytesvalue))
"b'aGVsbG8gd29ybGQ='"
>>> base64.b64decode(str(base64.b64encode(bytesvalue)))  # with str()
b'm\xa1\x95\xb1\xb1\xbc\x81\xdd\xbd\xc9\xb1\x90'
>>> base64.b64decode(base64.b64encode(bytesvalue))       # without str()
b'hello world'

Note how the output is completely different, because the Base64 decoding now starts from the wrong place: the b supplies the first 6 bits of the first byte, so the first decoded byte is always 0x6C, 0x6D, 0x6E or 0x6F (l, m, n or o in ASCII).
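The arithmetic can be checked directly. The sketch below looks up the 6-bit values of b and a in the standard Base64 alphabet and rebuilds the first decoded byte. It also shows that passing validate=True to base64.b64decode() makes the decoder reject the stray ' characters instead of silently discarding them. The variable names are illustrative.

```python
import base64
import binascii
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/'

wrong = str(base64.b64encode(b'hello world'))  # "b'aGVsbG8gd29ybGQ='"

# The decoder skips the quote, so the first two Base64 characters are 'b' and 'a'.
first, second = alphabet.index('b'), alphabet.index('a')
first_byte = (first << 2) | (second >> 4)  # 6 bits from 'b' plus the top 2 bits from 'a'
print(hex(first_byte), chr(first_byte))    # 0x6d m

# With validate=True, characters outside the alphabet raise an error instead.
try:
    base64.b64decode(wrong, validate=True)
    rejected = False
except binascii.Error:
    rejected = True
print(rejected)  # True
```

Using validate=True when decoding data read from a file is one way to turn this kind of silent corruption into an immediate, visible failure.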

You could properly decode the value (using paramsencoded.decode('ascii') or str(paramsencoded, 'ascii')), but you shouldn't treat any of this data as text.
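For completeness, a minimal sketch of that text-mode approach might look like the following. It writes to a temporary directory rather than to the real params file, and the sample content is made up for illustration:

```python
import base64
import tempfile
from pathlib import Path

original = 'user=example\npassword=secret\n'  # made-up sample content

# Encode to Base64 bytes, then decode those bytes as ASCII to get a plain str
# without the b'...' wrapper.
encoded_text = base64.b64encode(original.encode('utf-8')).decode('ascii')

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'paramsencoded.bpr'
    path.write_text(encoded_text, encoding='ascii')
    read_back = path.read_text(encoding='ascii')

decoded = base64.b64decode(read_back).decode('utf-8')
print(decoded == original)  # True
```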

Instead, open your files in binary mode. Reading and writing then operates on bytes objects, and the base64.b64encode() and base64.b64decode() functions also operate on bytes, making for a perfect match:

import base64

with open('params.bpr', 'rb') as params_source:
    params = params_source.read()  # bytes object

params_encoded = base64.b64encode(params)
print(params_encoded.decode('ascii'))   # base64 data is always ASCII data

params_decoded = base64.b64decode(params_encoded)

with open('paramsencoded.bpr', 'wb') as new_params:
    new_params.write(params_encoded)  # write binary data

with open('paramsencoded.bpr', 'rb') as new_params:
    params_written = new_params.read()

print(params_written.decode('ascii'))  # still Base64 data, so decode as ASCII

params_decoded = base64.b64decode(params_written)  # decode the bytes value

print(params_decoded.decode('utf8'))  # assuming the original source was UTF-8

I explicitly use bytes.decode(codec) rather than str(..., codec) to avoid accidental str(...) calls.

