
How to decode text with base64 in Python

I tried to make a text decoder but it would encode the text instead. I tried many other ways but it would say that the text that was meant to be decoded was a string not bytes. The code:

import base64

def encode():
    askencode = input("Type something to encode:")
    askencode = askencode.encode("utf-8")
    base64_info_encode = base64.b64encode(askencode)
    print("This is your encoded text:", base64_info_encode)
    print(base64_info_encode.decode("utf-8"))



def decode():
    askdecode = input("Type something to decode:")
    askdecode = askdecode.encode()
    print(askdecode.decode("utf-8"))
    base64_info_decode = base64.decodebytes(askdecode)
    print("This is your decoded text:", base64_info_decode)

The output:

This is your decoded text: b'm!\x95\xb1\xb1\xbc'

Encoded message: Hello

The error about getting a string instead of bytes comes from base64.decodebytes(askdecode) : decodebytes expects a bytes-like object, so passing it a str raises a TypeError.

You can try some of the other methods for decoding provided here: https://docs.python.org/3/library/base64.html .
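As a minimal sketch of the difference (the variable names are illustrative, and the literal stands in for what input() would return), base64.decodebytes rejects a str, while base64.b64decode from the same module also accepts an ASCII-only str:

```python
import base64

# Base64 for "Hello", held as a str (the type that input() returns).
encoded_text = "SGVsbG8="

# decodebytes only accepts bytes-like objects, so a str raises TypeError.
try:
    base64.decodebytes(encoded_text)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# b64decode accepts bytes-like objects as well as ASCII-only strings.
decoded_bytes = base64.b64decode(encoded_text)

# The result is still bytes; decode it to get readable text.
decoded_text = decoded_bytes.decode("utf-8")

print(raised_type_error, decoded_bytes, decoded_text)
```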

This is something that confuses many new programmers (Python or otherwise).

In Python, the way to remember it is this: a string ( str ) doesn't have an encoding, it's just a string of characters, like ideal Platonic representations of characters. The string 'A' does not contain an ASCII character or a UTF-8 character, it just contains the letter A.

However, a bytes is just a sequence of bytes, which can be interpreted as encoding some characters. I.e. b'A' is a bytes which contains the single byte 0x41, and that byte happens to be the character 'A' in ASCII and in UTF-8. A bytes literal itself may only contain ASCII characters; any other byte value has to be written as an escape such as \xe4.

The .encode() method of a str takes the characters in that string and encodes them into byte sequences given some specific encoding (using utf-8 by default).

The .decode() method of a bytes takes the grouped bytes in the bytes and decodes them into a string of characters given some specific encoding (using utf-8 by default).
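A short round trip may make the two methods more concrete. This is only an illustration with made-up variable names:

```python
text = "héllo"               # a str: five characters, no encoding attached

data = text.encode()         # str -> bytes, UTF-8 by default
print(data)                  # b'h\xc3\xa9llo' ('é' takes two bytes in UTF-8)
print(len(text), len(data))  # 5 characters, 6 bytes

round_trip = data.decode()   # bytes -> str, UTF-8 by default
print(round_trip == text)    # True
```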

That's why 'ä'.encode('ascii') will fail, since there is no encoding for 'ä' in the ASCII character set, but 'ä'.encode('utf-8') works just fine, as UTF-8 can encode every Unicode character, including 'ä'. In fact, you'd be hard pressed to come up with a character that's not in Unicode and can still be represented as a character on a modern computer.
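The failure and the success described here can be reproduced with a few lines; the variable names are just for illustration:

```python
# ASCII has no code for 'ä', so encoding raises UnicodeEncodeError.
try:
    "ä".encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# UTF-8 can encode it; 'ä' becomes two bytes.
utf8_bytes = "ä".encode("utf-8")

print(ascii_failed)  # True
print(utf8_bytes)    # b'\xc3\xa4'
```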

Python tries to keep this clear when you print a variable. If you print('A') , Python will write the actual character 'A' to the output. But if you print(b'A') , it will print b'A' , since it doesn't just pick a decoding to turn the bytes into text. You'd have to tell it to print(b'A'.decode()) to get the same result as printing the string directly.
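A small sketch of that printing behaviour, with hypothetical variable names:

```python
data = b"A"

print("A")            # A     (a str prints as its characters)
print(data)           # b'A'  (a bytes prints as its repr, not as text)
print(data.decode())  # A     (decoded explicitly, UTF-8 by default)

shown_as_bytes = repr(data)
shown_as_text = data.decode()
```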

One more thing to keep in mind: since a string is just an ideal series of characters, you can try to encode it into any encoding that has those characters in it. But you can only decode a series of bytes and get the result you expect, if the bytes are actually meaningful in that encoding. That's why, if you want to change characters in a bytes from one encoding to another, you typically decode and then re-encode with the new encoding; it's up to you to know / remember what encoding a bytes has, it's not saved as part of the bytes sequence itself.

For example:

>>> x = 'ä'.encode('cp1252')
>>> x
b'\xe4'
>>> x.decode('cp1252').encode('euc_jp')
b'\x8f\xab\xa3'
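To illustrate the other half of the point, here is a sketch of what can happen when bytes are decoded with an encoding other than the one they were written in. Depending on the bytes, you get either the wrong characters or an exception (the variable names are made up for this example):

```python
utf8_bytes = "ä".encode("utf-8")     # b'\xc3\xa4'

# Right encoding: the original character comes back.
right = utf8_bytes.decode("utf-8")

# Wrong encoding: no error, but each byte becomes its own cp1252 character.
wrong = utf8_bytes.decode("cp1252")

# The cp1252 byte for 'ä' is not a valid UTF-8 sequence, so this one raises.
try:
    b"\xe4".decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(right, wrong, decode_failed)   # ä Ã¤ True
```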

As for your question:

    askdecode = askdecode.encode()
    print(askdecode.decode("utf-8"))
    base64_info_decode = base64.decodebytes(askdecode)

Here you assign the result of askdecode.encode() to askdecode , so you should now see that this makes askdecode a bytes .

The second line works, because it's decoded into a string (using the same encoding, since "utf-8" is the default).

But the third line can fail, or silently return garbage, since base64.decodebytes expects a base64-encoded series of bytes. You gave it a utf-8 encoded series of bytes, and not every utf-8 encoded series of bytes is also a valid series of base64-encoded characters (characters outside the base64 alphabet are simply discarded before decoding).
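One possible way to write the pair of functions so that both take and return str is sketched below. The names encode_text and decode_text are made up for this example, and the functions take an argument instead of calling input() so they are easy to try out. The last lines are a guess at what happened in the question: the bytes in the posted output are exactly what the characters bSGVsbG8 decode to, which would be consistent with pasting the printed b'SGVsbG8=' including its b prefix.

```python
import base64

def encode_text(text: str) -> str:
    """Return the base64 form of text as a plain str."""
    raw = text.encode("utf-8")          # characters -> bytes
    encoded = base64.b64encode(raw)     # bytes -> base64 bytes
    return encoded.decode("ascii")      # base64 output is always ASCII

def decode_text(encoded: str) -> str:
    """Return the text that a base64 str stands for."""
    raw = base64.b64decode(encoded)     # accepts bytes or an ASCII str
    return raw.decode("utf-8")          # bytes -> characters

print(encode_text("Hello"))     # SGVsbG8=
print(decode_text("SGVsbG8="))  # Hello

# With a stray leading "b" the same characters decode to unrelated bytes.
garbage = base64.b64decode("bSGVsbG8")
print(garbage)                  # b'm!\x95\xb1\xb1\xbc'
```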
