How to decode text with base64 in Python
I tried to make a text decoder, but it encodes the text instead. I tried many other approaches, but Python complained that the text to be decoded was a string, not bytes.
The code:
import base64

def encode():
    askencode = input("Type something to encode:")
    askencode = askencode.encode("utf-8")
    base64_info_encode = base64.b64encode(askencode)
    print("This is your encoded text:", base64_info_encode)
    print(base64_info_encode.decode("utf-8"))

def decode():
    askdecode = input("Type something to decode:")
    askdecode = askdecode.encode()
    print(askdecode.decode("utf-8"))
    base64_info_decode = base64.decodebytes(askdecode)
    print("This is your decoded text:", base64_info_decode)
The output:
This is your decoded text: b'm!\x95\xb1\xb1\xbc'
Encoded message: Hello
Python complains about a string where it expects bytes because of the call base64.decodebytes(askdecode): decodebytes expects bytes, and you are passing it a string.
You can try some of the other decoding functions documented here: https://docs.python.org/3/library/base64.html
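A minimal sketch of that difference, using the base64 text for Hello as a sample input (the variable names are illustrative and not from the question): decodebytes raises TypeError for a str, while b64decode also accepts an ASCII-only str.

```python
import base64

encoded_text = "SGVsbG8="  # base64 text for b"Hello" (sample input)

# decodebytes() only accepts bytes-like objects, so a str raises TypeError.
try:
    base64.decodebytes(encoded_text)
    str_accepted = True
except TypeError:
    str_accepted = False

# Converting the base64 text to bytes first makes decodebytes() work.
from_bytes = base64.decodebytes(encoded_text.encode("ascii"))

# b64decode() accepts bytes-like objects and ASCII-only str alike.
from_str = base64.b64decode(encoded_text)

print(str_accepted, from_bytes, from_str)
```

Either way the result is a bytes object, so a final .decode() is still needed to get text back.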
This is something that confuses many new programmers (Python or otherwise).
In Python, the way to remember it is this: a string (str) doesn't have an encoding. It's just a sequence of characters, like ideal Platonic representations of characters. The string 'A' does not contain an ASCII character or a UTF-8 character; it just contains the letter A.
However, a bytes object is just a sequence of bytes, which can be interpreted as encoding some characters. For example, b'A' is a bytes object containing the single byte 0x41, which is how the character 'A' is encoded in ASCII and therefore also in UTF-8, the default encoding that Python's .encode() and .decode() methods use.
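To make the distinction concrete, the sketch below compares one str with the bytes that two different encodings produce for it. The choice of 'ä', UTF-8 and Latin-1 is only for illustration.

```python
text = "ä"  # one character, with no encoding attached

utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA4
latin1_bytes = text.encode("latin-1")  # one byte: 0xE4

print(len(text), len(utf8_bytes), len(latin1_bytes))

# Indexing a str gives a character; indexing a bytes object gives an integer.
first_char = "A"[0]
first_byte = b"A"[0]
print(first_char, first_byte)
```

The same character yields different byte sequences depending on the encoding, which is why the bytes alone do not tell you what text they stand for.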
The .encode() method of a str takes the characters in that string and encodes them into a sequence of bytes using a specific encoding (utf-8 by default).
The .decode() method of a bytes object takes those bytes and decodes them into a string of characters using a specific encoding (utf-8 by default).
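A short round trip shows the two methods working as a pair. The sample text is arbitrary; any string would do.

```python
original = "naïve café"

data = original.encode()   # str -> bytes, UTF-8 by default
restored = data.decode()   # bytes -> str, UTF-8 by default

print(type(data).__name__, type(restored).__name__)
print(restored == original)
```

Passing the encoding explicitly, as in original.encode("utf-8"), gives the same bytes and makes the intent easier to read.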
That's why 'ä'.encode('ascii') will fail, since there is no encoding for 'ä' in the ASCII character set, but 'ä'.encode('utf-8') works just fine, as UTF-8 can encode every Unicode character, 'ä' included. In fact, you'd be hard pressed to come up with a character that's not in Unicode and can still be represented as a character on a modern computer.
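The failure is a UnicodeEncodeError. As a small sketch, the snippet below catches it and also shows the errors argument of str.encode, which selects a fallback instead of raising; whether a fallback is acceptable depends on your data.

```python
# Encoding to ASCII fails because 'ä' has no ASCII code.
try:
    "ä".encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The errors argument picks a fallback instead of raising.
replaced = "ä".encode("ascii", errors="replace")  # substitutes a question mark
ignored = "ä".encode("ascii", errors="ignore")    # drops the character

print(ascii_failed, replaced, ignored)
```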
Python tries to keep this distinction clear when you print a variable. If you print('A'), Python will write the actual character 'A' to the output. But if you print(b'A'), it will print b'A', since it doesn't just pick an encoding to turn the bytes into text. You'd have to tell it to print(b'A'.decode()) to get the same result as printing the string directly.
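The same thing happens with base64 output, which is relevant to the two print calls in the question's encode() function. A small sketch, using b"Hello" as sample data:

```python
import base64

data = b"A"
shown = repr(data)       # what print(data) displays: b'A'
as_text = data.decode()  # the character itself: A

encoded = base64.b64encode(b"Hello")
print(encoded)                  # bytes display form, with b'...' around it
print(encoded.decode("ascii"))  # plain base64 text
```

Only the second form is the base64 text itself. The b and the quotes in the first form are Python's way of displaying a bytes object, not part of the data.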
One more thing to keep in mind: since a string is just an ideal series of characters, you can try to encode it with any encoding that has those characters in it. But decoding a series of bytes only gives the result you expect if the bytes are actually meaningful in that encoding. That's why, if you want to change the characters in a bytes object from one encoding to another, you typically decode and then re-encode with the new encoding. It's up to you to know or remember what encoding a bytes object uses; that information is not saved as part of the bytes sequence itself.
For example:
>>> x = 'ä'.encode('cp1252')
>>> x
b'\xe4'
>>> x.decode('cp1252').encode('euc_jp')
b'\x8f\xab\xa3'
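Conversely, decoding with the wrong encoding often raises no error at all and simply gives different text. A minimal sketch of that pitfall, with cp1252 picked as the wrong guess for illustration:

```python
utf8_data = "ä".encode("utf-8")  # the two bytes 0xC3 0xA4

right = utf8_data.decode("utf-8")   # the original character
wrong = utf8_data.decode("cp1252")  # no error, but two unrelated characters

print(right, wrong, len(wrong))
```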
As for your question:
askdecode = askdecode.encode()
print(askdecode.decode("utf-8"))
base64_info_decode = base64.decodebytes(askdecode)
Here you assign the result of askdecode.encode() to askdecode, so you should now see that this makes askdecode a bytes object.
The second line works, because the bytes are decoded back into a string (using the same encoding, since "utf-8" is the default).
But the third line can fail, since base64.decodebytes expects a base64-encoded series of bytes, but you gave it a utf-8 encoded series of bytes, and not every utf-8 encoded series of bytes is also a valid series of base64-encoded characters.
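Putting this together, a decoder could look like the sketch below. It is one possible fix, not the only one. The helper name decode_text is made up for this example, base64.b64decode with validate=True is swapped in for decodebytes so that stray characters raise an error instead of being skipped, and the decoded bytes are assumed to be UTF-8 text.

```python
import base64
import binascii


def decode_text(encoded_text):
    """Decode base64 text (a str) back to the original str.

    Hypothetical helper; assumes the original text was UTF-8.
    """
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    raw = base64.b64decode(encoded_text.strip(), validate=True)
    return raw.decode("utf-8")


print(decode_text("SGVsbG8="))

# Pasting the printed bytes form, with the b prefix and the quotes,
# is rejected here rather than decoded into unrelated bytes.
try:
    decode_text("b'SGVsbG8='")
except binascii.Error as exc:
    print("Not valid base64:", exc)

# decodebytes() skips the quotes and treats the leading b as data,
# which yields the same bytes as the output shown in the question.
print(base64.decodebytes(b"b'SGVsbG8='"))
```

In the question's program, the body of decode() could then call decode_text(input("Type something to decode:")). The last line shows one input that reproduces the output in the question; the question does not say what was actually typed, so treat that as a guess rather than a diagnosis.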