
Python-3.x - Converting a string representation of a bytearray back to a string

The back-story here is a little verbose, but basically I want to take a string like b'\x04\x0e\x1d' and cast it back into a bytearray.

I am working on a basic implementation of a one-time pad, where I take a plaintext A and shared key B to generate a ciphertext C according to the equation A⊕B=C. Then I reverse the process with the equation C⊕B=A.
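The two equations can be sketched directly on bytes objects rather than on characters. This is only an illustration: the helper name `xor_bytes` and the sample values are assumptions, not code from the original post.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings (hypothetical helper)."""
    if len(data) != len(key):
        raise ValueError("a one-time pad key must be as long as the message")
    return bytes(a ^ b for a, b in zip(data, key))

plaintext = b"foo"
key = b"bar"

ciphertext = xor_bytes(plaintext, key)   # A xor B = C
recovered = xor_bytes(ciphertext, key)   # C xor B = A
```

Because XOR is its own inverse, applying the same key a second time returns the plaintext.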

I've already found plenty of Python 3 functions to encode strings as bytes and then XOR the bytes, such as the following:

def xor_strings(xs, ys):
    return "".join(chr(ord(x) ^ ord(y)) for x, y in zip(xs, ys)).encode()

A call to xor_strings() then returns a bytearray:

print( xor_strings("foo", "bar"))

But when I print it to the screen, what I'm shown is actually a string. So I'm assuming that Python is just calling some str() function on the bytearray, and I get something that looks like the following:

b'\x04\x0e\x1d'

Herein lies the problem. I want to create a new bytearray from that string. Normally I would just call decode() on the bytearray. But if I enter b'\x04\x0e\x1d' as input, Python sees it as a string, not a bytearray!

How can I take a string like b'\x04\x0e\x1d' as user input and cast it back into a bytearray?

As discussed in the comments, use base64 to send binary data in text form.

import base64

def xor_strings(xs, ys):
    return "".join(chr(ord(x) ^ ord(y)) for x, y in zip(xs, ys)).encode()

# ciphertext is bytes
ciphertext = xor_strings("foo", "bar")
# >>> b'\x04\x0e\x1d'

# ciphertext_b64 is *still* bytes, but only "safe" ones (in the printable ASCII range)
ciphertext_b64 = base64.encodebytes(ciphertext)
# >>> b'BA4d\n'

Now we can transfer the bytes:

# ...we could interpret them as ASCII and print them somewhere
safe_string = ciphertext_b64.decode('ascii')
# >>> BA4d

# ...or write them to a file (or a network socket)
with open('/tmp/output', 'wb') as f:
    f.write(ciphertext_b64)

And the recipient can retrieve the original message by:

# ...reading bytes from a file (or a network socket)
with open('/tmp/output', 'rb') as f:
    ciphertext_b64_2 = f.read()

# ...or by reading bytes from a string
ciphertext_b64_2 = safe_string.encode('ascii')
# >>> b'BA4d\n'

# and finally decoding them into the original message
# (base64.decodestring was removed in Python 3.9; decodebytes is its replacement)
ciphertext_2 = base64.decodebytes(ciphertext_b64_2)
# >>> b'\x04\x0e\x1d'

Of course, when it comes to writing bytes to a file or to the network, encoding them as base64 first is superfluous. You can write/read the ciphertext directly if it's the only file content. Only if the ciphertext is part of a higher-level structure (JSON, XML, a config file...) does encoding it as base64 become necessary again.
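As a rough sketch of the "higher-level structure" case, the ciphertext could be embedded in a JSON document like this. The field name `"ciphertext"` is an assumption chosen for illustration, and `base64.b64encode` is used instead of `encodebytes` because it adds no trailing newline.

```python
import base64
import json

ciphertext = b'\x04\x0e\x1d'

# JSON cannot hold raw bytes, so store the base64 text instead
payload = json.dumps({"ciphertext": base64.b64encode(ciphertext).decode("ascii")})

# the recipient parses the JSON and reverses the base64 step
received = base64.b64decode(json.loads(payload)["ciphertext"])
```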

A note on the use of the words "decode" and "encode".

  • To encode a string means to turn it from its abstract meaning ("a list of characters") into a storable representation ("a list of bytes"). The exact result of this operation depends on the byte encoding that is being used. For example:

    • ASCII encoding maps one character to one byte (as a trade-off it can't map all characters that can exist in a Python string).
    • UTF-8 encoding maps one character to 1-4 bytes, depending on the character.
  • To decode a byte array means turning it from "a list of bytes" back into "a list of characters" again. This of course requires prior knowledge of what the byte encoding originally was.
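The following sketch illustrates both points; the sample text is an arbitrary assumption. It shows that UTF-8 uses a varying number of bytes per character, that ASCII cannot represent every character, and that decoding with the wrong codec produces different text.

```python
text = "héllo €"

# UTF-8: 'é' takes 2 bytes, '€' takes 3, the ASCII characters take 1 each
utf8 = text.encode("utf-8")
round_trip = utf8.decode("utf-8")

# ASCII cannot encode 'é' or '€'
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# decoding with a different codec succeeds here but yields different text
wrong = utf8.decode("latin-1")
```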

ciphertext_b64 above is a list of bytes and is represented as b'BA4d\n' on the Python console.

Its string equivalent, safe_string, looks very similar ('BA4d\n') when printed to the console, because the base64 alphabet is a subset of ASCII.

The data types, however, are still fundamentally different. Don't let the console output deceive you.
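A short check makes the difference visible. The values are written out as literals here so that the snippet runs on its own.

```python
safe_string = "BA4d\n"       # str: a sequence of characters
ciphertext_b64 = b"BA4d\n"   # bytes: a sequence of integers 0-255

print(type(safe_string))               # <class 'str'>
print(type(ciphertext_b64))            # <class 'bytes'>
print(safe_string == ciphertext_b64)   # False: str and bytes never compare equal
print(safe_string[0])                  # B
print(ciphertext_b64[0])               # 66
```

Indexing a str gives a one-character string, while indexing a bytes object gives an integer.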

Responding to that final question only.

>>> type(b'\x04\x0e\x1d')
<class 'bytes'>
>>> bytearray(b'\x04\x0e\x1d')
bytearray(b'\x04\x0e\x1d')
>>> type(bytearray(b'\x04\x0e\x1d'))
<class 'bytearray'>
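If the text really arrives as a string that looks like a bytes literal (for example from input()), one possible approach, not mentioned in the answers above, is `ast.literal_eval` from the standard library, which parses Python literals without executing arbitrary code. The variable names below are assumptions for illustration.

```python
import ast

# what input() would hand back: the 15 characters b'\x04\x0e\x1d', not 3 bytes
user_input = r"b'\x04\x0e\x1d'"

value = ast.literal_eval(user_input)
if not isinstance(value, bytes):
    raise ValueError("expected a bytes literal")

restored = bytearray(value)
```

This only works for input that is a valid Python literal; for exchanging data between programs, base64 remains the more robust choice.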
