
Turning binary data into a literal string of '0's and '1's

I have a file filled with binary data representing a sequence of 2-byte instructions in big-endian order.

I need to be able to decode these instructions into their more meaningful equivalents, but I'm having trouble getting the data into a format I can work with.

I think it would be best if I turned the instructions into actual strings of 0's and 1's.

So far, I've written this:

import struct

def slurpInstructions(filename):
    instructions = []
    with open(filename, 'rb') as f:
        while True:
            try:
                chunk = f.read(1)
                print(struct.unpack('c', chunk))
            except:
                break

which prints out the bytes one at a time, like this:

(b'\x00',)
(b'a',)

I know the first instruction in the file is:

0000000001100001

So it looks like it's printing the ASCII character corresponding to the integer value of each byte, and falling back to a hex escape when there is no printable ASCII character for that value.

Where do I go from here, though? I need to turn my b'a' into '1100001' because I actually care about the bits, not the bytes.

You could convert b'a' to its corresponding integer value with ord, and then format that int in binary using '{:b}'.format:

In [6]: '{:b}'.format(ord(b'a'))
Out[6]: '1100001'
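Note that '{:b}' drops leading zeros, so b'\x00' would become '0' rather than '00000000'. A minimal sketch of a padded variant, assuming every byte should always occupy eight characters (the helper name byte_to_bits is hypothetical, not part of the original answer):

```python
def byte_to_bits(byte):
    """Return the 8-character bit string for one byte (a length-1 bytes object or an int)."""
    # Iterating over bytes in Python 3 yields ints, so accept both forms.
    value = byte if isinstance(byte, int) else ord(byte)
    return '{:08b}'.format(value)  # zero-pad to a fixed width of 8

# The two bytes printed in the question, joined in file order.
first_instruction = byte_to_bits(b'\x00') + byte_to_bits(b'a')
print(first_instruction)  # 0000000001100001
```

With padding, the two bytes from the question line up with the 16-bit instruction the asker expects.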

  • Reading a large file one byte at a time can be very slow. You'll get better performance by reading more bytes per call to f.read. You can iterate over the contents of the file in chunks of 1024 bytes using:

     with open(filename, 'rb') as f:
         for chunk in iter(lambda: f.read(1024), b''):
             ...  # process chunk
  • Similarly, calling print once for each byte can be very slow. You'll get better performance by printing more bytes per call to print. So you could use a list comprehension to loop over the bytes in chunk, convert each to its binary string format, and then use ''.join to join the strings together:

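     # In Python 3, iterating over bytes yields ints, so ord() is not needed;
     # '{:08b}' zero-pads each byte to 8 bits so leading zeros are kept.
     print(''.join(['{:08b}'.format(c) for c in chunk]), end='')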
  • Using a bare except is considered bad practice. If you choose to use try..except here, list only those exceptions you wish to handle:

     try:
         ...
     except IOError:
         ...

Putting it all together:

def slurpInstructions(filename):
    with open(filename, 'rb') as f:
        for chunk in iter(lambda: f.read(1024), b''):
            # '{:08b}' zero-pads each byte to 8 bits so leading zeros are kept
            print(''.join(['{:08b}'.format(c) for c in chunk]), end='')
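If the goal is one string per 2-byte instruction rather than a single printed stream, the same chunked read can feed the struct module. This is only a sketch: it assumes the file length is a multiple of two bytes (otherwise struct.iter_unpack raises struct.error) and the big-endian order stated in the question. The name read_instruction_bits and its instructions_per_chunk parameter are hypothetical.

```python
import struct

def read_instruction_bits(filename, instructions_per_chunk=512):
    """Return a list of 16-character bit strings, one per 2-byte instruction."""
    result = []
    with open(filename, 'rb') as f:
        # Read a whole number of instructions per call so none is split across chunks.
        for chunk in iter(lambda: f.read(2 * instructions_per_chunk), b''):
            # '>H' means big-endian unsigned 16-bit integer.
            for (value,) in struct.iter_unpack('>H', chunk):
                result.append('{:016b}'.format(value))
    return result
```

Returning a list keeps each instruction separate, which may be easier to decode than one long run of digits.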

In Python 3, to convert 2 bytes into a bitstring ('{:b}'.format() may be slightly slower):

>>> bin(int.from_bytes(b'\x00a', 'big'))[2:].zfill(16)
'0000000001100001'
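The same idea extends to byte strings of any length by deriving the padding width from the input. A small sketch, assuming big-endian order as in the question (the helper name bytes_to_bits is hypothetical):

```python
def bytes_to_bits(data):
    """Return the bits of `data` (big-endian) as a string, 8 characters per byte."""
    if not data:
        return ''  # nothing to format for empty input
    width = 8 * len(data)
    # Build a format spec such as '016b': zero-padded binary of fixed width.
    return format(int.from_bytes(data, 'big'), '0{}b'.format(width))

print(bytes_to_bits(b'\x00a'))  # 0000000001100001
```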

For a single-source Python 2/3 compatible version, see "Convert binary to ASCII and vice versa".

To load all instructions both time- and space-efficiently, you could use the array module:

#!/usr/bin/env python
import os
import sys
from array import array

filename = sys.argv[1]  # path to the instruction file (assumed: passed on the command line)

instructions = array('H')  # unsigned short: at least 2 bytes per item (2 on common platforms)
n = os.path.getsize(filename) // instructions.itemsize  # number of instructions
with open(filename, 'rb') as file:
    instructions.fromfile(file, n)  # slurp file
if sys.byteorder == 'little':
    instructions.byteswap()  # file is big-endian; swap so values are read correctly

for h in instructions:  # print as bitstrings
    print('{:016b}'.format(h))
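Once the instructions are integers, fields can also be pulled out with shifts and masks instead of slicing strings. The layout used here (a 4-bit opcode followed by a 12-bit operand) is purely hypothetical, as is the name split_fields; the question does not describe the real instruction format.

```python
def split_fields(instruction):
    """Split a 16-bit instruction into (opcode, operand) under a HYPOTHETICAL layout."""
    opcode = (instruction >> 12) & 0xF   # top 4 bits
    operand = instruction & 0x0FFF       # low 12 bits
    return opcode, operand

opcode, operand = split_fields(0b0000000001100001)
print(opcode, operand)  # 0 97
```

Adjust the shift amounts and masks to match whatever field widths the actual instruction set uses.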

For other ways to read a binary file efficiently, see "Reading binary file in Python and looping over each byte".
