
Python: Read and write binary data

I am aware that there are a lot of almost identical questions, but none seems to really target the general case.

So assume I want to open a file, read it into memory, possibly perform some operations on the resulting bitstring, and write the result back to a file.

The following seems straightforward to me, but it produces output that is completely different from the input. Note that for simplicity I only copy the file here:

file = open('INPUT','rb')
data = file.read()
data_16 = data.encode('hex')
data_2 = bin(int(data_16,16))

OUT = open('OUTPUT','wb')

i = 0
while i < len(data_2) / 8:
    byte = int(data_2[i*8 : (i+1)*8], 2)
    OUT.write('%c' % byte)
    i += 1

OUT.close()

I looked at data, data_16 and data_2. The transformations make sense as far as I can see.

As expected, the output file has exactly the same size in bits as the input file.

EDIT: I considered the possibility that the leading '0b' has to be cut. See the following:

>>> data[:100]
'BMFU"\x00\x00\x00\x00\x006\x00\x00\x00(\x00\x00\x00\xe8\x03\x00\x00\xee\x02\x00\x00\x01\x00\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x12\x0b\x00\x00\x12\x0b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05=o\xce\xf4^\x16\xe0\x80\x92\x00\x00\x00\x01I\x02\x1d\xb5\x81\xcaN\xcb\xb8\x91\xc3\xc6T\xef\xcb\xe1j\x06\xc3;\x0c*\xb9Q\xbc\xff\xf6\xff\xff\xf7\xed\xdf'
>>> data_16[:100]
'424d46552200000000003600000028000000e8030000ee020000010018000000000000000000120b0000120b000000000000'
>>> data_2[:100]
'0b10000100100110101000110010101010010001000000000000000000000000000000000000000000011011000000000000'
>>> data_2[1]
'b'

Maybe the BMFU" part should be cut from data?

>>> bin(25)
'0b11001'

Note two things:

  1. The "0b" at the beginning. This means that your slicing will be off by 2 bits.

  2. The lack of padding to 8 bits. This will corrupt your data every time unless it happens to mesh with point 1.
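Both points can be reproduced in a few lines. The following is a Python 3 illustration (not the question's Python 2 code) with arbitrary sample bytes; it contrasts bin() with format(value, '08b'), which always yields exactly eight digits per byte.

```python
# Python 3 sketch: why bin() breaks the bit-string round trip.
data = b"\x05B"  # arbitrary sample bytes: 0x05 and 0x42 ('B')

# Point 1: bin() adds a '0b' prefix (and drops leading zeros).
print(bin(data[0]))  # 0b101

# Point 2: turning the whole buffer into one integer loses the leading
# zeros of the first byte, so the bit count is no longer a multiple of 8.
big = int.from_bytes(data, "big")
bits_bad = bin(big)[2:]
print(len(bits_bad))  # 11, not 16

# format(byte, '08b') pads every byte to exactly eight digits.
bits_good = "".join(format(b, "08b") for b in data)
print(bits_good)  # 0000010101000010

# With correct padding, slicing into 8-character groups restores the input.
restored = bytes(int(bits_good[i:i + 8], 2) for i in range(0, len(bits_good), 8))
print(restored == data)  # True
```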

Process the file byte by byte instead of attempting to process it in one big gulp like this. If you find your code too slow, then you need to find a faster way of working byte by byte, not switch to an irreparably flawed method such as this one.
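One possible way to make per-byte work cheaper is to precompute the 256 possible 8-digit strings once and look them up, instead of formatting each byte again. This is a Python 3 sketch; the table and function names are hypothetical, and the speedup has not been measured here.

```python
# Precomputed 8-digit strings for every possible byte value (0..255).
BYTE_TO_BITS = [format(i, "08b") for i in range(256)]
# Reverse table: 8-digit string -> byte value.
BITS_TO_BYTE = {bits: i for i, bits in enumerate(BYTE_TO_BITS)}

def bytes_to_bits(data):
    """Return a '0'/'1' string with exactly eight characters per input byte."""
    return "".join(BYTE_TO_BITS[b] for b in data)

def bits_to_bytes(bits):
    """Inverse of bytes_to_bits; len(bits) must be a multiple of 8."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(BITS_TO_BYTE[bits[i:i + 8]] for i in range(0, len(bits), 8))

sample = bytes(range(256))
print(bits_to_bytes(bytes_to_bits(sample)) == sample)  # True
```

The conversion is still strictly byte by byte, so byte boundaries can never drift the way they do with one big integer.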

You could simply write the data variable back out and you'd have a successful round trip.
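A minimal Python 3 sketch of that plain round trip, assuming nothing more than reading in "rb" mode and writing in "wb" mode; the helper name copy_binary and the temporary file paths are illustrative only.

```python
import os
import tempfile

def copy_binary(src, dst):
    """Read src in binary mode and write the bytes unchanged to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(data)

# Demo with temporary files standing in for INPUT and OUTPUT.
payload = b"BM\x00\x01\xff"
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "INPUT")
    dst = os.path.join(tmp, "OUTPUT")
    with open(src, "wb") as f:
        f.write(payload)
    copy_binary(src, dst)
    with open(dst, "rb") as f:
        roundtrip = f.read()

print(roundtrip == payload)  # True
```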

But it looks like you intend to work on the file as a string of 0 and 1 characters. Nothing wrong with that (though it's rarely necessary), but your code takes a very roundabout way of converting the data to that form. Instead of building a monster integer and converting it to a bit string, just do so for one byte at a time:

data = file.read()
# bin() drops leading zeros, so pad each byte back to 8 digits
data_2 = "".join(bin(ord(c))[2:].zfill(8) for c in data)

data_2 is now a sequence of zeros and ones (in a single string, same as you have it; but if you'll be making changes, I'd keep the bitstrings in a list). The reverse conversion is also best done byte by byte:

newdata = "".join(chr(int("".join(byte), 2)) for byte in grouper(data_2, 8, "0"))

This uses the grouper recipe from the itertools documentation.

from itertools import izip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
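The recipe above is Python 2 code; in Python 3, izip_longest is named itertools.zip_longest. A Python 3 sketch of the same reverse conversion follows (bits_to_bytes is a hypothetical helper name). Each group yielded by grouper is a tuple of characters, so it has to be joined back into a string before int(..., 2) can parse it, and a short final group is padded on the right with "0".

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def bits_to_bytes(bitstring):
    # Join each 8-character tuple into a string, then parse it as base 2.
    return bytes(int("".join(chunk), 2) for chunk in grouper(bitstring, 8, "0"))

print(bits_to_bytes("0100001001001101"))  # b'BM'
print(bits_to_bytes("0100001"))  # b'B' (padded to 01000010)
```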

You can use the struct module to read and write binary data (see the struct page of the Python standard library documentation).
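As an illustration, assuming the input is a BMP file as its "BM" signature suggests, its first 14 bytes form a little-endian file header: a 2-byte signature, a 4-byte file size, two 2-byte reserved fields, and a 4-byte offset to the pixel data. So the "FU\"\x00" after "BM" is a number, not text. The Python 3 sketch below unpacks the header bytes shown in the question; the variable names are illustrative.

```python
import struct

# First 14 bytes of the file shown in the question.
header = b'BMFU"\x00\x00\x00\x00\x006\x00\x00\x00'

# '<' = little-endian, no padding; 2s = 2 raw bytes, I = uint32, H = uint16.
fmt = "<2sIHHI"
signature, file_size, reserved1, reserved2, pixel_offset = struct.unpack(fmt, header)
print(signature, file_size, pixel_offset)  # b'BM' 2250054 54

# struct.pack performs the reverse conversion back to bytes.
repacked = struct.pack(fmt, signature, file_size, reserved1, reserved2, pixel_offset)
print(repacked == header)  # True
```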

EDIT

Sorry, I was misled by your title. I have just understood that you are writing binary data as text instead of writing the binary data directly.

OK, thanks to alexis, and keeping in mind Ignacio's warning about the padding, I found a way to do what I wanted: read data into a binary representation and write a binary representation back to a file:

def padd(bitstring):
    # left-pad with '0' up to 8 characters (same result as the original loop)
    return bitstring.zfill(8)

with open('INPUT', 'rb') as infile:
    data = infile.read()
data_2 = "".join(padd(bin(ord(c))[2:]) for c in data)

with open('OUTPUT', 'wb') as OUT:
    for i in range(len(data_2) // 8):
        byte = int(data_2[i*8 : (i+1)*8], 2)
        OUT.write('%c' % byte)
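For reference, a Python 3 port of the same approach might look like the sketch below (function names are hypothetical). format(b, '08b') replaces padd(bin(ord(c))[2:]), since iterating over bytes in Python 3 yields integers, and bytes(...) collects all output bytes so the file is written with a single write() call instead of one call per byte.

```python
import os
import tempfile

def file_to_bits(path):
    """Read a file in binary mode and return its bits as a '0'/'1' string."""
    with open(path, "rb") as f:
        # '08b' keeps the leading zeros of every byte.
        return "".join(format(b, "08b") for b in f.read())

def bits_to_file(bits, path):
    """Write a '0'/'1' string (length a multiple of 8) back to a file."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    with open(path, "wb") as f:
        f.write(data)  # one write call instead of one per byte

# Demo with temporary files standing in for INPUT and OUTPUT.
sample = b"BM\x00\x7f\x80\xff"
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "INPUT")
    dst = os.path.join(tmp, "OUTPUT")
    with open(src, "wb") as f:
        f.write(sample)
    bits = file_to_bits(src)
    bits_to_file(bits, dst)
    with open(dst, "rb") as f:
        restored = f.read()

print(restored == sample)  # True
```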

If I did not do it exactly the way alexis proposed, that is because the code as originally posted did not work. Of course, this is terribly slow, but now that the simplest version works, I can optimize it further.
