
Python: iterate through a binary file without lines

I've got some data in a binary file that I need to parse. The data is separated into chunks of 22 bytes, so I'm trying to generate a list of tuples, each tuple containing 22 values. The file isn't separated into lines though, so I'm having problems figuring out how to iterate through the file and grab the data.

If I do this it works just fine:

nextList = f.read(22)
newList = struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", nextList)

where newList contains a tuple of 22 values. However, if I try to apply similar logic to a function that iterates through, it breaks down.

def getAllData():
    listOfAll = []
    nextList = f.read(22)
    while nextList != "":
        listOfAll.append(struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", nextList))
        nextList = f.read(22)
    return listOfAll

data = getAllData()

gives me this error:

Traceback (most recent call last):
File "<pyshell#27>", line 1, in <module>
data = getAllData()
File "<pyshell#26>", line 5, in getAllData
listOfAll.append(struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", nextList))
struct.error: unpack requires a bytes object of length 22

I'm fairly new to python so I'm not too sure where I'm going wrong here. I know for sure that the data in the file breaks down evenly into sections of 22 bytes, so it's not a problem there.

Since you reported that the loop was still running when len(nextList) == 0, this is probably because nextList (which isn't actually a list) is an empty bytes object, which is never equal to an empty string object:

>>> b"" == ""
False

and so the condition in your line

while nextList != "":

is always true, even when nextList is empty, so the loop never terminates at end of file. That's why using len(nextList) != 22 as a break condition worked, and even

while nextList:

should suffice.
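Putting that together, here is a sketch of the corrected function (get_all_data is a hypothetical name; "22B" is just shorthand for twenty-two B codes, and f must be opened in binary mode with "rb"):

```python
import io
import struct

def get_all_data(f):
    """Collect 22-byte records from a binary file until EOF."""
    records = []
    chunk = f.read(22)
    while chunk:  # b'' at EOF is falsy, so the loop actually terminates
        records.append(struct.unpack("22B", chunk))
        chunk = f.read(22)
    return records

# quick check with an in-memory "file" of 44 bytes (two records)
demo = io.BytesIO(bytes(range(44)))
records = get_all_data(demo)
```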

read(22) isn't guaranteed to return 22 bytes. Its contract is to return a bytes object whose length is anywhere from 0 to 22 (inclusive); a length of zero indicates there is no more data to be read. In Python 3, file objects opened in binary mode produce bytes objects instead of str, and str and bytes will never be considered equal.
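The end-of-file behaviour is easy to see with an in-memory file (io.BytesIO standing in for a real file opened in binary mode):

```python
import io

f = io.BytesIO(b"abcdef")   # 6 bytes in total
first = f.read(4)           # b'abcd' -- full read
second = f.read(4)          # b'ef'   -- short read: only 2 bytes were left
third = f.read(4)           # b''     -- EOF: an empty, falsy bytes object
```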

If your file is small-ish, then you'd be better off reading the entire file into memory and then splitting it up into chunks, eg.

listOfAll = []
data = f.read()
for i in range(0, len(data), 22):
    t = struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", data[i:i+22])
    listOfAll.append(t)
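On Python 3.4 and later, struct.iter_unpack does this chunk-splitting for you once the whole file is in memory ("22B" is equivalent to writing B twenty-two times; io.BytesIO stands in for the real file here):

```python
import io
import struct

f = io.BytesIO(bytes(range(44)))   # stand-in for open("yourfile", "rb")
data = f.read()

# iter_unpack yields one 22-value tuple per 22-byte record, and
# raises struct.error if len(data) is not a multiple of 22
listOfAll = list(struct.iter_unpack("22B", data))
```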

Otherwise you will need to do something more complicated with checking the amount of data you get back from the read.

def dataiter(f, chunksize=22, buffersize=4096):
    data = b''
    while True:
        newdata = f.read(buffersize)    
        if not newdata: # end of file
            if not data:
                return
            else:
                yield data 
                # or raise error  as 0 < len(data) < chunksize
                # or pad with zeros to chunksize
                return

        data += newdata
        i = 0
        while len(data) - i >= chunksize:
            yield data[i:i+chunksize]
            i += chunksize

        data = data[i:] # keep remainder of unused data (b'' if all was used;
                        # slicing past the end never raises, it just returns b'')
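For completeness: when the file is known to split evenly into 22-byte records, the two-argument form of the built-in iter() gives a compact alternative to a hand-rolled generator. It calls f.read(22) repeatedly until the b"" sentinel is returned at EOF (a trailing short chunk would make struct.unpack raise struct.error):

```python
import functools
import io
import struct

f = io.BytesIO(bytes(range(44)))   # stand-in for a real binary file

# iter(callable, sentinel) keeps calling f.read(22) until it returns b""
records = [struct.unpack("22B", chunk)
           for chunk in iter(functools.partial(f.read, 22), b"")]
```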
