python iterate through binary file without lines

I've got some data in a binary file that I need to parse. The data is separated into chunks of 22 bytes, so I'm trying to generate a list of tuples, each tuple containing 22 values. The file isn't separated into lines, though, so I'm having trouble figuring out how to iterate through the file and grab the data.

If I do this it works just fine:

nextList = f.read(22)
newList = struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", nextList)

where newList contains a tuple of 22 values. However, if I try to apply similar logic in a function that iterates through the file, it breaks down.

def getAllData():
    listOfAll = []
    nextList = f.read(22)
    while nextList != "":
        listOfAll.append(struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", nextList))
        nextList = f.read(22)
    return listOfAll

data = getAllData()

gives me this error:

Traceback (most recent call last):
  File "<pyshell#27>", line 1, in <module>
    data = getAllData()
  File "<pyshell#26>", line 5, in getAllData
    listOfAll.append(struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", nextList))
struct.error: unpack requires a bytes object of length 22

I'm fairly new to Python, so I'm not sure where I'm going wrong here. I know for sure that the data in the file breaks down evenly into sections of 22 bytes, so that's not the problem.

Since you reported that it was running when len(nextList) == 0, this is probably because nextList (which isn't a list...) is an empty bytes object, which isn't equal to an empty string object:

>>> b"" == ""
False

and so the condition in your line

while nextList != "":

is never false, even when nextList is empty. The loop therefore runs past the end of the file and hands an empty bytes object to struct.unpack, which raises the error. That's why using len(nextList) != 22 as a break condition worked, and even

while nextList:

should suffice, because an empty bytes object is falsy.
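Putting that fix into the original function gives something like the sketch below. The name get_all_data, the f and chunksize parameters, and the io.BytesIO sample are assumptions added for illustration; they stand in for the asker's global f opened in binary mode.

```python
import io
import struct

def get_all_data(f, chunksize=22):
    """Return a list of tuples, one per chunksize-byte record in f."""
    fmt = "%dB" % chunksize  # e.g. "22B", equivalent to 22 separate "B" codes
    records = []
    chunk = f.read(chunksize)
    while chunk:  # b"" is falsy, so the loop stops at end of file
        records.append(struct.unpack(fmt, chunk))
        chunk = f.read(chunksize)
    return records

# io.BytesIO stands in for a file opened with open(path, "rb")
sample = io.BytesIO(bytes(range(44)))
data = get_all_data(sample)
```

This still raises struct.error if the file ends with a partial record, which is arguably the right behaviour when the data is supposed to divide evenly into 22-byte chunks.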

read(22) isn't guaranteed to return 22 bytes. Its contract is to return anywhere between 0 and 22 bytes (inclusive). A result of length zero indicates there is no more data to be read. In Python 3, file objects opened in binary mode produce bytes objects instead of str, and a str and a bytes object never compare equal.
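To make that contract concrete, here is a minimal sketch of a helper that keeps calling read until it has the requested number of bytes or hits end of file. The name read_exact is hypothetical, and io.BytesIO is used only as a stand-in for a real binary file.

```python
import io

def read_exact(f, n):
    """Read exactly n bytes from f, or fewer only if end of file is hit."""
    parts = []
    remaining = n
    while remaining > 0:
        piece = f.read(remaining)
        if not piece:  # b"" means there is no more data
            break
        parts.append(piece)
        remaining -= len(piece)
    return b"".join(parts)

# A 30-byte stream: one full 22-byte chunk, then a short 8-byte tail.
stream = io.BytesIO(bytes(range(30)))
first = read_exact(stream, 22)
second = read_exact(stream, 22)
third = read_exact(stream, 22)
```

Here first has 22 bytes, second has only the remaining 8, and third is b"", so the caller still has to check the length of what comes back.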

If your file is small-ish then you'd be better off reading the entire file into memory and then splitting it up into chunks, e.g.:

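import struct

listOfAll = []
data = f.read()  # the whole file as a single bytes object
for i in range(0, len(data), 22):
    # unpack each 22-byte slice into a tuple of 22 unsigned bytes
    t = struct.unpack("BBBBBBBBBBBBBBBBBBBBBB", data[i:i+22])
    listOfAll.append(t)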
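As an alternative to slicing in a loop, the standard library's struct.Struct.iter_unpack (Python 3.4 and later) walks a buffer in fixed-size records. The sketch below assumes the whole file is already in memory; the record name and the sample bytes are placeholders for the result of f.read().

```python
import struct

record = struct.Struct("22B")  # 22 unsigned bytes per record

data = bytes(range(44))  # stand-in for data = f.read()

# iter_unpack requires the buffer length to be a multiple of the record size,
# so check up front and give a clearer message.
if len(data) % record.size:
    raise ValueError("file length is not a multiple of %d bytes" % record.size)

listOfAll = list(record.iter_unpack(data))
```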

Otherwise you will need to do something more complicated that checks how much data you get back from each read.

def dataiter(f, chunksize=22, buffersize=4096):
    """Yield chunksize-byte chunks from the binary file object f."""
    data = b''
    while True:
        newdata = f.read(buffersize)
        if not newdata:  # end of file
            if data:
                # leftover bytes, 0 < len(data) < chunksize
                # alternatively raise an error or pad with zeros to chunksize
                yield data
            return

        data += newdata
        i = 0
        while len(data) - i >= chunksize:
            yield data[i:i+chunksize]
            i += chunksize

        # keep the unused remainder; slicing past the end gives b'' and never raises
        data = data[i:]
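A shorter alternative, if you are willing to assume a regular buffered file opened with open(path, "rb") (where read(22) only comes back short at end of file), is the two-argument form of iter with a sentinel. The name iter_records is hypothetical and io.BytesIO again stands in for the real file.

```python
import io
import struct
from functools import partial

def iter_records(f, chunksize=22):
    """Yield one tuple of chunksize unsigned bytes per record in f."""
    fmt = "%dB" % chunksize
    # iter(callable, sentinel) calls f.read(chunksize) until it returns b""
    for chunk in iter(partial(f.read, chunksize), b""):
        if len(chunk) != chunksize:
            # a short chunk here means the file ended mid-record
            raise ValueError("truncated record of %d bytes" % len(chunk))
        yield struct.unpack(fmt, chunk)

records = list(iter_records(io.BytesIO(bytes(range(44)))))
```

For unbuffered or non-file streams, where short reads can happen before end of file, the buffering approach in dataiter is the safer choice.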

