Reading a binary file in Python

Question

I wrote a Python script to create a binary file of integers.

import struct  
pos = [7623, 3015, 3231, 3829]  
inh = open('test.bin', 'wb')  
for e in pos:  
    inh.write(struct.pack('i', e))  
inh.close()

It worked well. Then I tried to read the 'test.bin' file using the code below.

import struct  
inh = open('test.bin', 'rb')  
for rec in inh:  
    pos = struct.unpack('i', rec)  
    print pos  
inh.close()

But it failed with an error message:

Traceback (most recent call last):   
   File "readbinary.py", line 10, in <module>  
   pos = struct.unpack('i', rec)  
   File "/usr/lib/python2.5/struct.py", line 87, in unpack  
   return o.unpack(s)  
struct.error: unpack requires a string argument of length 4

I would like to know how I can read this file using struct.unpack.
Many thanks in advance, Vipin

Answer 1

for rec in inh: reads one line at a time, which is not what you want for a binary file. Instead, read 4 bytes at a time (with a while loop and inh.read(4)), or read everything into memory with a single .read() call and then unpack successive 4-byte slices. The second approach is the simplest and most practical as long as the amount of data involved isn't huge:

import struct
with open('test.bin', 'rb') as inh:
    data = inh.read()
for i in range(0, len(data), 4):
    pos = struct.unpack('i', data[i:i+4])
    print(pos)
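If the file holds nothing but packed integers, the read-everything approach can also decode them in a single call by putting a repeat count in the format string. A minimal sketch, assuming the file size is an exact multiple of the item size; the helper name read_ints is made up for illustration:

```python
import struct

def read_ints(path):
    """Return all native-format ints stored in the file at `path`."""
    with open(path, 'rb') as inh:
        data = inh.read()
    count = len(data) // struct.calcsize('i')
    # A repeat count in the format string unpacks every value at once.
    return list(struct.unpack('%di' % count, data))
```

struct.unpack raises struct.error if the data length does not match the format, so a truncated file is reported rather than silently misread.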

If you do fear potentially huge amounts of data (which would take more memory than you have available), a simple generator offers an elegant alternative:

import struct
def by4(f):
    rec = 'x'  # placeholder for the `while`
    while rec:
        rec = f.read(4)
        if rec: yield rec           
with open('test.bin', 'rb') as inh:
    for rec in by4(inh):
        pos = struct.unpack('i', rec)  
        print(pos)

A key advantage of the generator approach is that by4 can easily be tweaked (while keeping its spec: return a binary file's data 4 bytes at a time) to use a different buffering strategy, all the way to the first approach (read everything, then parcel it out), which can be seen as "infinite buffering" and coded:

def by4(f):
    data = f.read()
    for i in range(0, len(data), 4):
        yield data[i:i+4]

while leaving the "application logic" (what to do with that stream of 4-byte chunks) intact and independent of the I/O layer (which gets encapsulated within the generator).
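As an illustration of a middle ground between the two strategies, the generator could read a larger block per call and slice it into records. This is only a sketch: the bufsize parameter is a hypothetical knob, assumed to be a multiple of 4, and the caller's loop stays unchanged.

```python
import struct

def by4(f, bufsize=4096):
    """Yield 4-byte records from binary file `f`, reading `bufsize` bytes per call.

    `bufsize` is a hypothetical tuning parameter and should be a multiple of 4.
    """
    while True:
        block = f.read(bufsize)
        if not block:
            break  # end of file
        # Hand out the block one 4-byte slice at a time.
        for i in range(0, len(block), 4):
            yield block[i:i+4]
```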

Answer 2

I think "for rec in inh" is supposed to read 'lines', not bytes. What you want is:

while True:
    rec = inh.read(4) # Or inh.read(struct.calcsize('i'))
    if len(rec) != 4:
        break
    (pos,) = struct.unpack('i', rec)
    print pos

Or as others have mentioned:

while True:
    try:
        (pos,) = struct.unpack_from('i', inh)
    except (some_exception...):
        break
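Note that struct.unpack_from works on a bytes-like buffer at a given offset rather than on a file object, so one way to realize this variant is to read the file contents first and then step through the buffer. A sketch under that assumption (ints_from_buffer is an illustrative name, not part of the struct module):

```python
import struct

def ints_from_buffer(data):
    """Decode consecutive native ints from a bytes object using unpack_from."""
    size = struct.calcsize('i')
    values = []
    # The range stops before any trailing partial record.
    for offset in range(0, len(data) - size + 1, size):
        (pos,) = struct.unpack_from('i', data, offset)
        values.append(pos)
    return values
```

It would be called as ints_from_buffer(inh.read()) on a file opened in 'rb' mode.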

Answer 3

Check the size of the packed integers:

>>> pos
[7623, 3015, 3231, 3829]
>>> [struct.pack('i',e) for e in pos]
['\xc7\x1d\x00\x00', '\xc7\x0b\x00\x00', '\x9f\x0c\x00\x00', '\xf5\x0e\x00\x00']

We see 4-byte strings, which means that reading should be done 4 bytes at a time:

>>> inh=open('test.bin','rb')
>>> b1=inh.read(4)
>>> b1
'\xc7\x1d\x00\x00'
>>> struct.unpack('i',b1)
(7623,)
>>>

This is the original int! Extending this into a reading loop is left as an exercise.
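For reference, one possible shape for that loop uses the two-argument form of iter(), which keeps calling inh.read(4) until it returns an empty bytes object. This is a sketch, not the only solution; read_loop is a made-up name, and dropping a short final record is an assumption about how truncated files should be handled.

```python
import struct

def read_loop(inh):
    """Read 4-byte records from an open binary file until it is exhausted."""
    values = []
    # iter() with a sentinel keeps calling inh.read(4) until it returns b''.
    for chunk in iter(lambda: inh.read(4), b''):
        if len(chunk) < 4:
            break  # ignore a truncated final record
        values.append(struct.unpack('i', chunk)[0])
    return values
```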

Answer 4

You can probably use array as well if you want:

import array  
pos = array.array('i', [7623, 3015, 3231, 3829]) 
inh = open('test.bin', 'wb')  
pos.tofile(inh)  # tofile() replaces the deprecated write()
inh.close()

Then use array.array.fromfile or fromstring (named frombytes in current Python 3) to read it back.
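Reading back with fromfile might look like the sketch below. It assumes the file contains only native C ints written on the same platform; read_int_array is an illustrative helper name.

```python
import array
import os

def read_int_array(path):
    """Load a file of native C ints into an array.array('i')."""
    values = array.array('i')
    count = os.path.getsize(path) // values.itemsize
    with open(path, 'rb') as inh:
        # fromfile needs the number of items, not the number of bytes.
        values.fromfile(inh, count)
    return values
```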

Answer 5

This function reads all bytes from a file:

import array
import os

def read_binary_file(filename):
    """Return (size, data) for the file, or (-1, []) if it cannot be read."""
    try:
        n = os.path.getsize(filename)
        data = array.array('B')
        with open(filename, 'rb') as f:
            data.fromfile(f, n)  # read n unsigned bytes
        return (len(data), data)
    except (IOError, OSError):
        return (-1, [])

# somewhere in your code
t = read_binary_file(FILENAME)
fsize = t[0]

if fsize > 0:
    data = t[1]
    # work with data
else:
    print('Error reading file')

Answer 6

Your iterator isn't reading 4 bytes at a time, so I imagine it's rather confused. Like SilentGhost mentioned, it'd probably be best to use unpack_from().
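To see what the iterator actually hands over, the small experiment below iterates over the same packed bytes. It uses io.BytesIO as a stand-in for the file opened with 'rb', which is an assumption made so the snippet needs no file on disk.

```python
import io
import struct

pos = [7623, 3015, 3231, 3829]
packed = b''.join(struct.pack('i', e) for e in pos)

# Iterating a binary stream splits on newline bytes, just like a text file.
lines = list(io.BytesIO(packed))
chunk_lengths = [len(line) for line in lines]
print(chunk_lengths)
```

None of these four packed values contains a newline byte, so the loop receives the whole file as a single chunk, which is longer than the 4 bytes that the 'i' format expects.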

6 answers

Answer 1
8 2010-02-16 16:47:58

Answer 2
5 2010-02-16 16:46:11

Answer 3
1 2010-02-16 16:51:03

Answer 4
1 2010-02-16 19:23:44

Answer 5
1 2011-03-17 13:27:14

Answer 6
0 2010-02-16 16:45:45