
Python read binary: how to unpack a large number of numbers?

Basically, I want to read a binary file that contains a large number of doubles. I'm not sure how to achieve the following:

import struct

N = 10000000
fin = open("sth.bin", "rb")
data = struct.unpack('dddddd......', fin.read(8*N))  # of course this doesn't work, but it's what I want
fin.close()

Iterate over the file, unpacking one chunk at a time:

import struct

with open("sth.bin", "rb") as f:
    numbers = [
        struct.unpack('d', chunk)[0]
        # binary reads return bytes, so the end-of-file sentinel must be b"" (not "")
        for chunk in iter(lambda: f.read(8), b"")
    ]

There are a bunch of optimizations you could do here, such as reading larger chunks of the file at a time (a few kilobytes, e.g. 4096 bytes, is a common choice) and creating a compiled Struct, but that's the general idea. If performance is especially important, you could also unpack multiple doubles at a time (e.g., struct.unpack('d' * 8, chunk)) to reduce the number of function calls:

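import struct

numbers = []
struct_4096 = struct.Struct("d" * (4096 // 8))  # 512 doubles per 4096-byte chunk
with open("sth.bin", "rb") as f:
    while True:
        chunk = f.read(4096)
        if not chunk:
            break  # end of file
        try:
            numbers.extend(struct_4096.unpack(chunk))
        except struct.error:
            # final, shorter chunk (assumes the file size is a multiple of 8)
            numbers.extend(struct.unpack("d" * (len(chunk) // 8), chunk))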
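On Python 3.4 and later, the `iter_unpack` method of a compiled `struct.Struct` can replace the manual chunk loop. The sketch below assumes the whole file fits in memory and holds little-endian doubles (the `<d` format is an assumption; plain `d` means native byte order, as in the snippets above). `read_doubles` is a hypothetical helper name, and `io.BytesIO` stands in for a file opened with `open("sth.bin", "rb")`:

```python
import io
import struct

# Assumption: the data is little-endian IEEE 754 doubles (8 bytes each).
DOUBLE = struct.Struct("<d")


def read_doubles(f):
    """Hypothetical helper: unpack every complete double from binary file object f."""
    data = f.read()
    # iter_unpack requires a length that is a multiple of DOUBLE.size,
    # so any trailing partial value is dropped.
    usable = len(data) - len(data) % DOUBLE.size
    # iter_unpack yields 1-tuples; unpack each one into a plain float
    return [value for (value,) in DOUBLE.iter_unpack(data[:usable])]


# Usage with an in-memory "file" containing three doubles.
sample = io.BytesIO(struct.pack("<3d", 1.5, -2.0, 3.25))
numbers = read_doubles(sample)
```

Reading the whole file at once keeps the code short. For files too large for memory, the chunked loop above is the safer choice.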

The struct format string supports a repeat count. For example, the following code unpacks 100 doubles:

import struct
values = struct.unpack("100d", data)  # data must be a bytes object of exactly 100 * 8 = 800 bytes
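Applied to the question, the count can be built from N at runtime, so a single `unpack` call reads everything. This is a sketch that assumes the file holds at least N native-order doubles. `unpack_n_doubles` is a hypothetical name, and `io.BytesIO` stands in for `open("sth.bin", "rb")`:

```python
import io
import struct


def unpack_n_doubles(f, n):
    """Hypothetical helper: read n native-order doubles from binary file f in one call."""
    fmt = "%dd" % n                  # e.g. "10000000d" for N = 10000000
    size = struct.calcsize(fmt)      # n * 8 bytes for plain doubles
    data = f.read(size)
    if len(data) != size:
        raise ValueError("expected %d bytes, got %d" % (size, len(data)))
    return struct.unpack(fmt, data)  # tuple of n Python floats


# Usage with a small in-memory "file" of four doubles.
N = 4
fin = io.BytesIO(struct.pack("%dd" % N, 0.5, 1.0, 1.5, 2.0))
data = unpack_n_doubles(fin, N)
```

For N = 10000000, the resulting tuple needs considerably more memory than the raw 80 MB, because each value becomes a separate Python float object.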

If you are dealing with a large number of doubles, I suggest using NumPy:

import numpy as np
np.fromfile(f, dtype=float, count=100)  # f is a file opened in "rb" mode; dtype=float means float64

This creates an array of 100 doubles read from the file.
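A slightly fuller sketch: `count=-1` (the default) reads to the end of the file, and an explicit dtype string such as `'<f8'` fixes the byte order instead of relying on the machine's native order. The little-endian choice and the temporary demo file are assumptions for illustration; use `'>f8'` for big-endian data.

```python
import os
import tempfile

import numpy as np

# Write a small demo file of little-endian doubles (stand-in for "sth.bin").
values = np.array([1.0, 2.5, -3.0], dtype="<f8")
fd, path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
values.tofile(path)

# Read every double in the file; count=-1 means "until end of file".
with open(path, "rb") as f:
    arr = np.fromfile(f, dtype="<f8", count=-1)

os.remove(path)
```

Unlike `struct.unpack`, this keeps the values in one contiguous array (8 bytes each) rather than creating a Python float object per value.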
