How can I read successive arrays from a binary file using `np.fromfile`?

I want to read a binary file in Python, the exact layout of which is stored in the binary file itself.

The file contains a sequence of two-dimensional arrays, with the row and column dimensions of each array stored as a pair of integers preceding its contents. I want to successively read all of the arrays contained within the file.

I know this can be done with `f = open("myfile", "rb")` and `f.read(numberofbytes)`, but this is quite clumsy because I would then need to convert the output into meaningful data structures. I would like to use NumPy's `np.fromfile` with a custom `dtype`, but have not found a way to read part of the file, leave it open, and then continue reading with a modified `dtype`.

I know I can call `f.seek(numberofbytes, os.SEEK_SET)` and then `np.fromfile` multiple times, but this would mean a lot of unnecessary jumping around in the file.

In short, I want MATLAB's `fread` (or at least something like C++'s `ifstream` `read`).

What is the best way to do this?

You can pass an open file object to `np.fromfile`, read the dimensions of the first array, then read the array contents (again using `np.fromfile`), and repeat the process for additional arrays within the same file.

For example:

import numpy as np
import os

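def iter_arrays(fname, array_ndim=2, dim_dtype=np.int_, array_dtype=np.double):
    # dim_dtype must match the integer type the dimensions were written with
    # (np.int_ is NumPy's default integer, which is what np.array(x.shape) produces)
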
    with open(fname, 'rb') as f:
        fsize = os.fstat(f.fileno()).st_size

        # while we haven't yet reached the end of the file...
        while f.tell() < fsize:

            # get the dimensions for this array
            dims = np.fromfile(f, dim_dtype, array_ndim)

            # get the array contents
            yield np.fromfile(f, array_dtype, np.prod(dims)).reshape(dims)

Example usage:

# write some random arrays to an example binary file
x = np.random.randn(100, 200)
y = np.random.randn(300, 400)

with open('/tmp/testbin', 'wb') as f:
    np.array(x.shape).tofile(f)
    x.tofile(f)
    np.array(y.shape).tofile(f)
    y.tofile(f)

# read the contents back
x1, y1 = iter_arrays('/tmp/testbin')

# check that they match the input arrays
assert np.allclose(x, x1) and np.allclose(y, y1)

If the arrays are large, you might consider using `np.memmap` with the `offset=` parameter in place of `np.fromfile` to get the contents of the arrays as memory-maps rather than loading them into RAM.
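
A minimal sketch of that memory-mapped variant follows. The name `iter_memmaps` is hypothetical, and the sketch assumes the same layout as above: each array is preceded by its dimensions, written here as 64-bit integers, and the data is stored in C order. The small headers are still read with `np.fromfile`; only the array bodies are mapped, and the byte offset is tracked by hand.

```python
import os
import tempfile
import numpy as np

def iter_memmaps(fname, array_ndim=2, dim_dtype=np.int64, array_dtype=np.float64):
    """Yield each array stored in fname as a read-only np.memmap."""
    dim_dtype = np.dtype(dim_dtype)
    array_dtype = np.dtype(array_dtype)
    fsize = os.path.getsize(fname)
    offset = 0  # byte position of the next header

    with open(fname, 'rb') as f:
        while offset < fsize:
            # read the (small) dimension header into memory
            f.seek(offset)
            dims = np.fromfile(f, dim_dtype, array_ndim)
            offset += dims.nbytes
            shape = tuple(int(d) for d in dims)

            # map the array contents instead of loading them into RAM
            yield np.memmap(fname, dtype=array_dtype, mode='r',
                            offset=offset, shape=shape)

            # skip past the array body to the next header
            offset += int(np.prod(shape)) * array_dtype.itemsize

# write two example arrays, with explicit int64 dimension headers
x = np.random.randn(10, 20)
y = np.random.randn(30, 40)
path = os.path.join(tempfile.mkdtemp(), 'testbin')
with open(path, 'wb') as f:
    for a in (x, y):
        np.array(a.shape, dtype=np.int64).tofile(f)
        a.tofile(f)

x1, y1 = iter_memmaps(path)
assert np.array_equal(x, x1) and np.array_equal(y, y1)
```

Each yielded object behaves like an ordinary array, but its data is only paged in from disk when it is accessed.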
