How do I read consecutive arrays from a binary file using `np.fromfile`?

Question

I want to read a binary file in Python whose exact layout is stored in the binary file itself.

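The file contains a sequence of two-dimensional arrays, with the row and column dimensions of each array stored as a pair of integers preceding its contents. I want to read all of the arrays in the file one after another.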

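I know this can be done with f = open("myfile", "rb") and f.read(numberofbytes), but that is clumsy because I then need to convert the output into meaningful data structures. I would like to use numpy's np.fromfile with a custom dtype, but I have not found a way to read part of the file, leave it open, and then continue reading with a modified dtype.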

I know I can use os to f.seek(numberofbytes, os.SEEK_SET) and then call np.fromfile, but that would mean unnecessary jumping around in the file.

In short, I want MATLAB's fread (or at least something like C++ ifstream read).

What is the best way to do this?

Answer 1

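You can pass an open file object to np.fromfile, read the dimensions of the first array, then read the array's contents (again using np.fromfile), and repeat the process for the remaining arrays in the same file.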

For example:

import numpy as np
import os

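# np.int was removed in NumPy 1.24; np.int_ is the default integer dtype,
# which matches what np.array(x.shape) writes in the usage example
def iter_arrays(fname, array_ndim=2, dim_dtype=np.int_, array_dtype=np.double):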

    with open(fname, 'rb') as f:
        fsize = os.fstat(f.fileno()).st_size

        # while we haven't yet reached the end of the file...
        while f.tell() < fsize:

            # get the dimensions for this array
            dims = np.fromfile(f, dim_dtype, array_ndim)

            # get the array contents
            yield np.fromfile(f, array_dtype, np.prod(dims)).reshape(dims)

Example usage:

# write some random arrays to an example binary file
x = np.random.randn(100, 200)
y = np.random.randn(300, 400)

with open('/tmp/testbin', 'wb') as f:
    np.array(x.shape).tofile(f)
    x.tofile(f)
    np.array(y.shape).tofile(f)
    y.tofile(f)

# read the contents back
x1, y1 = iter_arrays('/tmp/testbin')

# check that they match the input arrays
assert np.allclose(x, x1) and np.allclose(y, y1)
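
The approach above relies on np.fromfile continuing from the current position of an open file object rather than starting over. The following is a minimal sketch for checking that behaviour on a tiny file. The scratch path and variable names are made up for illustration, and the fixed-width np.int64 header is an assumption chosen so that the byte counts are the same on every platform.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file, used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "positions.bin")

header = np.array([2, 3], dtype=np.int64)   # dimensions, 8 bytes each
body = np.arange(6, dtype=np.float64)       # contents of a 2 x 3 array

with open(path, "wb") as f:
    header.tofile(f)
    body.tofile(f)

with open(path, "rb") as f:
    # First read consumes only the two dimension values
    dims = np.fromfile(f, dtype=np.int64, count=2)
    pos_after_dims = f.tell()
    # Second read picks up where the first one stopped
    data = np.fromfile(f, dtype=np.float64, count=int(dims.prod()))
    pos_after_data = f.tell()

data = data.reshape(tuple(dims))
print(pos_after_dims, pos_after_data, data.shape)
```

Writing the header with an explicit dtype also avoids depending on the platform's default integer size when the file is read back on another machine.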

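If the arrays are large, you might consider using np.memmap with the offset= parameter in place of np.fromfile, to get the contents of the arrays as memory maps rather than loading them into RAM.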
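
One possible shape for that memory-mapped variant is sketched below. The helper name iter_memmaps is hypothetical, and the sketch assumes the same file layout as in the question, with np.int64 dimensions and np.float64 contents and no empty arrays. Only the small headers are read eagerly; each array body is mapped by byte offset.

```python
import os
import tempfile

import numpy as np

def iter_memmaps(fname, array_ndim=2, dim_dtype=np.int64, array_dtype=np.float64):
    """Yield each array in fname as a read-only np.memmap (hypothetical helper)."""
    dim_dtype = np.dtype(dim_dtype)
    array_dtype = np.dtype(array_dtype)
    fsize = os.path.getsize(fname)
    offset = 0
    with open(fname, "rb") as f:
        while offset < fsize:
            # Read only the small header from the open file
            f.seek(offset)
            dims = np.fromfile(f, dim_dtype, array_ndim)
            offset += dims.nbytes
            shape = tuple(int(d) for d in dims)
            # Map the array body in place instead of loading it into RAM
            yield np.memmap(fname, dtype=array_dtype, mode="r",
                            offset=offset, shape=shape)
            offset += int(np.prod(shape)) * array_dtype.itemsize

# Usage: write two small arrays to a scratch file and map them back
x = np.random.randn(4, 5)
y = np.random.randn(2, 3)
path = os.path.join(tempfile.mkdtemp(), "memmap_demo.bin")
with open(path, "wb") as f:
    for a in (x, y):
        np.array(a.shape, dtype=np.int64).tofile(f)
        a.tofile(f)

x1, y1 = iter_memmaps(path)
```

This version does seek once per array to reach the next header, so it trades the purely sequential read of the original answer for not holding the array data in memory.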

Answer 1 (4, accepted, 2015-07-04 00:04:04)
