Is the byte offset in files produced by np.save always 128?

Question

I have been playing with numpy memmaps, and I noticed that if I generate some data and dump it to disk like this:

from os.path import join
import numpy as np
import tempfile

print('Generate dummy data')
N = 4
D = 3
x, y = np.meshgrid(np.arange(0, N), np.arange(0, N))
data = np.ascontiguousarray((np.dstack([x] * D) % 256).astype(np.uint8))

print('Make temp directory')
dpath = tempfile.mkdtemp()
mem_fpath = join(dpath, 'foo.npy')

print('Dump memmap')
np.save(mem_fpath, data)

then np.memmap and np.load give different data for the same file:

file1 = np.memmap(mem_fpath, dtype=data.dtype.name, shape=data.shape,
                  mode='r')
file2 = np.load(mem_fpath)
print('file1 =\n{!r}'.format(file1[0]))
print('file2 =\n{!r}'.format(file2[0]))

This prints:

file1 =
memmap([[147,  78,  85],
        [ 77,  80,  89],
        [  1,   0, 118],
        [  0, 123,  39]], dtype=uint8)
file2 =
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]], dtype=uint8)

This confused me, but I eventually found that I need to set the offset parameter of np.memmap to 128 for it to work:

for i in range(0, 1000):
    file1 = np.memmap(mem_fpath, dtype=data.dtype.name, shape=data.shape,
                      offset=i, mode='r')
    if np.all(file1 == data):
        print('i = {!r}'.format(i))
        break

print('file1 =\n{!r}'.format(file1[0]))

Running this gives:

i = 128
file1 =
memmap([[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]], dtype=uint8)

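My question is: where does this 128 come from? I checked the np.save documentation but found no mention of it. I also tried changing the dtype and shape of the data, but the offset was always 128. Can I assume that any single array saved with np.save will always have this 128-byte offset? If not, how can I determine the offset?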

I ask because, in my particular use case of cropping small regions out of a larger file on disk, I have found np.memmap to be much faster than np.load.

Thanks for any help!

Answer 1

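The 128-byte offset you are seeing should be considered a fluke of the implementation. The header of an NPY file must have a length that is a multiple of 16 bytes, and the current implementation aligns it to 64 bytes, because 16 is not enough for memory mapping on all platforms.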
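As an illustration, the header can be read by hand. The "wrong" values in the question's first memmap output (147, 78, 85, 77, 80, 89, 1, 0, 118, 0, 123, 39) are the start of the file itself: the magic string `\x93NUMPY`, the version bytes 1 and 0, a little-endian header length of 118, and then the first characters of the header text. With the 10 bytes that precede the header text, 10 + 118 = 128. The sketch below assumes a version 1.x or 2.x file; the file name and variable names are only for illustration.

```python
import os
import struct
import tempfile

import numpy as np

# Write a small array so there is an NPY file to inspect.
fpath = os.path.join(tempfile.mkdtemp(), 'foo.npy')
np.save(fpath, np.zeros((4, 4, 3), dtype=np.uint8))

with open(fpath, 'rb') as f:
    magic = f.read(6)             # b'\x93NUMPY'
    major, minor = f.read(2)      # format version, one byte each
    if major == 1:
        # Version 1.x stores the header length as a 2-byte little-endian int.
        (header_len,) = struct.unpack('<H', f.read(2))
    else:
        # Version 2.x and later use a 4-byte little-endian int.
        (header_len,) = struct.unpack('<I', f.read(4))
    header_text = f.read(header_len)  # dict literal, space-padded, ends in '\n'
    data_offset = f.tell()            # the array data starts here

print('version     =', (major, minor))
print('header_len  =', header_len)
print('data_offset =', data_offset)
```

The printed `data_offset` is the value to pass as `offset` to np.memmap for that particular file. It is computed from the file, not assumed.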

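128 bytes is a very common header size, because the boilerplate in the header takes up about 64 bytes and most arrays do not have a format complex enough to need more than 128 bytes of header to describe it. However, structured arrays can easily lead to headers longer than 128 bytes, and NPY files produced by older NumPy versions or by other implementations of the format may use a different alignment.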
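Since the offset cannot be assumed, one option is to let NumPy work it out. The sketch below shows two ways that should give the same result: `np.load` with `mmap_mode='r'`, which parses the header and returns a memmap, and the header readers in `numpy.lib.format`, after which the file position is the data offset. The array contents and file name are made up for the example, and only format versions 1.0 and 2.0 are handled in the manual path.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format

data = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)
fpath = os.path.join(tempfile.mkdtemp(), 'foo.npy')
np.save(fpath, data)

# Option 1: np.load parses the header and memory-maps the array data.
mapped = np.load(fpath, mmap_mode='r')
print('offset used by np.load:', mapped.offset)

# Option 2: parse the header explicitly, then ask the file where it stopped.
with open(fpath, 'rb') as f:
    version = npy_format.read_magic(f)
    if version == (1, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_1_0(f)
    elif version == (2, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_2_0(f)
    else:
        raise ValueError('unhandled NPY version: {!r}'.format(version))
    offset = f.tell()

manual = np.memmap(fpath, dtype=dtype, shape=shape, offset=offset,
                   mode='r', order='F' if fortran_order else 'C')
print('offset from header:', offset)
print('arrays match:', np.array_equal(manual, data))
```

For the cropping use case in the question, slicing the memmap returned by `np.load(..., mmap_mode='r')` is probably the simplest route, since no offset has to be supplied at all.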
