Python h5py virtual dataset - concatenate/append, not stack

Question

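I recently started using virtual datasets (VDS) in Python with h5py. It all seems fairly straightforward, and it certainly avoids the need to duplicate data and the file-size growth that comes with it.

Most of the examples I have seen look like the one below.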

import h5py

layout = h5py.VirtualLayout(shape=(4, 100), dtype='i4')

for n in range(1, 5):
    filename = "{}.h5".format(n)
    vsource = h5py.VirtualSource(filename, 'data', shape=(100,))
    layout[n - 1] = vsource

# Add virtual dataset to output file
with h5py.File("VDS.h5", 'w', libver='latest') as f:
    f.create_virtual_dataset('data', layout, fillvalue=-5)

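They tend to take multiple data sources (in this case from separate HDF5 files) and create a VDS in which the data is "stacked" together. By that I mean it takes four arrays, each of size (100,), and creates a VDS of size (4, 100).

I would like to create a VDS of size (400,), essentially joining the four (100,) arrays end to end in a single VDS. How can I do that?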

Answer 1

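Here you go: 4 files, each with a dataset of shape=(100,), combined into one virtual dataset of shape=(400,). The trick is to use slice notation when mapping each virtual source into the virtual layout, as in this line: layout[n*100:(n+1)*100] = vsource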

import h5py
import numpy as np

# Create source files (0.h5 to 3.h5)
a0 = 4  # number of source files
for n in range(a0):
    # create some sample data
    arr = (n+1)*np.arange(1,101)
    with h5py.File(f"{n}.h5", "w") as f:
        d = f.create_dataset("data", data=arr)

# Assemble virtual datasets
layout = h5py.VirtualLayout(shape=(a0*100,), dtype="i4")
for n in range(a0):
    vsource = h5py.VirtualSource(f"{n}.h5", "data", shape=(100,))
    layout[n*100:(n+1)*100] = vsource

# Add virtual dataset to VDS file
with h5py.File("VDS.h5", "w") as f:
    f.create_virtual_dataset("vdata", layout, fillvalue=-1)

# read data back
# virtual dataset is transparent for reader!
with h5py.File("VDS.h5", "r") as f:
    print("\nVDS Shape: ", f["vdata"].shape)
    print("\nFirst 10 Elements of Virtual dataset:")
    print(f["vdata"][:10])
    print("Last 10 Elements of Virtual dataset:")
    print(f["vdata"][-10:])
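
The same slicing trick should carry over to sources of different lengths if you keep a running offset instead of a fixed stride of 100. Below is a minimal sketch under that assumption. The helper name concatenate_as_vds, the file names, and the sample lengths (100, 50, 25) are made up for illustration and are not part of the original answer. It checks the result against np.concatenate.

```python
import os
import tempfile

import h5py
import numpy as np


def concatenate_as_vds(source_paths, vds_path, name="data", vds_name="vdata"):
    """Map 1-D datasets from several files end to end into one virtual dataset.

    Hypothetical helper: assumes every source file holds a 1-D dataset
    called `name` and that all sources share a compatible dtype.
    """
    # Read each source's length (and the first dtype) to size the layout.
    lengths = []
    dtype = None
    for path in source_paths:
        with h5py.File(path, "r") as f:
            lengths.append(f[name].shape[0])
            if dtype is None:
                dtype = f[name].dtype

    total = sum(lengths)
    layout = h5py.VirtualLayout(shape=(total,), dtype=dtype)

    # Each source occupies the slice [offset, offset + length) of the layout.
    offset = 0
    for path, length in zip(source_paths, lengths):
        layout[offset:offset + length] = h5py.VirtualSource(
            path, name, shape=(length,)
        )
        offset += length

    with h5py.File(vds_path, "w", libver="latest") as f:
        f.create_virtual_dataset(vds_name, layout, fillvalue=-1)
    return total


# Build three sample source files of unequal length in a temp directory.
tmpdir = tempfile.mkdtemp()
arrays = [np.arange(n, dtype="i4") + 1000 * i for i, n in enumerate((100, 50, 25))]
paths = []
for i, arr in enumerate(arrays):
    path = os.path.join(tmpdir, f"src_{i}.h5")
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=arr)
    paths.append(path)

vds_path = os.path.join(tmpdir, "VDS_concat.h5")
total = concatenate_as_vds(paths, vds_path)

# Read the virtual dataset back and compare with an in-memory concatenation.
with h5py.File(vds_path, "r") as f:
    result = f["vdata"][:]

expected = np.concatenate(arrays)
print("VDS shape:", result.shape)
print("Matches np.concatenate:", np.array_equal(result, expected))
```

Absolute paths are used for the sources here so that the VDS resolves them regardless of the current working directory; with relative names, as in the answer above, the source files need to stay where the VDS file can find them.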

Solution 1
0 2022-08-12 15:41:48
