Why does h5py throw an error when adding 3 variable-length strings to a dataset?

Question

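I am trying to use h5py (Python 3) to set up and write to an HDF5 dataset that holds a one-dimensional array of compound objects. Each compound object consists of three variable-length string fields.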

    import h5py
    import numpy as np

    with h5py.File("myfile.hdf5", "a") as file:
        dt = np.dtype([
            ("label", h5py.string_dtype(encoding='utf-8')),
            ("name", h5py.string_dtype(encoding='utf-8')),
            ("id", h5py.string_dtype(encoding='utf-8'))])
        dset = file.require_dataset("initial_data", (50000,), dtype=dt)
        dset[0, "label"] = "foo"  # this line raises the error

When I run the example above, the last line causes h5py (or, more precisely, numpy) to raise an error:

"Cannot change data-type for object array."

Do I understand correctly that "foo" is not of type h5py.string_dtype(encoding='utf-8')?

How come? And how can I fix this?

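Update 1: Stepping through the code, I can see that the error is raised by an internal numpy function called _view_is_safe(oldtype, newtype). In my case, oldtype is dtype('O') but newtype is dtype([('label', 'O')]), and this mismatch is what triggers the error.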
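
The check described in Update 1 can be reproduced with numpy alone, without h5py or a file. This is a minimal sketch that assumes a plain object ('O') dtype is a fair stand-in for the variable-length string dtype (h5py's string_dtype() is an object dtype carrying extra metadata). It asks numpy to reinterpret an object array as the one-field structured dtype from Update 1, and contrasts that with the same reinterpretation on a dtype that holds no objects.

```python
import numpy as np

# Stand-ins for the dtypes reported in Update 1. h5py's vlen string dtype
# is an object ('O') dtype, so plain 'O' is used here (assumption: the
# metadata h5py attaches does not change this check).
oldtype = np.dtype("O")
newtype = np.dtype([("label", "O")])

objects = np.empty(3, dtype=oldtype)

try:
    # numpy refuses to reinterpret memory holding object references
    # unless the old and new dtypes compare equal.
    objects.view(newtype)
    view_error = None
except TypeError as exc:
    view_error = exc

# The same kind of reinterpretation is allowed when no objects are involved.
numbers = np.zeros(3, dtype="i8").view([("label", "i8")])

print(repr(view_error))
print(numbers.dtype.names)
```

The exact wording of the TypeError message differs between numpy versions, so only the exception type is worth relying on.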

Update 2: My question has been answered below, but for completeness I'm linking a possibly related GitHub issue: https://github.com/h5py/h5py/issues/1921

Answer 1

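You defined the dtype as a tuple of variable-length strings, so you need to set the whole tuple at once. When you set only the label element, the other two values in the tuple are not set, so they are not strings.

Example: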
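
The point about unset fields can be seen in memory with numpy. In this sketch, plain object fields stand in for h5py.string_dtype(encoding='utf-8') (an assumption made so it runs without h5py or a file). A freshly allocated row holds None in every field, so setting only "label" leaves "name" and "id" as None rather than strings, while assigning the whole tuple fills every field at once.

```python
import numpy as np

# Object fields stand in for h5py.string_dtype(encoding='utf-8').
dt = np.dtype([("label", object), ("name", object), ("id", object)])

rows = np.empty(2, dtype=dt)  # object fields start out as None

# Setting only one field leaves the other two unset (None, not str).
rows["label"][0] = "foo"
partial = rows[0].item()

# Assigning the whole tuple sets every field of the row at once.
rows[1] = ("foo", "bar", "baz")
complete = rows[1].item()

print(partial)
print(complete)
```

With h5py, the idea is to write only such complete rows, as the answer's example does with tuples and record arrays.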

    import h5py
    import numpy as np

    with h5py.File("myfile.hdf5", "a") as file:
        dt = np.dtype([
            ("label", h5py.string_dtype(encoding='utf-8')),
            ("name", h5py.string_dtype(encoding='utf-8')),
            ("id", h5py.string_dtype(encoding='utf-8'))])
        dset = file.require_dataset("initial_data", (50000,), dtype=dt)

        # Add a row of data with a tuple:
        dset[0] = "foo", "bar", "baz"

        # Add another row of data with a NumPy structured array (1 row):
        npdt = np.dtype([
            ("label", 'S4'),
            ("name", 'S4'),
            ("id", 'S4')])
        dset[1] = np.array(("foo1", "bar1", "baz1"), dtype=npdt)

        # Add 3 rows of data with a NumPy recarray (3 rows built from a list of arrays):
        s1 = np.array(("A", "B", "C"), dtype='S4')
        s2 = np.array(("a", "b", "c"), dtype='S4')
        s3 = np.array(("X", "Y", "Z"), dtype='S4')
        recarr = np.rec.fromarrays([s1, s2, s3], dtype=npdt)
        dset[2:5] = recarr

Result #1, using all 3 methods: [image not available]
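
One caveat with the 'S4' dtype used in the second and third methods: numpy fixed-width byte strings hold at most 4 bytes and silently truncate longer input before h5py ever sees it. The sketch below shows that truncation and, as an assumed alternative, builds the same rows with object fields (standing in for the dataset's variable-length string fields) so the full strings are kept.

```python
import numpy as np

fixed = np.dtype([("label", "S4"), ("name", "S4"), ("id", "S4")])
variable = np.dtype([("label", object), ("name", object), ("id", object)])

data = [("label-1", "name-1", "id-1"), ("A", "b", "X")]

# 'S4' keeps only the first 4 bytes of each string, without any warning.
truncated = np.array(data, dtype=fixed)

# Object fields keep the original Python str values at full length.
full = np.array(data, dtype=variable)

print(truncated[0])
print(full[0])
```

Whether h5py accepts an object-field array like this directly for a string_dtype compound dataset is an assumption here; the answer's own examples use tuples and 'S4' arrays.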

Solution 1 (score: 2, accepted 2021-07-18 16:09:50)