Saving a list of dictionaries with numpy arrays

Question

I have a dataset composed of:

dataset = [{"sample":[numpy array (2048,3) shape], "category":"Cat"}, ....]

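Each element of the list is a dictionary with a "sample" key, whose value is a numpy array of shape (2048,3), and a "category" key, whose value is the class of that sample. The dataset has a length of 8000.

I tried to save it as JSON, but it says it can't serialize numpy arrays.

What is the best way to save this list? I can't use np.save("file", dataset) because there are dictionaries, and I can't use JSON because there are numpy arrays. Should I use HDF5? What format should I use if I have to use the dataset for machine learning? Thanks!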

Answer 1

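Creating an example specific to your data requires more details about the dictionaries in the list. I created an example that assumes each dictionary has:

A unique value for the category key. This value is used as the dataset name.
A sample key that holds the array you want to save.

The code below creates some data, writes it to an HDF5 file with the h5py package, then reads the data back into a new list of dictionaries. It should be a good starting point for your problem.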

import numpy as np
import h5py

a0, a1 = 10, 5
arr1 = np.arange(a0*a1).reshape(a0,a1)
arr2 = np.arange(a0*a1,2*a0*a1).reshape(a0,a1)
arr3 = np.arange(2*a0*a1,3*a0*a1).reshape(a0,a1)

dataset = [{"sample":arr1, "category":"Cat"}, 
           {"sample":arr2, "category":"Dog"},
           {"sample":arr3, "category":"Fish"},
           ]

# Create the HDF5 file with "category" as dataset name and "sample" as the data
with h5py.File('SO_73499414.h5', 'w') as h5f:
    for ds_dict in dataset:
        h5f.create_dataset(ds_dict["category"], data=ds_dict["sample"])

# Retrieve the HDF5 data with "category" as dataset name and "sample" as the data
ds_list = []
with h5py.File('SO_73499414.h5', 'r') as h5f:
    for ds_name in h5f:
        print(ds_name,'\n',h5f[ds_name]) # prints name and dataset attributes
        print(h5f[ds_name][()]) # prints the dataset values (as an array) 
        # add data and name to list
        ds_list.append({"sample":h5f[ds_name][()], "category":ds_name})
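
The example above relies on every category value being unique. In the 8000-sample dataset from the question, categories presumably repeat, so using the category as the dataset name would collide. A minimal sketch of one alternative, assuming every sample has the same shape: stack the samples into a single array, keep the categories in a parallel array, and save both together. It uses NumPy's .npz format rather than HDF5; the small random dataset, the file name dataset.npz, and the key names samples and categories are placeholders for illustration, not part of the original answer.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the real data: 6 samples of shape (2048, 3)
# with repeated categories (the real dataset has 8000 entries).
rng = np.random.default_rng(0)
dataset = [
    {"sample": rng.random((2048, 3)), "category": cat}
    for cat in ["Cat", "Dog", "Cat", "Fish", "Dog", "Cat"]
]

# Stack the samples into one (N, 2048, 3) array and the labels into one (N,) array.
samples = np.stack([d["sample"] for d in dataset])
categories = np.array([d["category"] for d in dataset])

# Save both arrays into a single compressed .npz file (placeholder path).
path = os.path.join(tempfile.mkdtemp(), "dataset.npz")
np.savez_compressed(path, samples=samples, categories=categories)

# Load the arrays back; row i of each array belongs to the same sample.
with np.load(path) as data:
    loaded_samples = data["samples"]
    loaded_categories = data["categories"]

# Rebuild the original list-of-dictionaries layout if it is still needed.
restored = [
    {"sample": s, "category": str(c)}
    for s, c in zip(loaded_samples, loaded_categories)
]
```

For machine learning, the stacked samples array and the parallel categories array are often more convenient than the list of dictionaries, since most training code expects one array of inputs and one array of labels. The same two-array layout could also be written as two datasets in an HDF5 file.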

Solution 1
0 2022-08-26 15:53:24
