Save a list of dictionaries with numpy arrays
I have a dataset structured as:
dataset = [{"sample":[numpy array (2048,3) shape], "category":"Cat"}, ....]
Each element of the list is a dictionary with a "sample" key, whose value is a numpy array of shape (2048,3), and a "category" key, whose value is the class of that sample. The dataset has 8000 elements.
I tried to save it as JSON, but it said it can't serialize numpy arrays.
What's the best way to save this list? I can't use np.save("file", dataset) because there is a dictionary, and I can't use JSON because there is the numpy array. Should I use HDF5? What format should I use if I have to use the dataset for machine learning? Thanks!
Creating an example specific to your data requires more details about the dictionaries in the list. I created an example that assumes every dictionary has a unique value for the category key (the value is used as the dataset name) and a sample key holding the array you want to save.
The code below creates some data, writes it to an HDF5 file with the h5py package, then reads the data back into a new list of dictionaries. It is a good starting point for your problem.
import numpy as np
import h5py

a0, a1 = 10, 5
arr1 = np.arange(a0*a1).reshape(a0,a1)
arr2 = np.arange(a0*a1,2*a0*a1).reshape(a0,a1)
arr3 = np.arange(2*a0*a1,3*a0*a1).reshape(a0,a1)
dataset = [{"sample":arr1, "category":"Cat"},
           {"sample":arr2, "category":"Dog"},
           {"sample":arr3, "category":"Fish"},
          ]

# Create the HDF5 file with "category" as dataset name and "sample" as the data
with h5py.File('SO_73499414.h5', 'w') as h5f:
    for ds_dict in dataset:
        h5f.create_dataset(ds_dict["category"], data=ds_dict["sample"])

# Retrieve the HDF5 data with "category" as dataset name and "sample" as the data
ds_list = []
with h5py.File('SO_73499414.h5', 'r') as h5f:
    for ds_name in h5f:
        print(ds_name,'\n',h5f[ds_name]) # prints name and dataset attributes
        print(h5f[ds_name][()]) # prints the dataset values (as an array)
        # add data and name to list
        ds_list.append({"sample":h5f[ds_name][()], "category":ds_name})
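The example above relies on every category being unique, because HDF5 dataset names must be unique within a group. With 8000 samples the categories presumably repeat, so one possible variation, sketched here under that assumption, stacks all samples into a single array and stores the categories in a parallel dataset. The dataset names "samples" and "categories", the file name, and the random stand-in data are hypothetical choices for illustration, not part of the original question.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in data: 6 samples with repeated categories
# (the real dataset would hold 8000 arrays of shape (2048, 3)).
rng = np.random.default_rng(0)
dataset = [{"sample": rng.random((2048, 3)), "category": cat}
           for cat in ["Cat", "Dog", "Fish", "Cat", "Dog", "Fish"]]

# Hypothetical file name, written to a temporary directory.
h5_path = os.path.join(tempfile.mkdtemp(), "stacked_samples.h5")

# Stack the samples into one (N, 2048, 3) array; keep categories in the same order.
samples = np.stack([d["sample"] for d in dataset])
categories = np.array([d["category"] for d in dataset], dtype=object)

with h5py.File(h5_path, "w") as h5f:
    h5f.create_dataset("samples", data=samples)
    # Variable-length strings; row i is the label of samples[i].
    h5f.create_dataset("categories", data=categories, dtype=h5py.string_dtype())

# Read both datasets back and rebuild the list of dictionaries.
with h5py.File(h5_path, "r") as h5f:
    loaded_samples = h5f["samples"][()]
    raw_categories = h5f["categories"][()]

# Depending on the h5py version, strings come back as bytes or str.
loaded_categories = [c.decode("utf-8") if isinstance(c, bytes) else str(c)
                     for c in raw_categories]

restored = [{"sample": s, "category": c}
            for s, c in zip(loaded_samples, loaded_categories)]
```

Keeping the samples in one dataset also lets you read a slice such as h5f["samples"][0:32] directly from the open file, which can be convenient for batching during training without loading everything into memory.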