Combining huge h5 files with multiple datasets into one with odo
I have a number of large (13 GB+) h5 files; each h5 file has two datasets that were created with pandas:
df.to_hdf('name_of_file_to_save', 'key_1', format='table')
df.to_hdf('name_of_file_to_save', 'key_2', format='table')  # saved to the same h5 file as above
I've seen a post here:
Concatenate two big pandas.HDFStore HDF5 files
on using odo to concatenate h5 files. What I want to do is take the h5 files that were created, each having key_1 and key_2, and combine them so that all of the key_1 data end up in one dataset in the new h5 file and all of the key_2 data end up in another dataset in the same new h5 file. All of the key_1 datasets have the same number of columns; the same applies to key_2.
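For comparison, a pandas-only sketch of the same goal (not odo) streams each key out of every source file in chunks and appends the chunks to the matching key of one target store, so no file is ever loaded whole. It assumes PyTables is installed, that the sources were written in table format as above, and that string columns have compatible widths across files (otherwise pass min_itemsize to append); combine_h5_files and its chunksize default are hypothetical names chosen for illustration.

```python
import pandas as pd


def combine_h5_files(source_paths, target_path, keys=("key_1", "key_2"), chunksize=500_000):
    """Append every row of each key in each source file to the same key in target_path.

    Hypothetical helper: mode="w" below overwrites target_path if it already exists.
    """
    with pd.HDFStore(target_path, mode="w") as target:
        for path in source_paths:
            with pd.HDFStore(path, mode="r") as source:
                for key in keys:
                    # select(..., chunksize=...) yields DataFrames of at most chunksize rows
                    for chunk in source.select(key, chunksize=chunksize):
                        # index=False skips rebuilding the PyTables index on every append
                        target.append(key, chunk, index=False)
        # build each table's index once, after all rows are in place
        for key in keys:
            target.create_table_index(key, optlevel=9, kind="full")
```

Usage would look like combine_h5_files(['file1.h5', 'file2.h5', 'file3.h5', 'file4.h5'], 'combined.h5'). Deferring index creation to the end is the usual way to keep large repeated appends fast.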
I've tried this:
from odo import odo
files = ['file1.h5','file2.h5','file3.h5','file4.h5']
for i in files:
    # append key_1 of each source file into key_1 of the new file
    odo('hdfstore://path_to_here_h5_files_live/%s::key_1' % i,
        'hdfstore://path_store_new_large_h5::key_1')
However, I get an error:
(tables/hdf5extension.c:7824)
tables.exceptions.HDF5ExtError: HDF5 error back trace
File "H5A.c", line 259, in H5Acreate2
unable to create attribute
File "H5Aint.c", line 275, in H5A_create
unable to create attribute in object header
File "H5Oattribute.c", line 347, in H5O_attr_create
unable to create new attribute in header
File "H5Omessage.c", line 224, in H5O_msg_append_real
unable to create new message
File "H5Omessage.c", line 1945, in H5O_msg_alloc
unable to allocate space for message
File "H5Oalloc.c", line 1142, in H5O_alloc
object header message is too large
End of HDF5 error back trace
Can't set attribute 'non_index_axes' in node:
/key_1 (Group) ''.
Closing remaining open
In this specific case, the problem was having too many columns: the column metadata (the 'non_index_axes' attribute named in the error) exceeded the 64 KB that HDF5 allows for an attribute stored in a node's object header. The solution is to load the DataFrame and transpose it, so that the long dimension is stored as rows rather than columns.
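A minimal sketch of the transpose workaround, assuming an all-numeric DataFrame (transposing mixed dtypes produces object columns, which table format handles poorly) and PyTables installed; save_transposed and load_transposed are hypothetical helpers. The frame is written with its columns turned into rows, so the stored column metadata only lists the original row labels, and it is transposed back after reading. This only helps when the frame has far fewer rows than columns.

```python
import pandas as pd


def save_transposed(df, path, key):
    # Hypothetical helper: store df.T so the many original columns become rows
    # and the table's column metadata only describes len(df) columns.
    df.T.to_hdf(path, key=key, format="table")


def load_transposed(path, key):
    # Hypothetical helper: read the stored frame and undo the transpose.
    return pd.read_hdf(path, key).T
```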