
Combining huge h5 files with multiple datasets into one with odo

I have a number of large (13GB+ in size) h5 files, each with two datasets that were created with pandas:

df.to_hdf('name_of_file_to_save', 'key_1',table=True) 
df.to_hdf('name_of_file_to_save', 'key_2', table=True) # saved to the same h5 file as above

I've seen a post here:

Concatenate two big pandas.HDFStore HDF5 files

on using odo to concatenate h5 files. What I want to do is, for each h5 file that was created (each having key_1 and key_2), combine them so that all of the key_1 data end up in one dataset in the new h5 file and all of the key_2 data in another dataset in the same new h5 file. All of the key_1 datasets have the same number of columns, and the same applies to key_2.

I've tried this:

from odo import odo
files = ['file1.h5','file2.h5','file3.h5','file4.h5']
for i in files:
    odo('hdfstore://path_to_here_h5_files_live/%s::key_1' % i,
        'hdfstore://path_store_new_large_h5::key_1')

However I get an error:

(tables/hdf5extension.c:7824)
tables.exceptions.HDF5ExtError: HDF5 error back trace

File "H5A.c", line 259, in H5Acreate2
  unable to create attribute
File "H5Aint.c", line 275, in H5A_create
  unable to create attribute in object header
File "H5Oattribute.c", line 347, in H5O_attr_create
  unable to create new attribute in header
File "H5Omessage.c", line 224, in H5O_msg_append_real
  unable to create new message
File "H5Omessage.c", line 1945, in H5O_msg_alloc
  unable to allocate space for message
File "H5Oalloc.c", line 1142, in H5O_alloc
  object header message is too large

End of HDF5 error back trace

Can't set attribute 'non_index_axes' in node:
/key_1 (Group) ''.
Closing remaining open 

For this specific case it was a matter of having too many columns: the column metadata that pandas/PyTables stores as an HDF5 attribute (the 'non_index_axes' attribute named in the traceback above) exceeded the size limit for an object header message. The solution is to load the dataframe and transpose it, so that the oversized column axis becomes the row axis.
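
A minimal sketch of that workaround, assuming the frames are wide rather than tall (many columns, comparatively few rows), share a uniform dtype so the transpose stays well typed, and fit in memory once loaded; the combined.h5 output name is hypothetical:

import pandas as pd

files = ['file1.h5', 'file2.h5', 'file3.h5', 'file4.h5']

for key in ('key_1', 'key_2'):
    # Stack the rows for this key across every file, then transpose so
    # the very wide column axis becomes the row axis; the column
    # metadata that PyTables writes into the HDF5 object header (the
    # 'non_index_axes' attribute from the traceback) then stays small.
    combined = pd.concat((pd.read_hdf(f, key) for f in files),
                         ignore_index=True)
    combined.T.to_hdf('combined.h5', key=key, format='table')

Reading the result back takes a second transpose to restore the original orientation, e.g. pd.read_hdf('combined.h5', 'key_1').T.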
