
pandas, store multiple datasets in an h5 file with pd.to_hdf

Say I have two dataframes:

import pandas as pd
df1 = pd.DataFrame({'col1':[0,2,3,2],'col2':[1,0,0,1]})
df2 = pd.DataFrame({'col12':[0,1,2,1],'col22':[1,1,1,1]})

Now df1.to_hdf('nameoffile.h5', 'key_to_store','w',table=True) successfully stores df1, but I also want to store df2 in the same file. If I try the same method, df1 is simply overwritten: when I load the file and check the keys, I only see the info for df2. How can I store both df1 and df2 in the same h5 file as tables?

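You are using 'w', which overwrites the whole file. The default mode is 'a' (append), so write df2 in append mode. Give it its own key as well, because writing to a key that already exists replaces what is stored under it (the key name below is only an example). Current pandas also takes format='table' in place of the older table=True:

df2.to_hdf('nameoffile.h5', key='key_to_store2', format='table', mode='a')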

Check the docs: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html#pandas.DataFrame.to_hdf
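For reference, here is a minimal end-to-end sketch of the append-mode approach. It assumes PyTables is installed (pandas needs it for HDF5). The keys 'df1' and 'df2' and the temporary directory are illustrative choices, not something from the question.

```python
import os
import tempfile

import pandas as pd

df1 = pd.DataFrame({'col1': [0, 2, 3, 2], 'col2': [1, 0, 0, 1]})
df2 = pd.DataFrame({'col12': [0, 1, 2, 1], 'col22': [1, 1, 1, 1]})

# Write into a scratch directory so the sketch leaves no file behind
# in the working directory (an assumption made for this example only).
path = os.path.join(tempfile.mkdtemp(), 'nameoffile.h5')

# mode='w' creates (or truncates) the file; use it for the first write only.
df1.to_hdf(path, key='df1', mode='w', format='table')

# mode='a' keeps what is already in the file; a distinct key adds a second dataset.
df2.to_hdf(path, key='df2', mode='a', format='table')

# List the keys now present in the file.
with pd.HDFStore(path, mode='r') as store:
    stored_keys = sorted(store.keys())

# Read each dataframe back by its key.
df1_back = pd.read_hdf(path, key='df1')
df2_back = pd.read_hdf(path, key='df2')

print(stored_keys)
```

If both writes used the same key, the second one would replace the first even in append mode, which is why each dataframe gets its own key here.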

I have used this in the past without issue:

store = pd.HDFStore(path_to_hdf)
store[new_df_name] = df2
store.close()

So in your case you could try:

store = pd.HDFStore(path_to_hdf)
store['df1'] = df1
store['df2'] = df2
store.close()

I used this in a system where a user could store layouts for microtiter plate experiments. The first time they saved a layout, the hdf file was created, and subsequent layouts could then be appended to the file.

NB: I have set pd.set_option('io.hdf.default_format', 'table') at the beginning of my program, so that datasets are stored in table format by default.
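The HDFStore approach above can also be written with a context manager, which closes the file even if an error occurs. This is only a sketch: it assumes PyTables is installed, passes format='table' explicitly instead of relying on the global option, and uses a temporary path and the keys 'df1' and 'df2' purely for illustration.

```python
import os
import tempfile

import pandas as pd

df1 = pd.DataFrame({'col1': [0, 2, 3, 2], 'col2': [1, 0, 0, 1]})
df2 = pd.DataFrame({'col12': [0, 1, 2, 1], 'col22': [1, 1, 1, 1]})

# Illustrative scratch location; substitute your own path_to_hdf.
path_to_hdf = os.path.join(tempfile.mkdtemp(), 'layouts.h5')

# HDFStore opens in append mode by default: the file is created on first
# use and later datasets are added alongside the existing ones.
with pd.HDFStore(path_to_hdf, mode='a') as store:
    store.put('df1', df1, format='table')
    store.put('df2', df2, format='table')
    layout_keys = sorted(store.keys())
    # Table format supports on-disk queries; fixed format does not.
    df1_tail = store.select('df1', where='index >= 2')

print(layout_keys)
print(df1_tail)
```

Table format is slower to write than the default fixed format, but it allows appending rows and querying with where, which is the usual reason to choose it.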
