
Append data to HDF5 file with Pandas, Python

I have large pandas DataFrames with financial data. I have no problem appending and concatenating additional columns and DataFrames to my .h5 file.

The financial data is being updated every minute, so I need to append a row of data to all of the existing tables inside my .h5 file every minute.

Here is what I have tried so far, but no matter what I do, it overwrites the .h5 file instead of just appending data.

HDFStore way:

from pandas import HDFStore

#we open (or create) the hdf5 store
save_hdf = HDFStore('test.h5')

#we give the dataframe a key value
#format=table so we can append data
save_hdf.put('name_of_frame', ohlcv_candle, format='table', data_columns=True)

#we print our dataframe by calling the hdf file with the key
#just doing this as a test
print(save_hdf['name_of_frame'])

The other way I have tried it, to_hdf:

#format=t so we can append data; mode=r+ to specify that the file exists
#and we want to append to it
tohlcv_candle.to_hdf('test.h5', key='this_is_a_key', mode='r+', format='t')

#again just printing to check if it worked 
print(pd.read_hdf('test.h5', key='this_is_a_key'))

Here is what one of the DataFrames looks like after being read_hdf:

           time     open     high      low    close     volume           PP  
0    1505305260  3137.89  3147.15  3121.17  3146.94   6.205397  3138.420000   
1    1505305320  3146.86  3159.99  3130.00  3159.88   8.935962  3149.956667   
2    1505305380  3159.96  3160.00  3159.37  3159.66   4.524017  3159.676667   
3    1505305440  3159.66  3175.51  3151.08  3175.51   8.717610  3167.366667   
4    1505305500  3175.25  3175.53  3170.44  3175.53   3.187453  3173.833333  

The next time I get data (every minute), I would like a row of it added at index 5 of all my columns... and then 6 and 7, and so on, without having to read and manipulate the entire file in memory, as that would defeat the point of doing this. If there is a better way of solving this, do not be shy to recommend it.
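A minimal sketch of that per-minute workflow, assuming a table-format key named 'name_of_frame' and a hypothetical fetch_latest_candle() helper (shown here as a stub returning dummy data in place of the real feed):

```python
import pandas as pd

def fetch_latest_candle():
    # Hypothetical stub standing in for the real data feed:
    # returns one new OHLCV row as a single-row DataFrame.
    return pd.DataFrame([{
        "time": 1505305560, "open": 3175.50, "high": 3180.00,
        "low": 3174.00, "close": 3179.20, "volume": 5.1,
    }])

# mode='a' opens the file without truncating existing keys
store = pd.HDFStore("test.h5", mode="a")

# Each minute: append just the new row; the rest of the
# table stays on disk and is never loaded into memory.
new_row = fetch_latest_candle()
store.append("name_of_frame", new_row, format="table", data_columns=True)
store.close()
```

In a live script the last four lines would run inside the once-a-minute loop; only the single new row crosses into memory on each iteration.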

PS: sorry for the formatting of that table in here.

pandas.HDFStore.put() has a parameter append (which defaults to False) - that instructs Pandas to overwrite instead of appending.

So try this:

store = pd.HDFStore('test.h5')

store.append('name_of_frame', ohlcv_candle, format='t',  data_columns=True)

We can also use store.put(..., append=True), but the key must also have been created in table format:

store.put('name_of_frame', ohlcv_candle, format='t', append=True, data_columns=True)

NOTE: appending works only for the table format (format='t' is an alias for format='table').
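A quick check of that restriction (a sketch; the file and key names are arbitrary): a key written in the default fixed format rejects appends, while a table-format key accepts them.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2]})

store = pd.HDFStore("formats_demo.h5", mode="w")  # 'w' starts a fresh file
store.put("fixed_key", df)                 # default format='fixed'
store.put("table_key", df, format="table")

store.append("table_key", df)              # fine: table format supports appends
try:
    store.append("fixed_key", df)          # fixed format: appending raises
except ValueError as err:
    print("cannot append to fixed format:", err)
store.close()
```

After this runs, 'table_key' holds four rows while 'fixed_key' still holds the original two.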

tohlcv_candle.to_hdf('test.h5', key='this_is_a_key', append=True, mode='r+', format='t')

You need to pass another argument, append=True, to specify that the data is to be appended to any existing data found under that key, instead of overwriting it.

Without this, the default is False, and if an existing table is encountered under 'this_is_a_key', it is overwritten.
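The two behaviours side by side (a sketch using a hypothetical scratch file): writing the same key twice with the default replaces it, while append=True accumulates rows.

```python
import os
import pandas as pd

path = "overwrite_demo.h5"  # hypothetical scratch file
if os.path.exists(path):
    os.remove(path)

df = pd.DataFrame({"x": [1, 2]})

df.to_hdf(path, key="this_is_a_key", format="t")
df.to_hdf(path, key="this_is_a_key", format="t")               # default append=False: key replaced
print(len(pd.read_hdf(path, "this_is_a_key")))                 # still 2 rows

df.to_hdf(path, key="this_is_a_key", format="t", append=True)  # rows now accumulate
print(len(pd.read_hdf(path, "this_is_a_key")))                 # 4 rows
```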

The mode= argument applies only at the file level, telling whether the file as a whole is to be overwritten or appended to.

One file can have any number of keys, so a mode='a', append=False setting means only one key gets overwritten while the other keys stay.
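To illustrate that key-level isolation (a sketch; file and key names are made up for the example): re-writing one key with the default append=False leaves the other key in the same file untouched.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4]})

# mode='a' opens (or creates) the file without truncating it
df1.to_hdf("multi_key.h5", key="first", mode="a", format="t")
df2.to_hdf("multi_key.h5", key="second", mode="a", format="t")

# Re-writing 'first' with the default append=False replaces only that key
pd.DataFrame({"a": [9]}).to_hdf("multi_key.h5", key="first", mode="a", format="t")

print(len(pd.read_hdf("multi_key.h5", "first")))   # the replaced key: 1 row
print(len(pd.read_hdf("multi_key.h5", "second")))  # the untouched key: 2 rows
```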

I had a similar experience to yours and found the additional append argument in the reference docs. After setting it, it now appends properly for me.

Ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html

Note: HDF5 won't do anything clever with the DataFrame's indexes. We need to iron those out before putting the data in, or when we take it out.
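Concretely (a sketch with a made-up scratch file): each appended frame keeps its own index, so appending two frames both indexed 0..1 leaves a duplicated index on disk, which can be cleaned up with reset_index after reading.

```python
import os
import pandas as pd

path = "index_demo.h5"  # hypothetical scratch file
if os.path.exists(path):
    os.remove(path)

df = pd.DataFrame({"x": [1, 2]})  # index is 0, 1

df.to_hdf(path, key="k", mode="a", format="t")
df.to_hdf(path, key="k", mode="a", format="t", append=True)

out = pd.read_hdf(path, "k")
print(list(out.index))            # [0, 1, 0, 1] -- duplicated

out = out.reset_index(drop=True)  # iron the index out after reading
print(list(out.index))            # [0, 1, 2, 3]
```

Alternatively, give each appended row a genuinely new index (e.g. a running row count or the timestamp itself) before writing.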
