
Append data to HDF5 file with Pandas, Python

I have large pandas DataFrames with financial data. I have no problem appending and concatenating additional columns and DataFrames to my .h5 file.

The financial data is being updated every minute, so I need to append a row of data to all of the existing tables inside my .h5 file every minute.

Here is what I have tried so far, but no matter what I do, it overwrites the .h5 file instead of just appending data.

HDFStore way:

from pandas import HDFStore

#we open (or create) the hdf5 store
save_hdf = HDFStore('test.h5')

#we give the dataframe a key value
#format=table so we can append data
save_hdf.put('name_of_frame', ohlcv_candle, format='table', data_columns=True)

#we print our dataframe by calling the hdf file with the key
#just doing this as a test
print(save_hdf['name_of_frame'])

The other way I have tried it, to_hdf:

#format=t so we can append data; mode=r+ to specify that the file exists
#and we want to append to it
tohlcv_candle.to_hdf('test.h5', key='this_is_a_key', mode='r+', format='t')

#again just printing to check if it worked 
print(pd.read_hdf('test.h5', key='this_is_a_key'))

Here is what one of the DataFrames looks like after being read_hdf:

           time     open     high      low    close     volume           PP  
0    1505305260  3137.89  3147.15  3121.17  3146.94   6.205397  3138.420000   
1    1505305320  3146.86  3159.99  3130.00  3159.88   8.935962  3149.956667   
2    1505305380  3159.96  3160.00  3159.37  3159.66   4.524017  3159.676667   
3    1505305440  3159.66  3175.51  3151.08  3175.51   8.717610  3167.366667   
4    1505305500  3175.25  3175.53  3170.44  3175.53   3.187453  3173.833333  

The next time I get data (every minute), I would like a row of it added at index 5 of all my columns... and then 6 and 7, and so on, without having to read and manipulate the entire file in memory, as that would defeat the point of doing this. If there is a better way of solving this, do not be shy to recommend it.
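A minimal sketch of that per-minute workflow, assuming a table-format key named 'name_of_frame' and a hypothetical fetch_latest_candle() helper (shown here as a stub returning dummy data in place of the real feed):

```python
import pandas as pd

def fetch_latest_candle():
    # Hypothetical stub standing in for the real data feed:
    # returns one new OHLCV row as a single-row DataFrame.
    return pd.DataFrame([{
        "time": 1505305560, "open": 3175.50, "high": 3180.00,
        "low": 3174.00, "close": 3179.20, "volume": 5.1,
    }])

# mode='a' opens the file without truncating existing keys
store = pd.HDFStore("test.h5", mode="a")

# Each minute: append just the new row; the rest of the
# table stays on disk and is never loaded into memory.
new_row = fetch_latest_candle()
store.append("name_of_frame", new_row, format="table", data_columns=True)
store.close()
```

In a live script the last four lines would run inside the once-a-minute loop; only the single new row crosses into memory on each iteration.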

PS: sorry for the formatting of that table in here.

pandas.HDFStore.put() has a parameter append (which defaults to False) - that instructs Pandas to overwrite instead of appending.

So try this:

store = pd.HDFStore('test.h5')

store.append('name_of_frame', ohlcv_candle, format='t',  data_columns=True)

We can also use store.put(..., append=True), but the key must also have been created in table format:

store.put('name_of_frame', ohlcv_candle, format='t', append=True, data_columns=True)

NOTE: appending works only for the table format (format='t' is an alias for format='table').
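A quick check of that restriction (a sketch; the file and key names are arbitrary): a key written in the default fixed format rejects appends, while a table-format key accepts them.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2]})

store = pd.HDFStore("formats_demo.h5", mode="w")  # 'w' starts a fresh file
store.put("fixed_key", df)                 # default format='fixed'
store.put("table_key", df, format="table")

store.append("table_key", df)              # fine: table format supports appends
try:
    store.append("fixed_key", df)          # fixed format: appending raises
except ValueError as err:
    print("cannot append to fixed format:", err)
store.close()
```

After this runs, 'table_key' holds four rows while 'fixed_key' still holds the original two.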

tohlcv_candle.to_hdf('test.h5', key='this_is_a_key', append=True, mode='r+', format='t')

You need to pass another argument, append=True, to specify that the data is to be appended to any existing data found under that key, instead of overwriting it.

Without this, the default is False, and if an existing table is encountered under 'this_is_a_key', it is overwritten.
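The two behaviours side by side (a sketch using a hypothetical scratch file): writing the same key twice with the default replaces it, while append=True accumulates rows.

```python
import os
import pandas as pd

path = "overwrite_demo.h5"  # hypothetical scratch file
if os.path.exists(path):
    os.remove(path)

df = pd.DataFrame({"x": [1, 2]})

df.to_hdf(path, key="this_is_a_key", format="t")
df.to_hdf(path, key="this_is_a_key", format="t")               # default append=False: key replaced
print(len(pd.read_hdf(path, "this_is_a_key")))                 # still 2 rows

df.to_hdf(path, key="this_is_a_key", format="t", append=True)  # rows now accumulate
print(len(pd.read_hdf(path, "this_is_a_key")))                 # 4 rows
```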

The mode= argument applies only at the file level, telling whether the file as a whole is to be overwritten or appended to.

One file can have any number of keys, so a mode='a', append=False setting means only one key gets overwritten while the other keys stay.
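To illustrate that key-level isolation (a sketch; file and key names are made up for the example): re-writing one key with the default append=False leaves the other key in the same file untouched.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4]})

# mode='a' opens (or creates) the file without truncating it
df1.to_hdf("multi_key.h5", key="first", mode="a", format="t")
df2.to_hdf("multi_key.h5", key="second", mode="a", format="t")

# Re-writing 'first' with the default append=False replaces only that key
pd.DataFrame({"a": [9]}).to_hdf("multi_key.h5", key="first", mode="a", format="t")

print(len(pd.read_hdf("multi_key.h5", "first")))   # the replaced key: 1 row
print(len(pd.read_hdf("multi_key.h5", "second")))  # the untouched key: 2 rows
```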

I had a similar experience to yours and found the additional append argument in the reference docs. After setting it, it now appends properly for me.

Ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html

Note: HDF5 won't do anything clever with the DataFrame's indexes. We need to iron those out before putting the data in, or when we take it out.
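Concretely (a sketch with a made-up scratch file): each appended frame keeps its own index, so appending two frames both indexed 0..1 leaves a duplicated index on disk, which can be cleaned up with reset_index after reading.

```python
import os
import pandas as pd

path = "index_demo.h5"  # hypothetical scratch file
if os.path.exists(path):
    os.remove(path)

df = pd.DataFrame({"x": [1, 2]})  # index is 0, 1

df.to_hdf(path, key="k", mode="a", format="t")
df.to_hdf(path, key="k", mode="a", format="t", append=True)

out = pd.read_hdf(path, "k")
print(list(out.index))            # [0, 1, 0, 1] -- duplicated

out = out.reset_index(drop=True)  # iron the index out after reading
print(list(out.index))            # [0, 1, 2, 3]
```

Alternatively, give each appended row a genuinely new index (e.g. a running row count or the timestamp itself) before writing.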
