I have an algorithm that works with a CSV file, like this:
# display_id, ad_id, clicked (1 or 0)
cols = {'display_id': np.int32,
        'ad_id': np.int32,
        'clicked': bool}
trainData = pd.read_csv("trainData.csv", dtype=cols)
for did, ad, c in trainData.itertuples(index=False):
    print(did + ad + c)  # example
But now I have a '.h5' file, and I want to use it with the same algorithm. I am reading the file as follows:
store = pd.HDFStore('data.h5')
But as far as I know, HDFStore returns NumPy arrays. Do you have any idea how to use this data file in the algorithm?
The main difference in this case is that an HDF5 file can contain multiple DataFrames/tables, so you always have to specify a key (identifier).
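For illustration, here is a minimal sketch of how a DataFrame ends up under such a key in the first place (the file name, key, and column values are hypothetical; writing HDF5 via pandas requires the PyTables package):

```python
import numpy as np
import pandas as pd

# A small sample DataFrame (hypothetical data).
df = pd.DataFrame({'display_id': np.arange(5, dtype=np.int32),
                   'ad_id': np.arange(100, 105, dtype=np.int32),
                   'clicked': [True, False, True, False, True]})

# Store it under the key 'test'; an HDFStore can hold many such keys.
with pd.HDFStore('data.h5') as store:
    store.put('test', df, format='table')

# Keys are always listed with a leading '/'.
with pd.HDFStore('data.h5') as store:
    print(store.keys())  # ['/test']
```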
Here is a small demo:
In [14]: fn = r'C:\Temp\test_str.h5'
In [15]: store = pd.HDFStore(fn)
In [16]: store
Out[16]:
<class 'pandas.io.pytables.HDFStore'>
File path: C:\Temp\test_str.h5
/test frame_table (typ->appendable,nrows->10000,ncols->4,indexers->[index],dc->[a,c])
In this case only one DF (key '/test') is stored in this HDF5 file.
Assuming that all your HDF5 files have only one DF (one key per file) you can process them dynamically by choosing the first key:
In [17]: store.keys()
Out[17]: ['/test']
In [18]: key = store.keys()[0]
In [19]: key
Out[19]: '/test'
In [20]: store[key].head()
Out[20]:
a b c txt
0 689347 129498 770470 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...
1 954132 97912 783288 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...
2 40548 938326 861212 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...
3 869895 39293 242473 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...
4 938918 487643 362942 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...