How to read h5 file like csv file

I have an algorithm that works with a CSV file:

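import numpy as np
import pandas as pd

# columns: display_id, ad_id, clicked (1 or 0)
colls = {'display_id': np.int32,
         'ad_id': np.int32,
         'clicked': bool}
trainData = pd.read_csv("trainData.csv", dtype=colls)

# index=False so that each tuple holds only the three column values
for did, ad, c in trainData.itertuples(index=False):
    print(did + ad + c)  # example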

But now I have a '.h5' file, and I want to use it in the same algorithm. I am opening the file as follows:

store = pd.HDFStore('data.h5')

As far as I know, HDFStore returns NumPy arrays. Do you have any idea how to use this data file in the algorithm?

The main difference in this case is that an HDF5 file may contain multiple DataFrames/tables, so you always have to specify a key (identifier).

Here is a small demo:

In [14]: fn = r'C:\Temp\test_str.h5'

In [15]: store = pd.HDFStore(fn)

In [16]: store
Out[16]:
<class 'pandas.io.pytables.HDFStore'>
File path: C:\Temp\test_str.h5
/test            frame_table  (typ->appendable,nrows->10000,ncols->4,indexers->[index],dc->[a,c])

In this case, only one DataFrame (key = /test) is stored in this HDF5 file.

Assuming that each of your HDF5 files holds only one DataFrame (one key per file), you can process them dynamically by choosing the first key:

In [17]: store.keys()
Out[17]: ['/test']

In [18]: key = store.keys()[0]

In [19]: key
Out[19]: '/test'

In [20]: store[key].head()
Out[20]:
        a       b       c                                                txt
0  689347  129498  770470  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...
1  954132   97912  783288  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...
2   40548  938326  861212  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...
3  869895   39293  242473  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...
4  938918  487643  362942  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...
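
To connect this back to the loop in the question, here is a minimal sketch, not a definitive implementation. The helper names load_single_frame and iter_rows are hypothetical, and the sketch assumes that data.h5 was written by pandas, holds exactly one DataFrame, and has the three columns named in the question. Indexing the store by key returns a regular DataFrame, so it can be iterated just like the result of read_csv.

```python
import pandas as pd


def load_single_frame(path):
    """Return the only DataFrame stored in a pandas-written HDF5 file.

    Hypothetical helper: assumes one key per file, as in the demo above.
    """
    # Open read-only; the with-block closes the file when done.
    with pd.HDFStore(path, mode="r") as store:
        keys = store.keys()
        if len(keys) != 1:
            raise ValueError("expected exactly one key, found: %r" % (keys,))
        return store[keys[0]]


def iter_rows(df, columns=("display_id", "ad_id", "clicked")):
    """Yield one plain tuple per row, restricted to the given columns.

    The default column names are taken from the question and are an
    assumption about what the HDF5 file contains.
    """
    return df[list(columns)].itertuples(index=False, name=None)


# Usage (assumes data.h5 exists and PyTables is installed):
# trainData = load_single_frame("data.h5")
# for did, ad, c in iter_rows(trainData):
#     print(did + ad + c)  # example
```

Selecting the columns explicitly keeps the three-way unpacking valid even if the stored DataFrame has extra columns, and index=False keeps the index out of each tuple.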
