
Loading csv and saving HDF5 in Python

I'm trying to import data from a text file (three columns of floats, 65341 rows, delimited by one or more spaces), and save it to an HDF5 file. I'm trying to save them in a table which is a child of three groups, defined by the filename.

So, for a file called 'data_a1_b2_c3.dat' I want a 1x6000 array in /data/a1/b2/c3 (where c3 is the table)

I can create the HDF5 file, and the groups, but creating the table is proving a problem.

This is what I've come up with so far (I've left out the filename parsing and error checking; that works):

import numpy as np
import tables as tb

# load datafile and keep only the third column
fname = 'data_a1_b2_c3.dat'
data = np.genfromtxt(fname)
data = data[:, 2]

# open HDF5 file
h5 = tb.open_file("h5file.h5", 'a')

gp1 = h5.create_group(h5.root, "data")
gp2 = h5.create_group(gp1, "a1")
gp3 = h5.create_group(gp2, "b2")

t = h5.create_table(gp3, "c3", data, 'my data')

That last line throws an error as below:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/tables/file.py", line 1067, in create_table
    chunkshape=chunkshape, byteorder=byteorder)
  File "/usr/lib64/python2.7/site-packages/tables/table.py", line 842, in __init__
    descr_from_dtype(nparray.dtype)
  File "/usr/lib64/python2.7/site-packages/tables/description.py", line 759, in descr_from_dtype
    for name in dtype_.names:
TypeError: 'NoneType' object is not iterable

My first thought is that it's something to do with my data array. However, I'm new to Python, and the SciPy documentation site is currently down (does anyone have a mirror?): http://www.isup.me/http://docs.scipy.org/doc/numpy/

The shape of my array looks odd, but the type looks about right. Any thoughts?

>>> data.shape
(65341,)
>>> data.dtype
dtype('float64')

For info, here are the first three rows of the data file I'm importing (I only need the third column):

  0.250000000000000       0.250000000000000        584.469683289793     
  0.250000000000000        1.00000000000000        840.153369718130     
  0.250000000000000        2.00000000000000        821.242731813009

For a quick win, you can save your data as an array (which it effectively is, since data is just 1D):

a = h5.create_array(gp3,"c3",data,'my data')

Remember to do a file close as well:

h5.close()

Results in:

[Image: array result]

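If you really want to save it as a table, remember that a table needs its record structure defined first; the values are then assigned row by row and flushed. That is also why your create_table call fails: a plain float64 array has no named fields, so its dtype.names is None and PyTables cannot derive a column description from it.

So keep what you were doing, but add this at the start: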

class Data(tb.IsDescription):
    # Float64Col matches the float64 dtype of data; Float32Col would lose precision
    value = tb.Float64Col()

and then do:

t = h5.create_table(gp3, "c3", Data, 'my data')

# fill the table one row at a time, then flush the buffer to disk
row = t.row
for d in data:
    row['value'] = d
    row.append()
t.flush()

Results in:

[Image: table result]
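As an alternative to the row loop, create_table also accepts a NumPy structured array as its description, in which case the data is written along with the column layout. The sketch below is only an illustration under that assumption: the helper names to_structured and save_as_table and the field name "value" are made up here, and the target path is hard-coded to match the example file.

```python
import numpy as np


def to_structured(values, field="value"):
    """Copy a 1D array into a structured array with a single named field."""
    values = np.asarray(values, dtype=np.float64)
    rec = np.zeros(len(values), dtype=[(field, np.float64)])
    rec[field] = values
    return rec


def save_as_table(h5path, values):
    """Write the values to /data/a1/b2/c3 as a one-column table."""
    import tables as tb  # PyTables; imported here so the helper above needs only NumPy

    rec = to_structured(values)
    # The with block closes the file on exit, so no explicit h5.close() is needed.
    with tb.open_file(h5path, "a") as h5:
        # createparents=True creates /data, /data/a1 and /data/a1/b2 if missing.
        h5.create_table("/data/a1/b2", "c3", description=rec,
                        title="my data", createparents=True)


plain = np.array([584.469683289793, 840.153369718130, 821.242731813009])
print(plain.dtype.names)                 # None: no fields, hence the TypeError
print(to_structured(plain).dtype.names)  # ('value',)
```

Calling save_as_table("h5file.h5", data) would then replace the group creation and the row loop. Note that in append mode the call raises an error if /data/a1/b2/c3 already exists.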

Finally, I would personally use pandas for this kind of CSV-to-HDF5 work: DataFrames and Series are far easier to manipulate.
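A rough sketch of what the pandas route might look like, assuming pandas is installed together with PyTables (which pandas uses for HDF5). The helpers hdf_key_from_filename and dat_to_hdf and the column names "x", "y" and "value" are hypothetical, and the filename parsing is only a guess at the scheme described in the question. Also note that pandas stores the frame in its own layout under the key, which may differ from a hand-built PyTables table.

```python
import os


def hdf_key_from_filename(fname):
    """Map 'data_a1_b2_c3.dat' to the HDF5 path '/data/a1/b2/c3'."""
    stem = os.path.splitext(os.path.basename(fname))[0]
    return "/" + "/".join(stem.split("_"))


def dat_to_hdf(fname, h5path="h5file.h5"):
    """Store the third column of a whitespace-delimited file under its key."""
    import pandas as pd  # needs pandas plus PyTables for HDF5 support

    # The column names are made up for illustration; only "value" is kept.
    df = pd.read_csv(fname, sep=r"\s+", header=None,
                     names=["x", "y", "value"], usecols=["value"])
    df.to_hdf(h5path, key=hdf_key_from_filename(fname), mode="a")


print(hdf_key_from_filename("data_a1_b2_c3.dat"))  # /data/a1/b2/c3
```

With this in place, dat_to_hdf('data_a1_b2_c3.dat') would read the file and write the third column in one step.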

