Converting column data in a CSV file to h5 (HDF5) format

I am trying to convert a CSV file to an h5 (HDF5) file.

I have gone through multiple posts and have been able to create the h5 file, but I am still unable to pull individual columns from the CSV file and add them to the h5 file. Please let me know if there is a solution to this.

Essentially, I have four columns in my CSV file with 4000 observations in each column. I am trying to find out whether there is a way to convert it directly to h5, or to pull individual column data and edit the existing h5 file. Thank you.

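import numpy as np
import pandas as pd

filename = '/home/test3.h5'

df = pd.DataFrame(np.array([[1, 2], [4, 5]]),
                   columns=['a', 'b'])

# Write the frame first: read_hdf can only read a key that already exists in the file
df.to_hdf(filename, key='data', mode='w')

print(pd.read_hdf(filename, 'data'))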

As specified in the pandas I/O guide, section HDF5 (PyTables), there are two simple functions for working with HDF5: DataFrame.to_hdf to write and pd.read_hdf to read.

So converting a csv to h5 could be as simple as:

import pandas as pd

df = pd.read_csv('input_file.csv')
df.to_hdf('output_file.h5', key='data')
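A self-contained sketch of that round trip, assuming PyTables (the `tables` package) is installed, since pandas needs it for HDF5. The temporary paths and the column names c1 to c4 are hypothetical stand-ins for the four-column, 4000-row CSV described in the question.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical paths in a temporary directory so the sketch can run anywhere.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "input_file.csv")
h5_path = os.path.join(workdir, "output_file.h5")

# Stand-in CSV: four hypothetical columns with 4000 observations each.
rng = np.random.default_rng(0)
sample = pd.DataFrame(rng.integers(0, 100, size=(4000, 4)),
                      columns=["c1", "c2", "c3", "c4"])
sample.to_csv(csv_path, index=False)

# The conversion itself: CSV -> DataFrame -> HDF5.
df = pd.read_csv(csv_path)
df.to_hdf(h5_path, key="data", mode="w")

# Read the dataset back to confirm nothing was lost.
restored = pd.read_hdf(h5_path, "data")
print(restored.shape)  # (4000, 4)
```

Because the whole CSV becomes one DataFrame, all four columns land in the h5 file at once; there is no need to copy them over one by one.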

If you want to combine the CSV data with data already stored in an h5 file:

import pandas as pd

df1 = pd.read_csv('input_file.csv')
df2 = pd.read_hdf('input_file.h5', 'data')
save = pd.merge(df1, df2, on=[...])  # combine data; fill in the column(s) to join on
save.to_hdf('output_file.h5', key='data')
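A runnable sketch of the combine step, assuming both sources share a key column. The column names id, temp and pressure and the sample values are hypothetical; they only stand in for whatever goes in `on=[...]` above.

```python
import os
import tempfile

import pandas as pd

workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "input_file.csv")
h5_path = os.path.join(workdir, "input_file.h5")
out_path = os.path.join(workdir, "output_file.h5")

# Hypothetical inputs: both sources share an "id" column to join on.
pd.DataFrame({"id": [1, 2, 3], "temp": [20.5, 21.0, 19.8]}).to_csv(csv_path, index=False)
pd.DataFrame({"id": [1, 2, 3], "pressure": [101, 99, 100]}).to_hdf(h5_path, key="data", mode="w")

df1 = pd.read_csv(csv_path)
df2 = pd.read_hdf(h5_path, "data")
save = pd.merge(df1, df2, on=["id"])  # inner join on the shared key column
save.to_hdf(out_path, key="data", mode="w")

combined = pd.read_hdf(out_path, "data")
print(combined)
```

pd.merge defaults to an inner join, so rows whose id appears in only one source are dropped; pass how="outer" if you need to keep them.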

If input_file.h5 and output_file.h5 are the same file, mode='w' lets you overwrite the file; writing under a different key with mode='a' (the default) adds another dataset to the same file; and append=True appends rows to an existing table-format dataset inside the file.
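A small sketch of those three write modes against one file, assuming PyTables is installed. The file name and the keys data and other are hypothetical; the first dataset is written with format="table" because appending rows only works with table format.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "output_file.h5")

first = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
second = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# mode="w": create the file, or truncate it if it already exists.
first.to_hdf(path, key="data", mode="w", format="table")

# mode="a" (the default) with a different key adds a second dataset to the same file.
second.to_hdf(path, key="other", mode="a")

# append=True adds rows to the existing "data" table.
second.to_hdf(path, key="data", mode="a", append=True, format="table")

with pd.HDFStore(path, mode="r") as store:
    keys = sorted(store.keys())  # both keys live in one file
    data = store["data"]         # 2 original rows + 2 appended rows

print(keys)
print(len(data))
```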

The guide I linked contains many more examples of how to use these tools, and also covers pd.HDFStore, which lets you open the whole file and inspect the keys it contains. I suggest giving it a thorough read.
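Since the question asks about pulling individual columns, here is a sketch using pd.HDFStore to list the keys in a file and read back only some columns. It assumes the dataset was written with format="table", which is what allows column selection; the column names c1 to c4 and the file path are hypothetical.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "output_file.h5")

# Hypothetical four-column frame standing in for the CSV data.
df = pd.DataFrame({"c1": range(5), "c2": range(5, 10),
                   "c3": range(10, 15), "c4": range(15, 20)})

# Table format is what allows reading back a subset of columns.
df.to_hdf(path, key="data", mode="w", format="table")

with pd.HDFStore(path, mode="r") as store:
    keys = store.keys()
    # Pull only two columns instead of loading the whole table.
    subset = store.select("data", columns=["c1", "c3"])

print(keys)
print(subset)
```

The same selection is available without opening the store explicitly, via pd.read_hdf(path, "data", columns=["c1", "c3"]).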
