
pandas: transform a CSV into an HDF5 file while avoiding memory errors

I have this simple code:

data = pd.read_csv(file_path + 'PSI_TS_clean.csv', nrows=None, 
                   names=None, usecols=None)

data.to_hdf(file_path + 'PSI_TS_clean.h5', 'table')

but my data is too big and I run into memory errors.

What is a clean way to do this chunk by chunk?

If the CSV is really big, split the file using a method such as the one detailed here: chunking-data-from-a-large-file-for-multiprocessing

Then iterate through the files, calling pd.read_csv on each and writing the result out with DataFrame.to_hdf.
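As an alternative to splitting the file on disk, read_csv's chunksize parameter turns the reader into an iterator of DataFrames, so each piece can be processed without loading the whole CSV. A minimal sketch (the file name and chunk size are stand-ins, not from the original question):

```python
import pandas as pd

# Stand-in for the real PSI_TS_clean.csv: a small demo file
pd.DataFrame({"a": range(10), "b": range(10)}).to_csv("demo.csv", index=False)

total_rows = 0
# chunksize=4 makes read_csv yield DataFrames of up to 4 rows each
for chunk in pd.read_csv("demo.csv", chunksize=4):
    total_rows += len(chunk)  # process each chunk here, e.g. append it to HDF5

print(total_rows)  # 10
```

Each chunk keeps a continuous RangeIndex across iterations, which is convenient when appending the pieces to a single HDF5 table.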

For to_hdf, check the parameters in the DataFrame.to_hdf documentation: you need mode='a' and should consider append=True.
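Putting the two pieces together, a minimal sketch of the chunk-by-chunk conversion (file names are illustrative; appending requires format='table' and the PyTables package):

```python
import pandas as pd

# Stand-in for the real large CSV
pd.DataFrame({"x": range(10)}).to_csv("big.csv", index=False)

# mode='a' creates the file if it does not exist, then appends;
# append=True only works with the 'table' (not 'fixed') format
for chunk in pd.read_csv("big.csv", chunksize=3):
    chunk.to_hdf("big.h5", key="table", mode="a", append=True, format="table")

result = pd.read_hdf("big.h5", "table")
print(len(result))  # 10
```

Note that if the script is re-run against an existing .h5 file, mode='a' will keep appending; delete the file first or write the first chunk with mode='w'.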

Without knowing further details about the dataframe structure, it's difficult to comment further.

Also, read_csv has the parameter low_memory=False.
