
pandas: transform a CSV into an HDF5 file while avoiding memory errors

I have this simple code:

data = pd.read_csv(file_path + 'PSI_TS_clean.csv', nrows=None, 
                   names=None, usecols=None)

data.to_hdf(file_path + 'PSI_TS_clean.h5', 'table')

but my data is too big and I run into memory errors.

What is a clean way to do this chunk by chunk?

If the CSV is really big, split the file using a method such as the one detailed here: chunking-data-from-a-large-file-for-multiprocessing

Then iterate through the files, calling pd.read_csv on each and writing the result out with DataFrame.to_hdf.
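As an alternative to splitting the file on disk, read_csv's chunksize parameter turns the reader into an iterator of DataFrames, so each piece can be processed without loading the whole CSV. A minimal sketch (the file name and chunk size are stand-ins, not from the original question):

```python
import pandas as pd

# Stand-in for the real PSI_TS_clean.csv: a small demo file
pd.DataFrame({"a": range(10), "b": range(10)}).to_csv("demo.csv", index=False)

total_rows = 0
# chunksize=4 makes read_csv yield DataFrames of up to 4 rows each
for chunk in pd.read_csv("demo.csv", chunksize=4):
    total_rows += len(chunk)  # process each chunk here, e.g. append it to HDF5

print(total_rows)  # 10
```

Each chunk keeps a continuous RangeIndex across iterations, which is convenient when appending the pieces to a single HDF5 table.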

For to_hdf, check the parameters in the DataFrame.to_hdf documentation: you need mode='a' and should consider append=True.
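Putting the two pieces together, a minimal sketch of the chunk-by-chunk conversion (file names are illustrative; appending requires format='table' and the PyTables package):

```python
import pandas as pd

# Stand-in for the real large CSV
pd.DataFrame({"x": range(10)}).to_csv("big.csv", index=False)

# mode='a' creates the file if it does not exist, then appends;
# append=True only works with the 'table' (not 'fixed') format
for chunk in pd.read_csv("big.csv", chunksize=3):
    chunk.to_hdf("big.h5", key="table", mode="a", append=True, format="table")

result = pd.read_hdf("big.h5", "table")
print(len(result))  # 10
```

Note that if the script is re-run against an existing .h5 file, mode='a' will keep appending; delete the file first or write the first chunk with mode='w'.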

Without knowing further details about the dataframe structure, it's difficult to comment further.

Also, read_csv has the parameter low_memory=False.
