
Pickle dump Pandas DataFrame

This is a question from a lazy man.

I have a pandas DataFrame with 4 million rows and would like to save it to disk in smaller chunks of pickle files.

Why smaller chunks? To save and load them more quickly.

My questions are:

1) Is there a better way (a built-in function) to save the data in smaller pieces than manually chunking it with np.array_split (roughly the sketch below)?

2) Is there any graceful way of gluing the chunks back together when I read them, other than manually concatenating them?
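For reference, what I mean by the manual approach is roughly this (the chunk count and file names are arbitrary):

import glob
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4_000_000, 5))

# Split row positions into 10 chunks and pickle each slice separately.
for i, pos in enumerate(np.array_split(np.arange(len(df)), 10)):
    df.iloc[pos].to_pickle(f'chunk_{i:02d}.pkl')

# Glue the chunks back together on load.
df2 = pd.concat(pd.read_pickle(p) for p in sorted(glob.glob('chunk_*.pkl')))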

Feel free to suggest any other data format suited for this job besides pickle.

If the goal is to save and load quickly you should look into using sql rather than raw text pickling. If your computer chokes when you ask it to write 4 million rows you can specify a chunk size.
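A minimal sketch of that idea, assuming SQLite from the standard library (the file and table names here are placeholders):

import sqlite3
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4_000_000, 5), columns=list('abcde'))

con = sqlite3.connect('frame.db')

# chunksize batches the INSERTs so the write happens in pieces
# instead of one giant statement.
df.to_sql('frame', con, if_exists='replace', index=False, chunksize=100_000)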

From there you can query slices with std. SQL.
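Continuing the same sketch, a slice comes back with an ordinary query (LIMIT/OFFSET is just one way to express a row range):

import sqlite3
import pandas as pd

con = sqlite3.connect('frame.db')

# Pull back only rows 200,000-299,999 of the stored table.
chunk = pd.read_sql('SELECT * FROM frame LIMIT 100000 OFFSET 200000', con)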

I've been using this for a DataFrame of size 7,000,000 x 250.

Use HDF5 (see the pandas to_hdf documentation).

import numpy as np
import pandas as pd

# A small example DataFrame.
df = pd.DataFrame(np.random.rand(5, 5))
df


# Write with blosc compression; append=False overwrites any existing key.
df.to_hdf('myrandomstore.h5', key='this_df', append=False, complib='blosc', complevel=9)

# Read it back.
new_df = pd.read_hdf('myrandomstore.h5', 'this_df')
new_df
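One note relevant to the chunked-loading question: the snippet above uses the default fixed format, so the file has to be read back whole. Writing with format='table' (an extra option, not part of the original snippet) lets read_hdf return just a row range:

# Table format supports partial reads, at some cost in write speed.
df.to_hdf('myrandomstore.h5', key='this_df', format='table', complib='blosc', complevel=9)

# start/stop slice by row position; where= allows simple queries on table format.
subset = pd.read_hdf('myrandomstore.h5', 'this_df', start=1, stop=3)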

