Saving specific partitions of Dask DataFrame to parquet

I have an extremely large dataframe (around 5,000,000 rows) that I have split into 20 Dask partitions.

When I try to save it, my Python kernel crashes.

Is there a way of saving each partition one at a time, or of splitting the dataframe into 20 variables?

Dask version = 2022.01.1

Distributed version =... (if using)

Parquet engine and version =...

Yes, you can select individual partitions of your dataframe using the .partitions attribute. For example, this yields the first partition (still lazy, until you call compute()):

ddf.partitions[0]
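
Building on that, here is a minimal sketch of saving each partition to parquet one at a time (the output filenames are illustrative, and ddf is the dataframe from the question); since only one partition is materialized per iteration, peak memory stays low:

for i in range(ddf.npartitions):
    # Materialize a single partition as a pandas DataFrame, then
    # write it to its own parquet file before moving to the next one.
    ddf.partitions[i].compute().to_parquet(f"part-{i}.parquet")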

However, it would be good to know why things are failing. Maybe your partitions are too big, or maybe there are too many. Extra details would help, including your version of dask, since some important defaults changed recently to help with stability.
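
For instance, if the partitions turn out to be too large to compute safely, one option (a sketch, with an assumed target size of 100MB) is to repartition before writing, letting to_parquet emit one file per partition:

# Repartition to a target in-memory size per partition (assumed value),
# then write the whole dataframe; to_parquet creates one file per partition.
ddf = ddf.repartition(partition_size="100MB")
ddf.to_parquet("output/")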
