Saving specific partitions of Dask DataFrame to parquet

I have an extremely large dataframe (around 5,000,000 rows) that I have split into 20 Dask partitions.

When I try to save it, my Python kernel crashes.

Is there a way of saving each partition one at a time, or of splitting the dataframe into 20 variables?

Dask version = 2022.01.1

Distributed version =... (if using)

Parquet engine and version =...

Yes, you can select individual partitions of your dataframe using the .partitions attribute. For example, this yields the first partition (still lazy, until you call compute()):

ddf.partitions[0]
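
Building on that, here is a minimal sketch of saving each partition to parquet one at a time (the output filenames are illustrative, and ddf is the dataframe from the question); since only one partition is materialized per iteration, peak memory stays low:

for i in range(ddf.npartitions):
    # Materialize a single partition as a pandas DataFrame, then
    # write it to its own parquet file before moving to the next one.
    ddf.partitions[i].compute().to_parquet(f"part-{i}.parquet")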

However, it would be good to know why things are failing. Maybe your partitions are too big, or maybe there are too many. Extra details would help, including your version of dask, since some important defaults changed recently to help with stability.
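
For instance, if the partitions turn out to be too large to compute safely, one option (a sketch, with an assumed target size of 100MB) is to repartition before writing, letting to_parquet emit one file per partition:

# Repartition to a target in-memory size per partition (assumed value),
# then write the whole dataframe; to_parquet creates one file per partition.
ddf = ddf.repartition(partition_size="100MB")
ddf.to_parquet("output/")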
