I have a similar problem to this one when concatenating two timestamp-indexed Dask DataFrames vertically.
I have two Dask DataFrames, df1 and df2:
df1.index:
Dask Index Structure:
npartitions=1
2018-03-03 13:04:44.497929 datetime64[ns]
2018-03-03 13:23:04.759840 ...
Name: time, dtype: datetime64[ns]
Dask Name: getitem, 8 tasks
df2.index:
Dask Index Structure:
npartitions=1
2018-03-03 07:09:04.184453 datetime64[ns]
2018-03-03 07:32:46.815356 ...
Name: time, dtype: datetime64[ns]
Dask Name: getitem, 8 tasks
They have exactly the same column names and dtypes. Now I want to concatenate them using dask.dataframe.concat:
# df1 & df2 are Dask DataFrames
import dask.dataframe as dd

print(df1.divisions)
print(df2.divisions)
dfs = dd.concat([df1, df2], axis=0, interleave_partitions=False)
The output:
(Timestamp('2018-03-03 13:04:44.497929'), Timestamp('2018-03-03 13:23:04.759840'))
(Timestamp('2018-03-03 07:09:04.184453'), Timestamp('2018-03-03 07:32:46.815356'))
ValueError: All inputs have known divisions which cannot be concatenated in order. Specify interleave_partitions=True to ignore order
The two Dask DataFrames cannot be concatenated unless I specify interleave_partitions=True. But there is no interleaving between the indexes of the two dataframes. Is this caused by a limitation of datetime-index support in Dask? Or do I need to specify other parameters, or convert the index to int or double?
But there is no interleaving between the indexes of the two dataframes
The ranges themselves do not overlap, but look at the order: df2's range (07:09–07:32) comes entirely before df1's (13:04–13:23), and dd.concat requires the divisions of its inputs, in the order given, to be monotonically increasing. So [df1, df2] fails even though the indexes are disjoint. You can either pass the frames in time order, [df2, df1], or add the keyword as the error message suggests:
dfs = dd.concat([df1, df2], axis=0, interleave_partitions=True)
If you think that you've run into a bug here then I encourage you to reduce it down to a minimal example and post a bug report.