I have a similar problem to this one when concatenating two timestamp-indexed Dask DataFrames vertically.
I have two Dask DataFrames, df1 and df2:
df1.index:
Dask Index Structure:
npartitions=1
2018-03-03 13:04:44.497929 datetime64[ns]
2018-03-03 13:23:04.759840 ...
Name: time, dtype: datetime64[ns]
Dask Name: getitem, 8 tasks
df2.index:
Dask Index Structure:
npartitions=1
2018-03-03 07:09:04.184453 datetime64[ns]
2018-03-03 07:32:46.815356 ...
Name: time, dtype: datetime64[ns]
Dask Name: getitem, 8 tasks
They have exactly the same column names and dtypes. Now I want to concatenate them using dask.dataframe.concat:
# df1 & df2 are Dask DataFrames
import dask.dataframe as dd

print(df1.divisions)
print(df2.divisions)
dfs = dd.concat([df1, df2], axis=0, interleave_partitions=False)
The output:
(Timestamp('2018-03-03 13:04:44.497929'), Timestamp('2018-03-03 13:23:04.759840'))
(Timestamp('2018-03-03 07:09:04.184453'), Timestamp('2018-03-03 07:32:46.815356'))
ValueError: All inputs have known divisions which cannot be concatenated in order. Specify interleave_partitions=True to ignore order
The two Dask DataFrames cannot be concatenated unless I specify interleave_partitions=True. But there is no interleaving between the indexes of the two dataframes. Is this caused by a limitation of datetime-index support in Dask? Or do I need to specify other parameters, or convert the index to int or double?
But there is no interleaving between the indexes of the two dataframes
The ranges themselves do not overlap, but look at the order: df2's range (07:09–07:32) comes entirely before df1's (13:04–13:23), and dd.concat requires the divisions of its inputs, in the order given, to be monotonically increasing. So [df1, df2] fails even though the indexes are disjoint. You can either pass the frames in time order, [df2, df1], or add the keyword as the error message suggests:
dfs = dd.concat([df1, df2], axis=0, interleave_partitions=True)
If you think that you've run into a bug here then I encourage you to reduce it down to a minimal example and post a bug report.