Pandas and Dask produce different results (because I'm doing something wrong in Dask I think). I want to get the Dask result to match the Pandas one here.
This toy program should run as-is to demonstrate:
import dask
import dask.dataframe as ddf
import pandas as pd
# This creates a toy pd.DataFrame
def get(ii):
x = 2 * ii
return pd.DataFrame.from_dict({'a':[x + 1, x + 2]})
if __name__ == '__main__':
print('Using Pandas')
df1 = get(0)
df2 = get(1)
pandas_df = pd.concat([df1, df2], axis=1)
print(pandas_df)
print('\n\nUsing Dask')
output = []
for ii in range(2):
output.append(dask.delayed(get)(ii))
temp = ddf.from_delayed(output)
temp2 = ddf.concat([temp], axis=1)
dask_df = temp2.compute()
print(dask_df)
And the output:
Using Pandas (this is what I want)
a a
0 1 3
1 2 4
Using Dask (oops what happened here?)
a
0 1
1 2
0 3
1 4
import dask.dataframe as ddf
from dask import delayed
import pandas as pd
def get(ii):
x = 2 * ii
return pd.DataFrame.from_dict({'a':[x + 1, x + 2]})
@delayed()
def make_daskdf(*num):
df_list = []
for i in num:
df_list.append(get(i))
df = pd.concat(df_list, axis=1)
return df
dask_df = make_daskdf(0, 1, 2, 3).compute()
Output:
dask_df
Out[38]:
a a a a
0 1 3 5 7
1 2 4 6 8
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.