How to concat on axis=1 with Dask delayed? (simplified)

Pandas and Dask produce different results here, probably because I'm doing something wrong on the Dask side. I want the Dask result to match the Pandas one.

This toy program should run as-is to demonstrate:

import dask
import dask.dataframe as ddf
import pandas as pd


# This creates a toy pd.DataFrame
def get(ii):
    x = 2 * ii
    return pd.DataFrame.from_dict({'a':[x + 1, x + 2]})


if __name__ == '__main__':

    print('Using Pandas')
    df1 = get(0)
    df2 = get(1)
    pandas_df = pd.concat([df1, df2], axis=1)
    print(pandas_df)

    print('\n\nUsing Dask')
    output = []
    for ii in range(2):
        output.append(dask.delayed(get)(ii))
    temp = ddf.from_delayed(output)
    temp2 = ddf.concat([temp], axis=1)
    dask_df = temp2.compute()
    print(dask_df)

And the output:

Using Pandas (this is what I want)
   a  a
0  1  3
1  2  4


Using Dask (oops what happened here?)
   a
0  1
1  2
0  3
1  4
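
What seems to be happening: `ddf.from_delayed` treats each delayed object as one partition of a single Dask DataFrame, and partitions are stacked row-wise. The later `ddf.concat([temp], axis=1)` receives a list containing just that one frame, so there is nothing left to join column-wise. A minimal pandas-only sketch of the same effect, assuming the partition stacking behaves like `pd.concat(..., axis=0)`:

```python
import pandas as pd


def get(ii):
    x = 2 * ii
    return pd.DataFrame({'a': [x + 1, x + 2]})


frames = [get(0), get(1)]

# Row-wise stacking: the index repeats (0, 1, 0, 1), as in the Dask output
stacked = pd.concat(frames, axis=0)

# Column-wise joining: two 'a' columns side by side, as in the Pandas output
side_by_side = pd.concat(frames, axis=1)

print(stacked)
print(side_by_side)
```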
One way to get the Pandas-style result is to do the column-wise concatenation in Pandas, inside a single delayed function:

from dask import delayed
import pandas as pd

def get(ii):
    x = 2 * ii
    return pd.DataFrame.from_dict({'a':[x + 1, x + 2]})

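# Build all the frames and join them column-wise inside one delayed task
@delayed
def make_daskdf(*num):
    df_list = [get(i) for i in num]
    return pd.concat(df_list, axis=1)

# compute() returns a plain pandas DataFrame here, not a Dask DataFrame
dask_df = make_daskdf(0, 1, 2, 3).compute()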

Output:

dask_df
Out[38]: 
   a  a  a  a
0  1  3  5  7
1  2  4  6  8
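
A possible variation, sketched here rather than taken from the answer above: keep each `get` call as its own delayed task so the frames can be built in parallel, and wrap only `pd.concat` in `dask.delayed` to join them. The names `pieces`, `combined` and `result` are illustrative. As with the answer, the final result is an in-memory pandas DataFrame, so this assumes the combined frame fits in memory.

```python
import dask
import pandas as pd


def get(ii):
    x = 2 * ii
    return pd.DataFrame({'a': [x + 1, x + 2]})


# One delayed task per frame; nothing runs yet
pieces = [dask.delayed(get)(ii) for ii in range(4)]

# A final delayed task that joins the frames column-wise;
# Dask resolves the delayed objects inside the list before calling pd.concat
combined = dask.delayed(pd.concat)(pieces, axis=1)

result = combined.compute()  # plain pandas DataFrame
print(result)
```

With a realistic `get` that does expensive work, this layout lets the scheduler run the per-frame tasks concurrently before the concatenation step.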
