Dask: applying custom function to DataFrame gets error

Question

I'd like to speed up my DataFrame manipulations and decided to use the dask library for this, but I cannot get it to work. Here is a test example that shows the problem:

import numpy as np
import pandas as pd
import dask.dataframe as dd
from dask.multiprocessing import get

def testfunc(good):
  return good*good

df = pd.DataFrame({'a' : [1,2,3], 'b' : [4,5,6], 'c' : [7,8,9]})
ddata = dd.from_pandas(df, npartitions=2)

df1 = ddata.map_partitions(lambda df: df.apply((lambda row: testfunc(*row)), axis=1)).compute(get=get)

But when I run this code I get an error: TypeError: testfunc() takes 1 positional argument but 3 were given. Could you explain what is wrong with my code?

Answer 1

This will work with a minor change. The asterisk in testfunc(*row) unpacks the row, so each of its three values is passed as a separate positional argument, while testfunc accepts only one. You probably want to pass the row directly, as is.

import numpy as np
import pandas as pd
import dask.dataframe as dd

def testfunc(good):
    return good*good

df = pd.DataFrame({'a' : [1,2,3], 'b' : [4,5,6], 'c' : [7,8,9]})
ddata = dd.from_pandas(df, npartitions=2)

df1 = ddata.map_partitions(lambda df: df.apply((lambda row: testfunc(row)), axis=1)).compute()
print(df1)

Output:

   a   b   c
0  1  16  49
1  4  25  64
2  9  36  81

For more information, you might want to check out the Python documentation on expressions, which covers argument unpacking with * in function calls.

solution1
1 2020-08-25 02:13:24
