
Convert Python function to PySpark lambda function

I have a Python function like the one below:

def func(a, b, c):
    if c != 0:
        return b/c * a
    else:
        return a

I wanted to create a lambda function for this, so I tried the following:

 func = lambda x,y,z : y/z * x if z != 0 else z

but I am getting this error:

TypeError: unsupported operand type(s) for /: 'str' and 'str'

This is how I am calling it:

df = df.withColumn('new_col', func('x', 'y', 'z'))

I even tried casting the columns to float, but I still get the same issue.

Note: I do not want to create a UDF because I am working with a huge dataset and UDFs are taking a lot of time, so I am looking for a lambda function.

You are calling your lambda function wrong.

You are passing it three strings rather than Column objects; you should pass the columns x / y / z themselves rather than their names as strings. You should probably do the following:

from pyspark.sql import functions as F

df = df.withColumn('new_col', func(F.col('x'), F.col('y'), F.col('z')))
# Or
df = df.withColumn('new_col', func(df['x'], df['y'], df['z']))

But I can't be sure without seeing the DataFrame's structure.

Pay closer attention to the error message. It clearly states that the variables you passed to the function are strings, and you cannot perform arithmetic operations on strings.
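
As a side note, even with proper Column arguments, Python's `... if ... else ...` cannot branch on a Column, so the conditional part is usually expressed with pyspark.sql.functions.when. Below is a minimal, self-contained sketch under that assumption (the sample DataFrame and its numeric values are made up purely for illustration); it also returns x in the else branch, matching the original Python function rather than the lambda in the question:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Sample frame standing in for the asker's data (assumed numeric columns).
df = spark.createDataFrame([(1.0, 2.0, 4.0), (3.0, 5.0, 0.0)], ['x', 'y', 'z'])

# The condition is built with F.when/otherwise: a plain Python
# `... if z != 0 else ...` would try to convert the Column to a bool and fail.
func = lambda x, y, z: F.when(z != 0, y / z * x).otherwise(x)

df = df.withColumn('new_col', func(F.col('x'), F.col('y'), F.col('z')))
df.show()

Because the whole expression stays a Column expression, it runs inside Spark's optimized execution engine and avoids the UDF serialization overhead the asker wants to escape.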
