I have the following PySpark code, which works fine.
from pyspark.sql.types import IntegerType, DoubleType
from pyspark.sql.functions import udf, array
prod_cols = udf(lambda arr: float(arr[0])*float(arr[1]), DoubleType())
finalDf = finalDf.withColumn('click_factor', prod_cols(array('rating', 'score')))
Now I tried similar code in Scala.
val prod_cols = udf((rating: Double, score: Double) => {rating.toDouble*score.toDouble})
finalDf = finalDf.withColumn("cl_rate", prod_cols(finalDf("rating"), finalDf("score")))
Somehow the second code doesn't give the right answers; the result is always null or zero. Can you help me get the right Scala code? Essentially I just need code to multiply two columns, considering that either rating or score may be null.
Pass only non-null values to the UDF. Change the code below
val prod_cols = udf((rating: Double, score: Double) => {rating.toDouble*score.toDouble})
finalDf.withColumn("cl_rate", prod_cols(finalDf("rating"), finalDf("score")))
to
import org.apache.spark.sql.functions.{udf, when, lit}
import spark.implicits._ // for the $"colName" syntax

val prod_cols = udf((rating: Double, score: Double) => rating * score)
finalDf
  .withColumn("rating", $"rating".cast("double")) // Skip this line if the column is already double
  .withColumn("score", $"score".cast("double"))   // Skip this line if the column is already double
  .withColumn("cl_rate",
    when(
      $"rating".isNotNull && $"score".isNotNull,
      prod_cols($"rating", $"score")
    ).otherwise(lit(null).cast("double"))
  )
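As a side note, if the desired behavior is simply "null out the product whenever either input is null", Spark's built-in column arithmetic already propagates nulls, so the UDF and the explicit `when`/`otherwise` guard can be avoided entirely. A minimal sketch, assuming `rating` and `score` are (or can be cast to) numeric columns:

```scala
import org.apache.spark.sql.functions.col

// Built-in column multiplication returns null whenever either
// operand is null, so no null check or UDF is needed.
val result = finalDf.withColumn(
  "cl_rate",
  col("rating").cast("double") * col("score").cast("double")
)
```

This also avoids the serialization overhead of a UDF and lets Catalyst optimize the expression.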