
Convert UDF over multiple columns in Scala Spark

I have the following code in PySpark, which works fine.

from pyspark.sql.types import DoubleType
from pyspark.sql.functions import udf, array
prod_cols = udf(lambda arr: float(arr[0])*float(arr[1]), DoubleType())
finalDf = finalDf.withColumn('click_factor', prod_cols(array('rating', 'score')))

Now I tried similar code in Scala.

val prod_cols = udf((rating: Double, score: Double) => {rating.toDouble*score.toDouble})
finalDf = finalDf.withColumn("cl_rate", prod_cols(finalDf("rating"), finalDf("score")))

Somehow the second code doesn't give the right answers; it always returns null or zero.

Can you help me get the right Scala code? Essentially, I just need code to multiply two columns, considering that either score or rating may be null.

Pass only non-null values to the UDF. A Scala UDF declared with primitive Double parameters cannot receive null: depending on the Spark version, a null input is either replaced by the primitive default 0.0 or short-circuits the UDF to return null, which is why you see zeros and nulls.

Change the code below

val prod_cols = udf((rating: Double, score: Double) => {rating.toDouble*score.toDouble})
finalDf.withColumn("cl_rate", prod_cols(finalDf("rating"), finalDf("score")))

to

import org.apache.spark.sql.functions.{udf, when, lit}
import spark.implicits._ // for the $"..." column syntax; assumes an active SparkSession named spark

val prod_cols = udf((rating: Double, score: Double) => {rating.toDouble*score.toDouble})
finalDf
.withColumn("rating",$"rating".cast("double")) // Ignore this line if column data type is already double
.withColumn("score",$"score".cast("double")) // Ignore this line if column data type is already double
.withColumn("cl_rate",
             when(
                  $"rating".isNotNull && $"score".isNotNull,
                  prod_cols($"rating", $"score")
             ).otherwise(lit(null).cast("double"))
)
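
For reference, here is a minimal, self-contained sketch of the same null-safe pattern that can be run as a small application or pasted into spark-shell; the SparkSession setup, object name, and sample data are illustrative assumptions, not part of the original question:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{udf, when, lit}

object ClRateExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cl_rate-example").getOrCreate()
    import spark.implicits._

    // Option[Double] cells become proper nulls in the DataFrame
    val finalDf = Seq(
      (Some(4.0), Some(0.5)), // both present   -> cl_rate = 2.0
      (None,      Some(0.5)), // rating is null -> cl_rate = null
      (Some(4.0), None)       // score is null  -> cl_rate = null
    ).toDF("rating", "score")

    val prod_cols = udf((rating: Double, score: Double) => rating * score)

    finalDf
      .withColumn("cl_rate",
        when($"rating".isNotNull && $"score".isNotNull, prod_cols($"rating", $"score"))
          .otherwise(lit(null).cast("double")))
      .show()

    spark.stop()
  }
}

Note that for a plain product the UDF is not strictly required: the column expression $"rating" * $"score" already propagates nulls, so the when/otherwise guard is only needed when you have to go through a UDF with primitive parameters.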
