I have the following PySpark code, which works fine.
from pyspark.sql.types import IntegerType, DoubleType
from pyspark.sql.functions import udf, array
prod_cols = udf(lambda arr: float(arr[0])*float(arr[1]), DoubleType())
finalDf = finalDf.withColumn('click_factor', prod_cols(array('rating', 'score')))
Now I tried similar code in Scala.
val prod_cols = udf((rating: Double, score: Double) => {rating.toDouble*score.toDouble})
finalDf = finalDf.withColumn("cl_rate", prod_cols(finalDf("rating"), finalDf("score")))
Somehow the second code doesn't give the right answers; the result is always null or zero. Can you help me get the right Scala code? Essentially I just need code to multiply two columns, considering that either rating or score may be null.
Pass only non-null values to the UDF. Change the code below
val prod_cols = udf((rating: Double, score: Double) => {rating.toDouble*score.toDouble})
finalDf.withColumn("cl_rate", prod_cols(finalDf("rating"), finalDf("score")))
to
import org.apache.spark.sql.functions.{udf, when, lit}
import spark.implicits._ // for the $"colName" syntax

val prod_cols = udf((rating: Double, score: Double) => rating * score)
finalDf
  .withColumn("rating", $"rating".cast("double")) // Skip this line if the column is already double
  .withColumn("score", $"score".cast("double"))   // Skip this line if the column is already double
  .withColumn("cl_rate",
    when(
      $"rating".isNotNull && $"score".isNotNull,
      prod_cols($"rating", $"score")
    ).otherwise(lit(null).cast("double"))
  )
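As a side note, if the desired behavior is simply "null out the product whenever either input is null", Spark's built-in column arithmetic already propagates nulls, so the UDF and the explicit `when`/`otherwise` guard can be avoided entirely. A minimal sketch, assuming `rating` and `score` are (or can be cast to) numeric columns:

```scala
import org.apache.spark.sql.functions.col

// Built-in column multiplication returns null whenever either
// operand is null, so no null check or UDF is needed.
val result = finalDf.withColumn(
  "cl_rate",
  col("rating").cast("double") * col("score").cast("double")
)
```

This also avoids the serialization overhead of a UDF and lets Catalyst optimize the expression.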