
Combine two spark udf issue

I'm using Spark 1.6 with Scala. I need to compute a duration, which is the difference between the end time and the start time. I've tried this:

val msc3 = rddsql.withColumn("Duration",($"EndTime")-($"StartTime"))

I want to add another condition: when the end time and the start time are equal, the duration should be set to 1 instead of 0. How can I do this?

You don't need UDFs at all; just use when and otherwise:

import org.apache.spark.sql.functions.when  // `when` lives in the functions object
rddsql.withColumn("Duration",when($"EndTime" === $"StartTime", 1).otherwise($"EndTime" - $"StartTime"))
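A minimal, self-contained sketch of this approach on Spark 1.6 might look like the following. The sample rows, the Long-typed time columns, the object name DurationExample and the helper withDuration are assumptions for illustration only; the question does not show the actual schema.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, when}

object DurationExample {
  // Hypothetical helper: adds a Duration column that is 1 when the two
  // times are equal, and EndTime - StartTime otherwise.
  def withDuration(df: DataFrame): DataFrame =
    df.withColumn(
      "Duration",
      when(col("EndTime") === col("StartTime"), 1)
        .otherwise(col("EndTime") - col("StartTime")))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DurationExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc) // Spark 1.6 entry point
    import sqlContext.implicits._

    // Assumed sample data: times stored as Long values (e.g. seconds).
    val rddsql = Seq((100L, 160L), (200L, 200L)).toDF("StartTime", "EndTime")

    withDuration(rddsql).show()
    sc.stop()
  }
}
```

With these sample rows, the row whose StartTime equals its EndTime gets a Duration of 1 rather than 0, and the other row keeps the plain difference.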

You can also do it with a CASE WHEN expression in Spark SQL:

// Spark 2.0+ API: spark is a SparkSession
rddsql.createOrReplaceTempView("rddsql")
spark.sql("select CASE WHEN (EndTime-StartTime = 0) THEN 1 ELSE EndTime-StartTime END as Duration from rddsql")
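SparkSession and createOrReplaceTempView were introduced in Spark 2.0, so on Spark 1.6 (as in the question) the same query would have to go through SQLContext and registerTempTable instead. The sketch below shows one way that could look; the object name DurationSqlExample, the helper durationViaSql and the sample rows are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object DurationSqlExample {
  // Hypothetical helper: registers the DataFrame under the name "rddsql"
  // and computes Duration with a CASE WHEN expression.
  def durationViaSql(sqlContext: SQLContext, df: DataFrame): DataFrame = {
    df.registerTempTable("rddsql") // Spark 1.6 counterpart of createOrReplaceTempView
    sqlContext.sql(
      """SELECT StartTime, EndTime,
        |       CASE WHEN EndTime - StartTime = 0 THEN 1
        |            ELSE EndTime - StartTime END AS Duration
        |FROM rddsql""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DurationSqlExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Assumed sample data: times stored as Long values.
    val rddsql = Seq((100L, 160L), (200L, 200L)).toDF("StartTime", "EndTime")

    durationViaSql(sqlContext, rddsql).show()
    sc.stop()
  }
}
```

Unlike the one-line query above, this version also selects StartTime and EndTime so that each Duration can be read next to its inputs.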
