How to use a UDF in a where clause in Scala Spark

I'm trying to check whether two Double columns in a DataFrame are equal to within a certain precision, so that 49.999999 is treated as equal to 50. Is it possible to create a UDF and use it in a where clause? I am using Spark 2.0 with Scala.

You can use a UDF, but there is no need for one:

import org.apache.spark.sql.functions._

val precision: Double = ???

df.where(abs($"col1" - $"col2") < precision)

A UDF call would work the same way, but it would be less efficient, because a UDF is opaque to Spark's query optimizer:

df.where(yourUdf($"col1", $"col2"))
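The answer does not show how `yourUdf` is defined, so here is a minimal sketch of one way to do it with `org.apache.spark.sql.functions.udf`. The object name, the helper `withinPrecision`, the sample rows, and the tolerance of `1e-4` are all assumptions made for illustration; they are not from the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object ApproxEqualUdf {
  // Hypothetical helper: true when x and y differ by less than `precision`.
  def withinPrecision(x: Double, y: Double, precision: Double): Boolean =
    math.abs(x - y) < precision

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("approx-equal")
      .getOrCreate()
    import spark.implicits._ // enables the $"col" syntax and toDF

    val precision = 1e-4 // assumed tolerance, pick one that suits your data

    // Made-up sample data with the column names used in the answer.
    val df = Seq((49.999999, 50.0), (1.0, 2.0)).toDF("col1", "col2")

    // Wrap the plain Scala predicate as a Spark UDF returning Boolean.
    val yourUdf = udf((x: Double, y: Double) => withinPrecision(x, y, precision))

    // A Boolean-valued UDF column can be passed directly to where.
    df.where(yourUdf($"col1", $"col2")).show()

    spark.stop()
  }
}
```

With this sample data, only the row (49.999999, 50.0) should pass the filter. The built-in `abs` expression shown earlier remains the better choice when it is sufficient.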

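Assuming ctx is the SQL context, register the UDF and keep the function that register returns:

import org.apache.spark.sql.functions.lit

// math.abs works on plain Doubles; functions.abs only accepts Columns
val areEqual = ctx.udf.register("areEqual", (x: Double, y: Double, precision: Double) => math.abs(x - y) < precision)

and then, wrapping the Scala value precision in lit so that it is passed as a Column:

df.where(areEqual($"col1", $"col2", lit(precision)))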
