How do I register my custom UDF so I can run it in spark-shell?
I defined the following custom UDF:
def stddev1(columnName: Column): Column = {
  sqrt(avg(columnName * columnName) - avg(columnName) * avg(columnName))
}
I want to run this function in spark-shell and test it with some example data, but I keep running into the error: "Schema for type org.apache.spark.sql.Column is not supported."
I might have to register it, but I'm unsure how to do this.
It depends on how you want to use it. For example, this works fine:
val df = sc.parallelize(Seq(1,2,3,4)).toDF("myCol")
df.show
>+-----+
>|myCol|
>+-----+
>| 1|
>| 2|
>| 3|
>| 4|
>+-----+
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{avg, sqrt}

def stddev(col: Column): Column = sqrt(avg(col * col) - avg(col) * avg(col))
df.agg(stddev($"myCol")).first
> [1.118033988749895]
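As a sanity check, that value matches the population standard deviation computed directly with the same formula, sqrt(E[x²] − E[x]²). A plain-Scala sketch, no Spark required:

```scala
// Population standard deviation of 1, 2, 3, 4 via sqrt(E[x^2] - E[x]^2),
// the same expression the Column-based stddev builds out of avg and sqrt.
val xs = Seq(1.0, 2.0, 3.0, 4.0)
val mean = xs.sum / xs.size              // 2.5
val meanSq = xs.map(x => x * x).sum / xs.size // 7.5
val stddev = math.sqrt(meanSq - mean * mean)  // 1.118033988749895
```

Note this is the *population* standard deviation (divide by n); Spark's built-in `stddev` uses the sample formula (divide by n − 1) and would return a different value here.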
However, if you want to use it within a Spark SQL statement, you will need something like this:
val squared = (s: Int) => {
  s * s
}
sqlContext.udf.register("square", squared)
%sql select id, square(id) as id_squared from test
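Note that `%sql` is a notebook magic (e.g. in Databricks) and won't work in spark-shell. There you can register the DataFrame as a temporary table and run the same query through `sqlContext` directly. A minimal sketch using the `df` from above (Spark 1.x API; the table name `test` is just an example):

```scala
// In spark-shell: register df as a temp table, register the UDF,
// then run the SQL query through sqlContext instead of a %sql cell.
// Assumes sc, sqlContext, and df ("myCol" column) from the session above.
df.registerTempTable("test")
val squared = (s: Int) => s * s
sqlContext.udf.register("square", squared)
sqlContext.sql("SELECT myCol, square(myCol) AS myCol_squared FROM test").show()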