
How do I register my custom UDF so I can run it in spark-shell

I defined the following custom UDF:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{avg, sqrt}

def stddev1(columnName: Column): Column = {
    sqrt(avg(columnName * columnName) - avg(columnName) * avg(columnName))
}

I want to run this function in spark-shell and test it with some example data, but I keep running into the error: "Schema for type org.apache.spark.sql.Column is not supported."

I might have to register it, but I'm unsure how to do this.

It depends on how you want to use it. Your stddev1 is not actually a UDF: it builds a Column expression out of built-in functions (avg, sqrt), so it does not need to be registered and can be used directly through the DataFrame API. For example, this works fine:

// spark-shell imports sqlContext.implicits._ automatically, which provides
// toDF on the RDD and the $"..." column syntax used below:
val df = sc.parallelize(Seq(1,2,3,4)).toDF("myCol")
df.show

>+-----+
>|myCol|
>+-----+
>|    1|
>|    2|
>|    3|
>|    4|
>+-----+

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{avg, sqrt}
def stddev(col: Column): Column = sqrt(avg(col * col) - avg(col) * avg(col))
df.agg(stddev($"myCol")).first

> [1.118033988749895]
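
This also explains the error in the question: udf.register maps the Scala argument and return types of a function to a SQL schema, and org.apache.spark.sql.Column has no such mapping. A minimal sketch of the failing registration, reusing the stddev1 definition from the question (the exact exception text can vary by Spark version):

// Registering a Column => Column function as a UDF fails: UDF registration
// only supports plain Scala types (Int, Double, String, ...) that map to a
// SQL schema, and Column does not.
sqlContext.udf.register("stddev1", stddev1 _)
// java.lang.UnsupportedOperationException:
// Schema for type org.apache.spark.sql.Column is not supported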

However, if you want to use it within a Spark SQL statement, you will need something like this:

// A real UDF operates on plain Scala values (here Int => Int), so it can be
// registered and then called by name from SQL:
val squared = (s: Int) => {
  s * s
}
sqlContext.udf.register("square", squared)

%sql select id, square(id) as id_squared from test
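
Note that %sql is notebook syntax (e.g. Zeppelin or Databricks); in spark-shell you would run the same statement through sqlContext.sql. A minimal end-to-end sketch, where test is just an illustrative table name registered from a DataFrame:

// Register a DataFrame as a temporary table so SQL can refer to it by name:
val test = sc.parallelize(Seq(1, 2, 3, 4)).toDF("id")
test.registerTempTable("test")

// Call the registered UDF by name inside a SQL statement:
sqlContext.sql("select id, square(id) as id_squared from test").show()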

