
How do I register my custom UDF so I can run it in spark-shell

I defined the following custom UDF:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{avg, sqrt}

def stddev1(columnName: Column): Column = {
    sqrt(avg(columnName * columnName) - avg(columnName) * avg(columnName))
}

I want to run this function in spark-shell and test it with some example data, but I keep running into the error: "Schema for type org.apache.spark.sql.Column is not supported."

I might have to register it, but I'm unsure how to do this.

It depends on how you want to use it. Your stddev1 is not actually a UDF: it builds a Column expression out of built-in functions (avg, sqrt), so it does not need to be registered and can be used directly through the DataFrame API. For example, this works fine:

// spark-shell imports sqlContext.implicits._ automatically, which provides
// toDF on the RDD and the $"..." column syntax used below:
val df = sc.parallelize(Seq(1,2,3,4)).toDF("myCol")
df.show

>+-----+
>|myCol|
>+-----+
>|    1|
>|    2|
>|    3|
>|    4|
>+-----+

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{avg, sqrt}
def stddev(col: Column): Column = sqrt(avg(col * col) - avg(col) * avg(col))
df.agg(stddev($"myCol")).first

> [1.118033988749895]
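
This also explains the error in the question: udf.register maps the Scala argument and return types of a function to a SQL schema, and org.apache.spark.sql.Column has no such mapping. A minimal sketch of the failing registration, reusing the stddev1 definition from the question (the exact exception text can vary by Spark version):

// Registering a Column => Column function as a UDF fails: UDF registration
// only supports plain Scala types (Int, Double, String, ...) that map to a
// SQL schema, and Column does not.
sqlContext.udf.register("stddev1", stddev1 _)
// java.lang.UnsupportedOperationException:
// Schema for type org.apache.spark.sql.Column is not supported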

However, if you want to use it within a Spark SQL statement, you will need something like this:

// A real UDF operates on plain Scala values (here Int => Int), so it can be
// registered and then called by name from SQL:
val squared = (s: Int) => {
  s * s
}
sqlContext.udf.register("square", squared)

%sql select id, square(id) as id_squared from test
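
Note that %sql is notebook syntax (e.g. Zeppelin or Databricks); in spark-shell you would run the same statement through sqlContext.sql. A minimal end-to-end sketch, where test is just an illustrative table name registered from a DataFrame:

// Register a DataFrame as a temporary table so SQL can refer to it by name:
val test = sc.parallelize(Seq(1, 2, 3, 4)).toDF("id")
test.registerTempTable("test")

// Call the registered UDF by name inside a SQL statement:
sqlContext.sql("select id, square(id) as id_squared from test").show()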

