
Spark 2.1.0 UDF Schema type not supported

I am using a data type called Point(x: Double, y: Double). I am trying to use columns _c1 and _c2 as input to Point(), and then create a new column of Point values as follows:

val toPoint = udf{(x: Double, y: Double) => Point(x,y)}

Then I call the function:

val point = data.withColumn("Point", toPoint(wanted("c1"), wanted("c2")))

However, when I declare the udf I get the following error:

java.lang.UnsupportedOperationException: Schema for type com.vividsolutions.jts.geom.Point is not supported
      at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:733)
      at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$2.apply(ScalaReflection.scala:729)
      at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$2.apply(ScalaReflection.scala:728)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.immutable.List.foreach(List.scala:381)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      at scala.collection.immutable.List.map(List.scala:285)
      at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:728)
      at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:671)
      at org.apache.spark.sql.functions$.udf(functions.scala:3084)
      ... 48 elided

I have properly imported this data type and used it many times before. Now that I try to include it in the schema of my udf, it doesn't recognize it. What is the method to include types other than the standard Int, String, Array, etc.?

I am using Spark 2.1.0 on Amazon EMR.

Here are some related questions I've referenced:

How to define schema for custom type in Spark SQL?

Spark UDF error - Schema for type Any is not supported

You should define Point as a case class:

case class Point(x: Double, y: Double)

or, if you wish to keep the JTS type, as a case class that extends it. Note that com.vividsolutions.jts.geom.Point has no (Double, Double) constructor, so the superclass call has to build a CoordinateSequence and a GeometryFactory:

case class MyPoint(x: Double, y: Double)
  extends com.vividsolutions.jts.geom.Point(
    new com.vividsolutions.jts.geom.impl.CoordinateArraySequence(
      Array(new com.vividsolutions.jts.geom.Coordinate(x, y))),
    new com.vividsolutions.jts.geom.GeometryFactory())

This way the schema is inferred automatically by Spark.
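
For reference, a minimal end-to-end sketch of the case-class approach. The session setup and the sample DataFrame named wanted are illustrative stand-ins for the question's data, not part of the original code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

case class Point(x: Double, y: Double)

val spark = SparkSession.builder.appName("point-udf").master("local[*]").getOrCreate()
import spark.implicits._

// Stand-in for the question's DataFrame with Double columns c1 and c2.
val wanted = Seq((1.0, 2.0), (3.0, 4.0)).toDF("c1", "c2")

// Point is a case class (a Product type), so Spark can derive the UDF's
// return schema as struct<x: double, y: double>, and the declaration
// no longer throws UnsupportedOperationException.
val toPoint = udf { (x: Double, y: Double) => Point(x, y) }

val point = wanted.withColumn("Point", toPoint(wanted("c1"), wanted("c2")))
point.printSchema()

Run in spark-shell (or pasted into a script), this prints the Point column as a nested struct of two doubles.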
