scala spark UDF ClassCastException: WrappedArray$ofRef cannot be cast to [Lscala.Tuple2

So I perform the necessary imports:

import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types._
import spark.implicits._

then define some lat/long points:

val london = (1.0, 1.0)
val suburbia = (2.0, 2.0)
val southampton = (3.0, 3.0)  
val york = (4.0, 4.0)  

I then create a Spark DataFrame like this and check that it works:

val exampleDF = Seq((List(london,suburbia),List(southampton,york)),
    (List(york,london),List(southampton,suburbia))).toDF("AR1","AR2")
exampleDF.show()

The DataFrame consists of the following types:

DataFrame = [AR1: array<struct<_1:double,_2:double>>, AR2: array<struct<_1:double,_2:double>>]
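
For reference, exampleDF.printSchema() renders this nested structure along these lines:

exampleDF.printSchema()
// root
//  |-- AR1: array (nullable = true)
//  |    |-- element: struct (containsNull = true)
//  |    |    |-- _1: double (nullable = false)
//  |    |    |-- _2: double (nullable = false)
//  |-- AR2: array (nullable = true)
//  |    |-- element: struct (containsNull = true)
//  |    |    |-- _1: double (nullable = false)
//  |    |    |-- _2: double (nullable = false)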

I create a function to produce every combination of points (a cross product):

// function to do what I want
val latlongexplode = (x: Array[(Double, Double)], y: Array[(Double, Double)]) => {
  for (a <- x; b <- y) yield (a, b)
}

I check that the function works:

latlongexplode(Array(london,york),Array(suburbia,southampton))
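
On the sample points this yields every pairing, i.e. an Array like:

// Array(((1.0,1.0),(2.0,2.0)), ((1.0,1.0),(3.0,3.0)),
//       ((4.0,4.0),(2.0,2.0)), ((4.0,4.0),(3.0,3.0)))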

and it does. However, after I create a UDF from this function

// declare function into a Spark UDF
val latlongexplodeUDF = udf(latlongexplode)

when I try to use it on the Spark DataFrame created above, like this:

exampleDF.withColumn("latlongexplode", latlongexplodeUDF($"AR1",$"AR2")).show(false)

I get a really long stack trace which basically boils down to:

java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [Lscala.Tuple2;
  at org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$f$3(ScalaUDF.scala:121)
  at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1063)
  at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:151)
  at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:50)
  at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:32)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:273)

How can I get this UDF to work in Scala Spark? (I'm using Spark 2.4 at the moment, if that helps.)

EDIT: it could be that the way I construct my example df has an issue, but the actual data I have is an array (of unknown size) of lat/long tuples in each column.

When working with struct types in a UDF, they are represented as Row objects, and array columns are represented as Seq. You also need to return structs as Rows, and you need to define a schema for the UDF's return type.

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val london = (1.0, 1.0)
val suburbia = (2.0, 2.0)
val southampton = (3.0, 3.0)  
val york = (4.0, 4.0)
val exampleDF = Seq((List(london,suburbia),List(southampton,york)),
    (List(york,london),List(southampton,suburbia))).toDF("AR1","AR2")
exampleDF.show(false)
+------------------------+------------------------+
|AR1                     |AR2                     |
+------------------------+------------------------+
|[[1.0, 1.0], [2.0, 2.0]]|[[3.0, 3.0], [4.0, 4.0]]|
|[[4.0, 4.0], [1.0, 1.0]]|[[3.0, 3.0], [2.0, 2.0]]|
+------------------------+------------------------+
// Each array<struct> column arrives in the UDF as Seq[Row];
// nesting the incoming Rows inside Row(a, b) builds the struct-of-structs result.
val latlongexplode = (x: Seq[Row], y: Seq[Row]) => {
    for (a <- x; b <- y) yield Row(a, b)
}

val udf_schema = ArrayType(
    StructType(Seq(
        StructField(
            "city1",
            StructType(Seq(
                StructField("lat", DoubleType),
                StructField("long", DoubleType)
            ))
        ),
        StructField(
            "city2",
            StructType(Seq(
                StructField("lat", DoubleType),
                StructField("long", DoubleType)
            ))
        )
    ))
)
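
The same schema can also be written more compactly with the StructType builder API; this is just an equivalent sketch of udf_schema above:

// Equivalent schema via the builder-style API ("lat", "long", "city1", "city2"
// are the same field names chosen above).
val citySchema = new StructType().add("lat", DoubleType).add("long", DoubleType)
val udfSchemaCompact = ArrayType(
    new StructType().add("city1", citySchema).add("city2", citySchema))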

// include this line if you see errors like 
// "You're using untyped Scala UDF, which does not have the input type information."
// spark.sql("set spark.sql.legacy.allowUntypedScalaUDF = true")

val latlongexplodeUDF = udf(latlongexplode, udf_schema)
val result = exampleDF.withColumn("latlongexplode", latlongexplodeUDF($"AR1",$"AR2"))
result.show(false)
+------------------------+------------------------+--------------------------------------------------------------------------------------------------------+
|AR1                     |AR2                     |latlongexplode                                                                                          |
+------------------------+------------------------+--------------------------------------------------------------------------------------------------------+
|[[1.0, 1.0], [2.0, 2.0]]|[[3.0, 3.0], [4.0, 4.0]]|[[[1.0, 1.0], [3.0, 3.0]], [[1.0, 1.0], [4.0, 4.0]], [[2.0, 2.0], [3.0, 3.0]], [[2.0, 2.0], [4.0, 4.0]]]|
|[[4.0, 4.0], [1.0, 1.0]]|[[3.0, 3.0], [2.0, 2.0]]|[[[4.0, 4.0], [3.0, 3.0]], [[4.0, 4.0], [2.0, 2.0]], [[1.0, 1.0], [3.0, 3.0]], [[1.0, 1.0], [2.0, 2.0]]]|
+------------------------+------------------------+--------------------------------------------------------------------------------------------------------+
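
As a side note (a sketch, not part of the original answer): on Spark 2.4 you can avoid Row handling entirely by dropping to the typed Dataset API, since tuples have built-in encoders. The columns are renamed to _1/_2 here only so the tuple encoder can resolve them by name:

// Typed alternative: encoders map array<struct<_1,_2>> to Seq[(Double, Double)].
val typed = exampleDF
    .withColumnRenamed("AR1", "_1")
    .withColumnRenamed("AR2", "_2")
    .as[(Seq[(Double, Double)], Seq[(Double, Double)])]
// Cross the two sequences exactly as in the original function.
val crossed = typed.map { case (xs, ys) => for (a <- xs; b <- ys) yield (a, b) }
crossed.show(false)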
