How to make an Encoder for scala Iterable, spark dataset
I am trying to create a Dataset from an RDD y of this shape:
y: RDD[(MyObj1, scala.Iterable[MyObj2])]
so I explicitly created an encoder:
implicit def tuple2[A1, A2](
implicit e1: Encoder[A1],
e2: Encoder[A2]
): Encoder[(A1,A2)] = Encoders.tuple[A1,A2](e1, e2)
//Create Dataset
val z = spark.createDataset(y)(tuple2[MyObj1, Iterable[MyObj2]])
This code compiles without errors, but when I try to run it I get this exception:
Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for scala.Iterable[org.bean.input.MyObj2]
- field (class: "scala.collection.Iterable", name: "_2")
- root class: "scala.Tuple2"
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:625)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:619)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:607)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:607)
at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:438)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
at org.apache.spark.sql.LowPrioritySQLImplicits$class.newProductEncoder(SQLImplicits.scala:233)
at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:33)
Some details about my objects (MyObj1 and MyObj2):
-MyObj1:
case class MyObj1(
  id: String,
  `type`: String  // "type" is a reserved word in Scala, so it must be backquoted
)
-MyObj2:
trait MyObj2 {
val o_state:Option[String]
val n_state:Option[String]
val ch_inf: MyObj1
val state_updated:MyObj3
}
Any help, please?
Spark does not provide an Encoder for Iterable, so unless you use Encoders.kryo or Encoders.javaSerialization it will not work.
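A minimal sketch of the Kryo route (assuming a SparkSession named spark and the y RDD from the question; note that the Iterable column is stored as an opaque binary blob, so you lose the columnar schema):

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// Sketch only: encode the Iterable field with Kryo instead of the
// reflection-based encoder. The _2 column becomes a single binary
// column rather than a nested struct.
implicit val myObj2IterEnc: Encoder[Iterable[MyObj2]] =
  Encoders.kryo[Iterable[MyObj2]]

val z = spark.createDataset(y)(
  Encoders.tuple(Encoders.product[MyObj1], myObj2IterEnc)
)
```

This also sidesteps the fact that MyObj2 is a trait, which the reflection-based product encoder cannot handle on its own.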
The closest subclass of Iterable for which Spark provides Encoders is Seq, so that is probably what you should use here. Otherwise, see How to store custom objects in Dataset?

Try changing the declaration to:
val y: RDD[(MyObj1, Seq[MyObj2])]
and it will work. I checked with my own classes:
case class Key(key: String)
case class Value(value: Int)
For:
val y: RDD[(Key, Seq[Value])] = sc.parallelize(Map(
Key("A") -> List(Value(1), Value(2)),
Key("B") -> List(Value(3), Value(4), Value(5))
).toSeq)
val z = sparkSession.createDataset(y)
z.show()
I got:
+---+---------------+
| _1| _2|
+---+---------------+
|[A]| [[1], [2]]|
|[B]|[[3], [4], [5]]|
+---+---------------+
If I change it back to Iterable, I get the exception.
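If the RDD already carries Iterables, one option (a sketch, assuming the same spark session, and assuming MyObj2 is made a concrete case class, since the reflection-based encoder cannot derive an encoder for a bare trait) is to convert each value to a Seq before creating the Dataset:

```scala
import org.apache.spark.rdd.RDD
import spark.implicits._  // brings the built-in Seq/product encoders into scope

// .toSeq is a no-op when the underlying collection is already a Seq;
// otherwise it copies the elements once.
val ySeq: RDD[(MyObj1, Seq[MyObj2])] =
  y.map { case (k, vs) => (k, vs.toSeq) }

val z = spark.createDataset(ySeq)
```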