Apache Spark 2.1: java.lang.UnsupportedOperationException: No Encoder found for scala.collection.immutable.Set[String]
I am using Spark 2.1.1 with Scala 2.11.6. I am getting the following error. I am not using any case classes.
java.lang.UnsupportedOperationException: No Encoder found for scala.collection.immutable.Set[String]
field (class: "scala.collection.immutable.Set", name: "_2")
field (class: "scala.Tuple2", name: "_2")
root class: "scala.Tuple2"
The following portion of code is where the stack trace points:
val tweetArrayRDD = nameDF.select("namedEnts", "text", "storylines")
  .flatMap {
    case Row(namedEnts: Traversable[(String, String)], text: String, storylines: Traversable[String]) =>
      Option(namedEnts) match {
        case Some(x: Traversable[(String, String)]) =>
          //println("In flatMap:" + x + " ~~&~~ " + text + " ~~&~~ " + storylines)
          namedEnts.map((_, (text, storylines.toSet)))
        case _ => //println("In flatMap: blahhhh")
          Traversable()
      }
    case _ => //println("In flatMap: fooooo")
      Traversable()
  }
  .rdd.aggregateByKey((Set[String](), Set[String]()))((a, b) => (a._1 + b._1, a._2 ++ b._2), (a, b) => (a._1 ++ b._1, a._2 ++ b._2))
  .map { (s: ((String, String), (Set[String], Set[String]))) => {
    //println("In map: " + s)
    (s._1, (s._2._1.toSeq, s._2._2.toSeq))
  }}
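As a side note, the aggregateByKey step above merges, per key, the set of texts and the union of storyline sets. Its zero value and combine logic can be sketched in plain Scala without Spark; here pairs is hypothetical sample data in the same shape as the keyed records, ((entity, entityType), (text, storylines)):

```scala
// Plain-Scala sketch of the per-key aggregation (assumed sample data, no Spark).
val pairs = Seq(
  (("Alice", "PERSON"), ("tweet one", Set("s1"))),
  (("Alice", "PERSON"), ("tweet two", Set("s1", "s2")))
)

val aggregated = pairs
  .groupBy(_._1)
  .map { case (key, records) =>
    // Same shape as the aggregateByKey zero value and seqOp:
    // collect distinct texts and union the storyline sets.
    val merged = records.foldLeft((Set[String](), Set[String]())) {
      case ((texts, stories), (_, (text, storySet))) =>
        (texts + text, stories ++ storySet)
    }
    (key, merged)
  }

println(aggregated(("Alice", "PERSON")))
```

In Spark, the same two functions are passed as the seqOp and combOp arguments of aggregateByKey, with the empty-set pair as the zero value.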
The problem here is that Spark does not provide an encoder for Set out of the box (it does provide encoders for "primitives", Seqs, Arrays, and Products of other supported types). You can either try using this excellent answer to create your own encoder for Set[String] (more accurately, an encoder for the type you're using, Traversable[((String, String), (String, Set[String]))], which contains a Set[String]), OR you can work around this issue by using a Seq instead of a Set:
// ...
case Some(x: Traversable[(String, String)]) =>
//println("In flatMap:" + x + " ~~&~~ " + text + " ~~&~~ " + storylines)
namedEnts.map((_, (text, storylines.toSeq.distinct)))
// ...
(I'm using distinct to imitate the Set behavior; you can also try .toSet.toSeq)
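A quick plain-Scala check of that note; both toSeq.distinct and toSet.toSeq drop duplicates, though only distinct guarantees first-seen order. The storylines value here is a hypothetical Seq standing in for the Traversable in the question:

```scala
// Hypothetical storylines with a duplicate, standing in for the Traversable.
val storylines = Seq("s1", "s2", "s1")

val viaDistinct = storylines.toSeq.distinct // keeps first-seen order
val viaSet      = storylines.toSet.toSeq    // same elements, order not guaranteed

println(viaDistinct)
println(viaSet.sorted)
```

Either form gives the flatMap an element type built only from encodable pieces (tuples, Strings, and Seqs), so Spark can derive the encoder automatically.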
UPDATE: per your comment re Spark 1.6.2 - the difference is that in 1.6.2, Dataset.flatMap returns an RDD and not a Dataset, and therefore requires no encoding of the results returned from the function you supply. So this indeed brings up another good workaround - you can easily simulate that behavior by explicitly switching to work with the RDD before the flatMap operation:
nameDF.select("namedEnts", "text", "storylines")
  .rdd
  .flatMap { /*...*/ } // use your function as-is, it can return Set[String]
  .aggregateByKey( /*...*/ )
  .map( /*...*/ )
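For completeness, a minimal sketch of the custom-encoder route mentioned above. This assumes Spark is on the classpath and uses Encoders.kryo, Spark's generic binary encoder; note that a kryo-encoded value is stored as a single opaque binary column rather than a typed struct:

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// A kryo-backed encoder for the element type the flatMap returns,
// made implicit so the Dataset.flatMap call picks it up automatically.
implicit val pairEncoder: Encoder[((String, String), (String, Set[String]))] =
  Encoders.kryo[((String, String), (String, Set[String]))]
```

This keeps the Set[String] as-is at the cost of losing the columnar representation for that value.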