Apache Spark 2.1: java.lang.UnsupportedOperationException: No Encoder found for scala.collection.immutable.Set[String]

I am using Spark 2.1.1 with Scala 2.11.6, and I am getting the following error. I am not using any case classes.

java.lang.UnsupportedOperationException: No Encoder found for scala.collection.immutable.Set[String]
 field (class: "scala.collection.immutable.Set", name: "_2")
 field (class: "scala.Tuple2", name: "_2")
 root class: "scala.Tuple2"

The following portion of code is where the stack trace points:

val tweetArrayRDD = nameDF.select("namedEnts", "text", "storylines")
  .flatMap {
    case Row(namedEnts: Traversable[(String, String)], text: String, storylines: Traversable[String]) =>
      Option(namedEnts) match {
        case Some(x: Traversable[(String, String)]) =>
          //println("In flatMap:" + x + " ~~&~~ " + text + " ~~&~~ " + storylines)
          namedEnts.map((_, (text, storylines.toSet)))
        case _ => //println("In flatMap: blahhhh")
          Traversable()
      }
    case _ => //println("In flatMap: fooooo")
      Traversable()
  }
  .rdd
  .aggregateByKey((Set[String](), Set[String]()))(
    (a, b) => (a._1 + b._1, a._2 ++ b._2),
    (a, b) => (a._1 ++ b._1, a._2 ++ b._2))
  .map { (s: ((String, String), (Set[String], Set[String]))) =>
    //println("In map: " + s)
    (s._1, (s._2._1.toSeq, s._2._2.toSeq))
  }

The problem here is that Spark does not provide an encoder for Set out of the box (it does provide encoders for "primitives", Seqs, Arrays, and Products of other supported types).
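
To see the difference concretely, here is a minimal sketch (the local SparkSession named spark is my assumption, not part of the question's code):

import spark.implicits._

// Fine: Seq is one of the collection types with an out-of-the-box encoder
val ok = Seq(("a", Seq("x", "y"))).toDS()

// Throws at runtime with java.lang.UnsupportedOperationException:
// "No Encoder found for scala.collection.immutable.Set[String]"
// val fails = Seq(("a", Set("x", "y"))).toDS()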

You can either try using this excellent answer to create your own encoder for Set[String] (more accurately, an encoder for the type you're using, Traversable[((String, String), (String, Set[String]))], which contains a Set[String]), OR you can work around this issue by using a Seq instead of a Set:

// ...
case Some(x: Traversable[(String, String)]) =>
  //println("In flatMap:" + x + " ~~&~~ " + text + " ~~&~~ " + storylines)
  namedEnts.map((_, (text, storylines.toSeq.distinct)))
// ...

(I'm using distinct to imitate the Set behavior; you can also try .toSet.toSeq.)
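
If you'd rather keep the Set, one way to go the custom-encoder route is Spark's built-in kryo-based encoder, which stores the value as an opaque binary column (so you lose columnar optimizations, but the flatMap compiles and runs). This is a sketch of how you might wire it in, not code from the question:

import org.apache.spark.sql.Encoders

// Kryo-serialized encoder for the element type returned by the flatMap;
// a local implicit takes precedence over the derived Product encoder.
implicit val setTupleEncoder =
  Encoders.kryo[((String, String), (String, Set[String]))]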

UPDATE: regarding your comment about Spark 1.6.2 - the difference is that in 1.6.2, Dataset.flatMap returns an RDD and not a Dataset, and therefore requires no encoding of the results returned from the function you supply. So this brings up another good workaround - you can easily simulate that behavior by explicitly switching to the RDD before the flatMap operation:

nameDF.select("namedEnts", "text", "storylines")
  .rdd
  .flatMap { /*...*/ } // use your function as-is, it can return Set[String]
  .aggregateByKey( /*...*/ )
  .map( /*...*/ )
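
To illustrate why this works, here is a tiny standalone sketch with hypothetical data: RDD transformations ship plain serialized objects rather than using Dataset encoders, so a function returning Set[String] is perfectly legal on an RDD:

// Hypothetical data, only to show that no Encoder is required on RDDs
val demo = spark.sparkContext
  .parallelize(Seq("a b", "b c a"))
  .map(line => line.split(" ").toSet) // RDD[Set[String]] - compiles and runs
demo.collect().foreach(println)       // e.g. Set(a, b) and Set(b, c, a)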
