
UnsupportedOperationException: No Encoder found for org.apache.spark.sql.Row

I am trying to create a DataFrame. It seems that Spark cannot create a DataFrame from the scala.Tuple2 type. How can I do this? I am new to Scala and Spark.

Below is part of the error trace from running the code:

Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for org.apache.spark.sql.Row
- field (class: "org.apache.spark.sql.Row", name: "_1")
- root class: "scala.Tuple2"
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:666)
    ..........  
    org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
    at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
    at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:299)
    at SparkMapReduce$.runMapReduce(SparkMapReduce.scala:46)
    at Entrance$.queryLoader(Entrance.scala:64)
    at Entrance$.paramsParser(Entrance.scala:43)
    at Entrance$.main(Entrance.scala:30)
    at Entrance.main(Entrance.scala)

Below is the code, which is part of the whole program. The problem occurs on the line above the exclamation marks in the comment:


import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.split
import org.apache.spark.sql.functions._
import org.apache.spark.sql.DataFrame

object SparkMapReduce {

  Logger.getLogger("org.spark_project").setLevel(Level.WARN)
  Logger.getLogger("org.apache").setLevel(Level.WARN)
  Logger.getLogger("akka").setLevel(Level.WARN)
  Logger.getLogger("com").setLevel(Level.WARN)

  def runMapReduce(spark: SparkSession, pointPath: String, rectanglePath: String): DataFrame = 
  {
    var pointDf = spark.read.format("csv").option("delimiter",",").option("header","false").load(pointPath);
    pointDf = pointDf.toDF()
    pointDf.createOrReplaceTempView("points")

    pointDf = spark.sql("select ST_Point(cast(points._c0 as Decimal(24,20)),cast(points._c1 as Decimal(24,20))) as point from points")
    pointDf.createOrReplaceTempView("pointsDf")
//    pointDf.show()

    var rectangleDf = spark.read.format("csv").option("delimiter",",").option("header","false").load(rectanglePath);
    rectangleDf = rectangleDf.toDF()
    rectangleDf.createOrReplaceTempView("rectangles")

    rectangleDf = spark.sql("select ST_PolygonFromEnvelope(cast(rectangles._c0 as Decimal(24,20)),cast(rectangles._c1 as Decimal(24,20)), cast(rectangles._c2 as Decimal(24,20)), cast(rectangles._c3 as Decimal(24,20))) as rectangle from rectangles")
    rectangleDf.createOrReplaceTempView("rectanglesDf")
//    rectangleDf.show()

    val joinDf = spark.sql("select rectanglesDf.rectangle as rectangle, pointsDf.point as point from rectanglesDf, pointsDf where ST_Contains(rectanglesDf.rectangle, pointsDf.point)")
    joinDf.createOrReplaceTempView("joinDf")
//    joinDf.show()

    import spark.implicits._
    // Count occurrences of each joined (rectangle, point) row with a map-reduce:
    // pair every row with 1, then sum the counts per distinct row.
    val joinRdd = joinDf.rdd
    val resmap = joinRdd.map(x => (x, 1))
    val reduced = resmap.reduceByKey(_ + _)
    val final_datablock = reduced.collect()
    val trying: List[Float] = List()
    print(final_datablock)

//      .toDF("rectangles", "count")
//    val dataframe_final1 = spark.createDataFrame(reduced)
    val dataframe_final2 = spark.createDataFrame(reduced).toDF("rectangles", "count")
    // ^ !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Line above creates problem
    // You need to complete this part

    var result = spark.emptyDataFrame

    return result // You need to change this part
  }

}

The first column of your reduced RDD is of type Row, and you did not specify it when converting from the RDD to a DataFrame. A DataFrame must have a schema. So you need to define the right schema for the RDD and use the following method to convert it to a DataFrame:

createDataFrame(RDD<Row> rowRDD, StructType schema)

For example:

import org.apache.spark.sql.types._

val schema = new StructType()
  .add(StructField("_1a", IntegerType))
  .add(StructField("_1b", ArrayType(StringType)))
  .add(StructField("count", IntegerType, true))
