[英]UnsupportedOperationException: No Encoder found for org.apache.spark.sql.Row
我正在尝试创建一个数据框。 似乎 spark 无法从 scala.Tuple2 类型创建数据帧。 我该怎么做? 我是 Scala 和 Spark 的新手。
下面是代码运行的错误跟踪的一部分
Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for org.apache.spark.sql.Row
- field (class: "org.apache.spark.sql.Row", name: "_1")
- root class: "scala.Tuple2"
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:666)
..........
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:299)
at SparkMapReduce$.runMapReduce(SparkMapReduce.scala:46)
at Entrance$.queryLoader(Entrance.scala:64)
at Entrance$.paramsParser(Entrance.scala:43)
at Entrance$.main(Entrance.scala:30)
at Entrance.main(Entrance.scala)
下面是作为整个程序一部分的代码。 问题出现在注释中感叹号上方的那一行
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.split
import org.apache.spark.sql.functions._
import org.apache.spark.sql.DataFrame
object SparkMapReduce {
Logger.getLogger("org.spark_project").setLevel(Level.WARN)
Logger.getLogger("org.apache").setLevel(Level.WARN)
Logger.getLogger("akka").setLevel(Level.WARN)
Logger.getLogger("com").setLevel(Level.WARN)
def runMapReduce(spark: SparkSession, pointPath: String, rectanglePath: String): DataFrame =
{
var pointDf = spark.read.format("csv").option("delimiter",",").option("header","false").load(pointPath);
pointDf = pointDf.toDF()
pointDf.createOrReplaceTempView("points")
pointDf = spark.sql("select ST_Point(cast(points._c0 as Decimal(24,20)),cast(points._c1 as Decimal(24,20))) as point from points")
pointDf.createOrReplaceTempView("pointsDf")
// pointDf.show()
var rectangleDf = spark.read.format("csv").option("delimiter",",").option("header","false").load(rectanglePath);
rectangleDf = rectangleDf.toDF()
rectangleDf.createOrReplaceTempView("rectangles")
rectangleDf = spark.sql("select ST_PolygonFromEnvelope(cast(rectangles._c0 as Decimal(24,20)),cast(rectangles._c1 as Decimal(24,20)), cast(rectangles._c2 as Decimal(24,20)), cast(rectangles._c3 as Decimal(24,20))) as rectangle from rectangles")
rectangleDf.createOrReplaceTempView("rectanglesDf")
// rectangleDf.show()
val joinDf = spark.sql("select rectanglesDf.rectangle as rectangle, pointsDf.point as point from rectanglesDf, pointsDf where ST_Contains(rectanglesDf.rectangle, pointsDf.point)")
joinDf.createOrReplaceTempView("joinDf")
// joinDf.show()
import spark.implicits._
val joinRdd = joinDf.rdd
val resmap = joinRdd.map(x=>(x, 1))
val reduced = resmap.reduceByKey(_+_)
val final_datablock = reduced.collect()
val trying : List[Float] = List()
print(final_datablock)
// .toDF("rectangles", "count")
// val dataframe_final1 = spark.createDataFrame(reduced)
val dataframe_final2 = spark.createDataFrame(reduced).toDF("rectangles", "count")
// ^ !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Line above creates problem
// You need to complete this part
var result = spark.emptyDataFrame
return result // You need to change this part
}
}
您的第一列reduced
的类型为ROW
并且在从RDD 转换为DF 时您没有指定它。 数据框必须具有架构。 因此,您需要通过为RDD
定义正确的模式来使用以下方法来DataFrame
为DataFrame
。
createDataFrame(RDD<Row> rowRDD, StructType schema)
例如:
val schema = new StructType()
.add(Array(
StructField("._1a",IntegerType),
StructField("._1b", ArrayType(StringType))
))
.add(StructField("count", IntegerType, true))
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.