Type Mismatch Spark Scala
I am trying to create an empty DataFrame and use it in a function, but I keep getting the following error:
Required: DataFrame
Found: Dataset[DataFrame]
This is how I am doing it:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Create an empty DataFrame with an explicit schema
val schema = StructType(
  StructField("g", StringType, true) ::
  StructField("tg", StringType, true) :: Nil)
var df1 = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
//or
var df1 = spark.emptyDataFrame
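Note that the two variants are not interchangeable: spark.emptyDataFrame has no columns at all, while the createDataFrame version carries the g/tg schema, which matters for the union performed later. A quick check (a sketch):

spark.emptyDataFrame.schema  // StructType with no fields
df1.schema                   // StructType(StructField(g,StringType,true), StructField(tg,StringType,true))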
Then I try to use it by calling a function, as you can see here:
df1 = kvrdd1_toDF.map(x => function1(x, df1))
And this is the function:
def function1(input: org.apache.spark.sql.Row, df: DataFrame): DataFrame = {
  val v1 = spark.sparkContext.parallelize(Seq("g", "tg"))
  var df3 = v1.toDF("g", "tg")
  if (df.take(1).isEmpty) {
    df3 = Seq((input.get(2), "nn")).toDF("g", "tg")
  } else {
    df3 = df3.union(df)
  }
  df3
}
What am I doing wrong?
You have a DataFrame, which is an alias for Dataset[Row]. You map that Row to a DataFrame, so you end up with a Dataset[DataFrame].
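The type mismatch follows directly from the (simplified) signatures below; Dataset.map also requires an implicit Encoder, omitted here for clarity:

// In org.apache.spark.sql (simplified):
// type DataFrame = Dataset[Row]
// class Dataset[T] { def map[U](func: T => U): Dataset[U] }
//
// kvrdd1_toDF.map(x => function1(x, df1))  // function1 returns DataFrame,
//                                          // so the result is Dataset[DataFrame]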
I don't know what you are trying to do, but it will never work. The functions (and all of their dependencies) you use to map the contents of a Dataset are serialized and distributed over your Spark cluster. You can't use another DataFrame, a SparkSession, or a SparkContext inside such a function.
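If the intent is to thread an accumulator DataFrame through every row, one driver-side alternative is to fold over the collected rows, so function1 runs on the driver rather than inside map. A minimal sketch, assuming kvrdd1_toDF is small enough to collect and function1 keeps its signature from the question:

// Runs function1 on the driver; no DataFrame or SparkSession is captured in a closure.
val result: DataFrame = kvrdd1_toDF.collect().foldLeft(df1) { (acc, row) =>
  function1(row, acc)
}

Note that repeatedly unioning DataFrames grows the query plan, so for many rows it is usually cheaper to build all the rows first and create a single DataFrame at the end.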