
Type Mismatch Spark Scala

I am trying to create an empty DataFrame and use it in a function, but I keep getting the following error:

Required: DataFrame
Found: Dataset[DataFrame]

This is how I am doing it:

//Create empty DataFrame
val schema = StructType(
    StructField("g", StringType, true) ::
    StructField("tg", StringType, true) :: Nil)

var df1 = spark.createDataFrame(spark.sparkContext
      .emptyRDD[Row], schema)
//or
var df1 = spark.emptyDataFrame

Then I try to use it by calling a function, as you can see here:

  df1 = kvrdd1_toDF.map(x => function1(x, df1))

And this is the function:

  def function1(input: org.apache.spark.sql.Row, df: DataFrame): DataFrame = {
    val v1 = spark.sparkContext.parallelize(Seq("g","tg"))
    var df3 = v1.toDF("g","tg")
    if (df.take(1).isEmpty){
      df3 = Seq((input.get(2), "nn")).toDF("g", "tg")
    } else {
      df3 = df3.union(df)
    }
    df3
  }

What am I doing wrong?

You have a DataFrame, which is an alias for Dataset[Row]. You map each Row to a DataFrame, so that's how you end up with a Dataset[DataFrame]. I don't know what you are trying to do, but it will never work. The function (and all its dependencies) you use to map the contents of a Dataset is serialized and distributed over your Spark cluster. You can't use another DataFrame, a SparkSession, or a SparkContext inside such a function.
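If the goal is just to turn every input row into a ("g", "tg") record and end up with a single DataFrame, the per-row accumulation can be replaced by a plain column-level transformation, which runs on the executors and never needs a DataFrame inside map(). The sketch below is one possible rewrite under assumptions taken from the question: kvrdd1_toDF is stubbed with sample data, and the grouping value is assumed to sit in the third column (the x.get(2) in function1).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object DriverSideSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
    import spark.implicits._

    // Stand-in for kvrdd1_toDF; the real column layout is an assumption here.
    val kvrdd1_toDF = Seq(("k1", "v1", "g1"), ("k2", "v2", "g2")).toDF("c0", "c1", "c2")

    // Build the ("g", "tg") pairs with a single select/withColumn instead of
    // constructing a new DataFrame per row inside map().
    val df1: DataFrame = kvrdd1_toDF
      .select($"c2".as("g"))
      .withColumn("tg", lit("nn"))

    df1.show()
    spark.stop()
  }
}
```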
