
spark.createDataFrame() not working with Seq RDD

createDataFrame takes two arguments: an RDD and a schema.

My schema is like this:

val schemas = StructType(Seq(
  StructField("number", IntegerType, false),
  StructField("notation", StringType, false)
))

In one case I am able to create a DataFrame from an RDD, like below:

val data1 = Seq(Row(1, "one"), Row(2, "two"))
val rdd = spark.sparkContext.parallelize(data1)
val final_df = spark.createDataFrame(rdd, schemas)

In the other case, like below, I am not able to:

val data2 = Seq((1, "one"), (2, "two"))
val rdd = spark.sparkContext.parallelize(data2)
val final_df = spark.createDataFrame(rdd, schemas)

What is wrong with data2 that prevents it from becoming a valid RDD for a DataFrame?

However, we are able to create a DataFrame from data2 using toDF(), just not with createDataFrame:

val data2_DF = Seq((1, "one"), (2, "two")).toDF("number", "notation")

Please help me understand this behaviour.

Is Row mandatory when creating a DataFrame?

In the second case, just do:

val final_df = spark.createDataFrame(rdd)

Because your RDD is an RDD of Tuple2 (which is a Product), the schema is known at compile time, so you don't need to specify a schema.
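To make that concrete, here is a minimal sketch of both routes, reusing spark, schemas, and the data2 RDD from the question (the names inferred_df, rowRdd, and final_df2 are illustrative): letting Spark infer the schema from the Tuple2 elements and renaming the columns with toDF, versus mapping each tuple to a Row so the explicit-schema overload can be used.

import org.apache.spark.sql.Row

// Option 1: the schema is inferred from the Tuple2 (Product) elements;
// the default column names _1 and _2 are then renamed with toDF.
val inferred_df = spark.createDataFrame(rdd).toDF("number", "notation")

// Option 2: keep the explicit StructType by first mapping each tuple to a Row,
// since the overload that accepts a schema expects an RDD[Row].
val rowRdd = rdd.map { case (number, notation) => Row(number, notation) }
val final_df2 = spark.createDataFrame(rowRdd, schemas)

Either route yields the same two-column DataFrame; the difference is only whether the column names (and nullability flags) come from Scala type inference or from the hand-written StructType.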
