Saving a 2D list to a DataFrame in Scala Spark

Question

I have a 2D list named tuppleSlides in the following format:

List(List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7))

I have created the following schema:

val schema = StructType(
  Array(
    StructField("1", IntegerType, true),
    StructField("2", IntegerType, true),
    StructField("3", IntegerType, true),
    StructField("4", IntegerType, true),
    StructField("5", IntegerType, true),
    StructField("6", IntegerType, true),
    StructField("7", IntegerType, true),
    StructField("8", IntegerType, true),
    StructField("9", IntegerType, true),
    StructField("10", IntegerType, true)
  )
)

and I am creating a DataFrame like so:

val tuppleSlidesDF = sparkSession.createDataFrame(tuppleSlides, schema)

but it won't even compile. How am I supposed to do it properly?

Thank you.

Answer 1

You need to convert the 2D list to an RDD[Row] before creating the DataFrame:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

// tuppleSlides is the List[List[Int]] from the question; each inner list becomes one Row
val rdd = sc.parallelize(tuppleSlides).map(Row.fromSeq(_))

sqlContext.createDataFrame(rdd, schema)
// res7: org.apache.spark.sql.DataFrame = [1: int, 2: int, 3: int, 4: int, 5: int, 6: int, 7: int, 8: int, 9: int, 10: int]

Also note that in Spark 2.x, sqlContext is replaced by spark (the SparkSession):

spark.createDataFrame(rdd, schema)
// res1: org.apache.spark.sql.DataFrame = [1: int, 2: int ... 8 more fields]
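
For reference, here is a minimal end-to-end sketch of the same approach for Spark 2.x. As far as I can tell, the original call fails because the createDataFrame overloads that take a StructType expect rows (for example an RDD[Row]), not a List[List[Int]]. The local master, the app name, and building the schema with a range instead of writing out all ten fields are my own assumptions for illustration and are not part of the question.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Local session for illustration only; master and app name are assumptions.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("list-to-dataframe")
  .getOrCreate()

// Same shape as the list in the question: 4 rows of 10 integers.
val tuppleSlides: List[List[Int]] =
  List.fill(4)(List(10, 4, 2, 4, 5, 2, 6, 2, 5, 7))

// Ten nullable integer columns named "1" .. "10", generated rather than typed out.
val schema = StructType(
  (1 to 10).map(i => StructField(i.toString, IntegerType, nullable = true))
)

// Each inner list becomes one Row; its length must match the number of schema fields.
val rows = spark.sparkContext.parallelize(tuppleSlides).map(Row.fromSeq(_))

val tuppleSlidesDF = spark.createDataFrame(rows, schema)
tuppleSlidesDF.printSchema()
tuppleSlidesDF.show()
```

If an inner list has a different length than the schema, the mismatch is not caught at compile time and typically only surfaces once an action such as show() runs, so it may be worth checking the lengths up front.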

Answer 1
4 · accepted · 2016-12-15 19:23:58
