
Save a 2D list into a DataFrame in Scala Spark

I have a 2D list named tuppleSlides in the following format:

List(List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7))

I created the following schema:

val schema = StructType(
  Array(
    StructField("1", IntegerType, true),
    StructField("2", IntegerType, true),
    StructField("3", IntegerType, true),
    StructField("4", IntegerType, true),
    StructField("5", IntegerType, true),
    StructField("6", IntegerType, true),
    StructField("7", IntegerType, true),
    StructField("8", IntegerType, true),
    StructField("9", IntegerType, true),
    StructField("10", IntegerType, true)
  )
)

I am creating the DataFrame like this:

val tuppleSlidesDF = sparkSession.createDataFrame(tuppleSlides, schema)

But it doesn't even compile. How should I do this correctly?

Thanks.

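Before creating the DataFrame, you need to convert the 2D list into an RDD[Row] object. The createDataFrame overload that takes a schema expects rows (for example an RDD[Row]), not a List[List[Int]], which is why the original call does not compile: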

import org.apache.spark.sql._
import org.apache.spark.sql.types._

// Turn each inner list into a Row, giving an RDD[Row]
val rdd = sc.parallelize(tuppleSlides).map(Row.fromSeq(_))

sqlContext.createDataFrame(rdd, schema)
// res7: org.apache.spark.sql.DataFrame = [1: int, 2: int, 3: int, 4: int, 5: int, 6: int, 7: int, 8: int, 9: int, 10: int]

Also note that in Spark 2.x, sqlContext is replaced by spark:

spark.createDataFrame(rdd, schema)
// res1: org.apache.spark.sql.DataFrame = [1: int, 2: int ... 8 more fields]
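
For reference, the steps above can be put together into one snippet. This is only a minimal sketch, assuming Spark 2.x or later on the classpath and a local session for experimentation (in spark-shell, `spark` already exists and `getOrCreate()` simply returns it). The app name and the programmatic schema construction are illustrative choices, not part of the original question; the schema built here should be equivalent to the hand-written one.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Assumption: a local session for trying this out; the app name is arbitrary.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("tuppleSlidesExample")
  .getOrCreate()

// Same shape as the list in the question: 4 rows of 10 integers.
val tuppleSlides = List.fill(4)(List(10, 4, 2, 4, 5, 2, 6, 2, 5, 7))

// Build the ten nullable integer columns "1" .. "10" without listing each field by hand.
val schema = StructType(
  (1 to 10).map(i => StructField(i.toString, IntegerType, nullable = true))
)

// Each inner list becomes one Row; the outer list becomes an RDD[Row].
val rdd = spark.sparkContext.parallelize(tuppleSlides).map(Row.fromSeq(_))

val tuppleSlidesDF = spark.createDataFrame(rdd, schema)
tuppleSlidesDF.show()
```

Note that Row.fromSeq does not check the values against the schema at construction time, so each inner list needs exactly as many integers as the schema has fields.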

