I have a 2D list named tuppleSlides, in the following format:
List(List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7))
I have created the following schema:
val schema = StructType(
  Array(
    StructField("1", IntegerType, true),
    StructField("2", IntegerType, true),
    StructField("3", IntegerType, true),
    StructField("4", IntegerType, true),
    StructField("5", IntegerType, true),
    StructField("6", IntegerType, true),
    StructField("7", IntegerType, true),
    StructField("8", IntegerType, true),
    StructField("9", IntegerType, true),
    StructField("10", IntegerType, true)
  )
)
and I am creating a dataframe like so:
val tuppleSlidesDF = sparkSession.createDataFrame(tuppleSlides, schema)
but it won't even compile. How am I supposed to do this properly?
Thank you.
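createDataFrame has no overload that accepts a List[List[Int]] together with a schema, which is why the call does not compile. You need to convert the 2D list to an RDD[Row] before creating the data frame: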
import org.apache.spark.sql._
import org.apache.spark.sql.types._
// Wrap each inner list in a Row so the RDD has type RDD[Row]
val rdd = sc.parallelize(tuppleSlides).map(Row.fromSeq(_))
sqlContext.createDataFrame(rdd, schema)
// res7: org.apache.spark.sql.DataFrame = [1: int, 2: int, 3: int, 4: int, 5: int, 6: int, 7: int, 8: int, 9: int, 10: int]
Also note that in Spark 2.x the entry point is the SparkSession (available as spark in spark-shell), which takes the place of sqlContext:
spark.createDataFrame(rdd, schema)
// res1: org.apache.spark.sql.DataFrame = [1: int, 2: int ... 8 more fields]
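For reference, here is a minimal end-to-end sketch of the Spark 2.x variant, meant to be pasted into spark-shell or run as a script. It makes a few assumptions that are not in the question: the SparkSession is built locally with master "local[*]" and a made-up app name, the list is rebuilt with List.fill, and the schema is generated with a range instead of being written out field by field. Treat it as an illustration rather than the only way to do this.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Assumption: a local session created only for this example.
// In spark-shell a session named `spark` already exists, and getOrCreate() reuses it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("tuppleSlidesExample") // hypothetical app name
  .getOrCreate()

// Same data as in the question: four identical rows of ten integers.
val tuppleSlides = List.fill(4)(List(10, 4, 2, 4, 5, 2, 6, 2, 5, 7))

// Ten nullable integer columns named "1" through "10", generated instead of hand-written.
val schema = StructType(
  (1 to 10).map(i => StructField(i.toString, IntegerType, nullable = true))
)

// Wrap each inner list in a Row, then distribute the rows as an RDD[Row].
val rows = tuppleSlides.map(Row.fromSeq(_))
val rdd = spark.sparkContext.parallelize(rows)

// The (RDD[Row], StructType) overload of createDataFrame is the one used in the answer.
val tuppleSlidesDF = spark.createDataFrame(rdd, schema)

tuppleSlidesDF.printSchema()
tuppleSlidesDF.show()
```

Row.fromSeq does not check the values against the schema when the Row is created, so a row whose length or types do not match the schema would typically only fail later, when the data frame is actually evaluated.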