I'm trying to create a dataframe to feed to a function as part of my unit tests. If I have the following
val myDf = sparkSession.sqlContext.createDataFrame(
sparkSession.sparkContext.parallelize(Seq(
Row(Some(Seq(MyObject(1024, 100001D), MyObject(1, -1D)))))),
StructType(List(
StructField("myList", ArrayType[???], true)
)))
MyObject is a case class.
I don't know what to put for the object type. Any suggestions? I've tried ArrayType of pretty much every combination I can think of.
I'm looking for a dataframe that looks something like:
+--------------------+
| myList |
+--------------------+
| [1024, 100001] |
| [1, -1] |
+--------------------+
Coming in the reverse way...
val s = Seq(Array(1024, 100001D), Array(1, -1D)).toDS().toDF("myList")
println(s.schema)
s.printSchema
s.show
Your schema is like below... DoubleType
is coming since these 100001D and -1D are double.
StructType(StructField(myList,ArrayType(DoubleType,false),true))
Output you needed:
root
|-- myList: array (nullable = true)
| |-- element: double (containsNull = false)
+------------------+
| myList|
+------------------+
|[1024.0, 100001.0]|
| [1.0, -1.0]|
+------------------+
Or this way also you can do that.
case class MyObject(a:Int , b:Double)
val s = Seq(MyObject(1024, 100001D), MyObject(1, -1D)).toDS()
.select(struct($"a",$"b").as[MyObject] as "myList")
println(s.schema)
s.printSchema
s.show
Result:
//schema :
StructType(StructField(myList,StructType(StructField(a,IntegerType,false), StructField(b,DoubleType,false)),false))
root
|-- myList: struct (nullable = false)
| |-- a: integer (nullable = false)
| |-- b: double (nullable = false)
+----------------+
| myList|
+----------------+
|[1024, 100001.0]|
| [1, -1.0]|
+----------------+
Try this
scala> case class MyObject(prop1:Int, prop2:Double)
defined class MyObject
scala> val df = Seq((1024, 100001D), (1, -1D)).toDF("prop1","prop2").select(struct($"prop1",$"prop2").as[MyObject] as "myList")
df: org.apache.spark.sql.DataFrame = [myList: struct<prop1: int, prop2: double>]
scala> df.show(false)
+----------------+
|myList |
+----------------+
|[1024, 100001.0]|
|[1, -1.0] |
+----------------+
scala> df.printSchema
root
|-- myList: struct (nullable = false)
| |-- prop1: integer (nullable = false)
| |-- prop2: double (nullable = false)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.