Spark Scala Dataframe Transform of Nested Maps into a Single Dataframe Row?
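I want to transform a DataFrame containing nested maps and simple values into a single DataFrame row that holds all of the original rows wrapped in an array.
Specifically, it should turn this DataFrame: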
+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |records |
+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|123 |[USA -> [1475600496 -> 25.000000000000000000], ITA -> [1475600500 -> 18.000000000000000000, 1475600516 -> 19.000000000000000000], JPN -> [1475600508 -> 27.000000000000000000]]|
|256 |[USA -> [1475600508 -> 40.000000000000000000, 1475600500 -> 47.000000000000000000], NOR -> [1475600496 -> 30.000000000000000000]] |
|118 |[USA -> [1475600500 -> 50.000000000000000000]] |
+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
into:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|valueAndRecords |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[[123, [USA -> [1475600496 -> 25.000000000000000000], ITA -> [1475600500 -> 18.000000000000000000, 1475600516 -> 19.000000000000000000], JPN -> [1475600508 -> 27.000000000000000000]], [256, [USA -> [1475600508 -> 40.000000000000000000, 1475600500 -> 47.000000000000000000], NOR -> [1475600496 -> 30.000000000000000000]]], [118, [USA -> [1475600500 -> 50.000000000000000000]]]]|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
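To make the two shapes concrete, here is a sketch of the input and target schemas as Spark StructTypes. The column types are assumptions inferred from the printed output, not something the question states: value is taken to be an integer, the inner map keys (which look like Unix timestamps) are taken to be longs, and the amounts are taken to be DecimalType(38, 18), which matches the 18 decimal places shown. The object name Schemas is hypothetical.

```scala
import org.apache.spark.sql.types._

object Schemas {
  // Assumed type of the amounts: 18 digits after the decimal point, as printed above
  val amountType: DecimalType = DecimalType(38, 18)

  // Assumed shape of records: country code -> (timestamp -> amount)
  val recordsType: MapType =
    MapType(StringType, MapType(LongType, amountType))

  // Input: one row per value, each with its own nested map
  val inputSchema: StructType = StructType(Seq(
    StructField("value", IntegerType),
    StructField("records", recordsType)
  ))

  // Target: a single column holding an array of (value, records) structs
  val outputSchema: StructType = StructType(Seq(
    StructField("valueAndRecords", ArrayType(inputSchema))
  ))
}
```

The element type of the target array is exactly the input row type, which is why wrapping struct("value", "records") and collecting it is enough; no field needs reshaping.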
I can combine the two columns into a struct with the line below, but it doesn't wrap the result in an array. How can this be done?
df.withColumn("valueAndRecords", struct("value", "records")).select("valueAndRecords")
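Wrap the struct in an array, i.e. array(struct(...)). Note that this still produces one output row per input row, each holding a one-element array; it does not merge the rows: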
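import org.apache.spark.sql.functions.{array, struct}
// One output row per input row, each holding a single-element array
df.withColumn("valueAndRecords", array(struct("value", "records"))).select("valueAndRecords")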
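A minimal, self-contained sketch of this first approach, assuming a local SparkSession and a few hand-typed sample rows shaped like the question's data (the object and method names are hypothetical). It makes visible that array(struct(...)) leaves the row count unchanged:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, struct}

object WrapEachRow {
  // Hypothetical sample data: value -> (country -> (timestamp -> amount))
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (123, Map("USA" -> Map(1475600496L -> BigDecimal(25)))),
      (256, Map("NOR" -> Map(1475600496L -> BigDecimal(30)))),
      (118, Map("USA" -> Map(1475600500L -> BigDecimal(50))))
    ).toDF("value", "records")
  }

  // Wraps each row's (value, records) struct in its own array; no rows are merged
  def wrap(df: DataFrame): DataFrame =
    df.select(array(struct("value", "records")).alias("valueAndRecords"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("wrap-each-row").getOrCreate()
    try {
      wrap(sample(spark)).show(false)
    } finally {
      spark.stop()
    }
  }
}
```

With three input rows this prints three rows, so it only matches the requested output when the input already has a single row.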
Alternatively, use collect_list to gather all rows into a single array:
// Sample data (I don't have the full data for the records column)
df.show(false)
//+-----+----------------------------------------------+
//|value|records |
//+-----+----------------------------------------------+
//|123 |[USA -> [1475600496 -> 25.000000000000000000]]|
//|256 |[USA -> [1475600508 -> 40.000000000000000000]]|
//|256 |[USA -> [1475600500 -> 50.000000000000000000]]|
//+-----+----------------------------------------------+
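import org.apache.spark.sql.functions.{collect_list, lit, struct}
// Group every row under one constant key so collect_list gathers them all into one array
df.groupBy(lit("1")).agg(collect_list(struct("value", "records")).alias("valueAndRecords")).select("valueAndRecords").show(false)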
//+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
//|valueAndRecords |
//+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
//|[[123, [USA -> [1475600496 -> 25.000000000000000000]]], [256, [USA -> [1475600508 -> 40.000000000000000000]]], [256, [USA -> [1475600500 -> 50.000000000000000000]]]]|
//+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
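For reference, a self-contained sketch of the collect_list approach under the same assumptions (local SparkSession, hypothetical object and method names, hand-typed sample rows). It uses a global df.agg(...), which aggregates over the whole DataFrame without the constant groupBy(lit("1")) key; for non-empty input both forms yield a single row. The order of the structs inside the collected array is not guaranteed, because it depends on row order, which can change after a shuffle.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{collect_list, struct}

object CollectIntoOneRow {
  // Hypothetical sample data: value -> (country -> (timestamp -> amount))
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (123, Map("USA" -> Map(1475600496L -> BigDecimal(25)),
                "ITA" -> Map(1475600500L -> BigDecimal(18), 1475600516L -> BigDecimal(19)))),
      (256, Map("NOR" -> Map(1475600496L -> BigDecimal(30)))),
      (118, Map("USA" -> Map(1475600500L -> BigDecimal(50))))
    ).toDF("value", "records")
  }

  // Global aggregation: every input row contributes one struct to a single array
  def collectAll(df: DataFrame): DataFrame =
    df.agg(collect_list(struct("value", "records")).alias("valueAndRecords"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("collect-into-one-row").getOrCreate()
    try {
      collectAll(sample(spark)).show(false)
    } finally {
      spark.stop()
    }
  }
}
```

Since all rows end up in one array on one executor, this is only practical when the whole DataFrame is small enough to fit in a single row.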