
How to add some values to a DataFrame in Scala Spark?

This is my current DataFrame, assuming there are 4 days in total, {1, 2, 3, 4}:

+-------------+----------+------+
| key         | Time     | Value|
+-------------+----------+------+
|  1          |        1 | 1    | 
|  1          |        2 | 2    |
|  1          |        4 | 3    |
|  2          |        2 | 4    |
|  2          |        3 | 5    | 
+-------------+----------+------+

And what I want is:

+-------------+----------+------+
| key         | Time     | Value|  
+-------------+----------+------+
|  1          |        1 | 1    | 
|  1          |        2 | 2    | 
|  1          |        3 | null | 
|  1          |        4 | 3    | 
|  2          |        1 | null |
|  2          |        2 | 4    |
|  2          |        3 | 5    | 
|  2          |        4 | null | 
+-------------+----------+------+

Is there a method that can help me get this?

Suppose df1 is our main table:

+---+----+-----+
|key|Time|Value|
+---+----+-----+
|1  |1   |1    |
|1  |2   |2    |
|1  |4   |3    |
|2  |2   |4    |
|2  |3   |5    |
+---+----+-----+
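To try the transformation locally, df1 can be built from the values in the table above. This is a minimal sketch assuming Spark is on the classpath; the object name SampleData, the local master and the app name are illustrative choices, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SampleData {
  // Local session for experimenting; "local[*]" and the app name are arbitrary (hypothetical) choices.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("fill-missing-days")
    .getOrCreate()

  // The main table from the question: one (key, Time, Value) tuple per row.
  def df1: DataFrame = {
    import spark.implicits._
    Seq((1, 1, 1), (1, 2, 2), (1, 4, 3), (2, 2, 4), (2, 3, 5))
      .toDF("key", "Time", "Value")
  }
}
```

Calling SampleData.df1.show() should print the five rows above.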

We can use the following transformation:

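import org.apache.spark.sql.functions.{col, explode, lit, sequence}

val data = df1
  // we first group by key and aggregate each key to the sequence 1..4 (your number of days)
  .groupBy("key")
  .agg(sequence(lit(1), lit(4)).as("Time"))
  // we explode the sequence, thus creating one row for every 'Time' per 'key'
  .withColumn("Time", explode(col("Time")))
  // finally, we left-join with our main table on 'key' and 'Time'; days with no match get a null 'Value'
  .join(df1, Seq("key", "Time"), "left")
  // sort so the rows come out in the order shown below (joins do not guarantee order)
  .orderBy("key", "Time")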

to get this output:

+---+----+-----+
|key|Time|Value|
+---+----+-----+
|1  |1   |1    |
|1  |2   |2    |
|1  |3   |null |
|1  |4   |3    |
|2  |1   |null |
|2  |2   |4    |
|2  |3   |5    |
|2  |4   |null |
+---+----+-----+
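A different way to build the same key × day grid, offered here as an alternative sketch rather than the answer's method, is to cross-join the distinct keys with a small table of days and then left-join the original data. The helper name FillMissingDays.fill and its numDays parameter are hypothetical; numDays plays the role of the fixed 4 above. It assumes Spark 2.1+ (for crossJoin).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FillMissingDays {
  // Hypothetical helper: returns one row per (key, day) for days 1..numDays,
  // with Value = null wherever df has no row for that pair.
  def fill(spark: SparkSession, df: DataFrame, numDays: Int): DataFrame = {
    // spark.range(start, end) is end-exclusive and yields a bigint "id" column,
    // so cast it to int to match the type of the existing Time column.
    val days = spark.range(1, numDays + 1).select(col("id").cast("int").as("Time"))
    df.select("key").distinct()
      .crossJoin(days)                        // every key paired with every day
      .join(df, Seq("key", "Time"), "left")   // keep the grid, attach Value where present
      .orderBy("key", "Time")
  }
}
```

Unlike groupBy plus sequence, this keeps the day list as its own DataFrame, which may be convenient if the days ever come from another table instead of a fixed range.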

This should be what you are looking for. Good luck!


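Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reproduce this content, please credit this site's URL or the original address. For any questions, please contact: yoyou2525@163.com.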
