
How to add some values in a dataframe in Scala Spark?

This is my current dataframe. Assume there are 4 days in total, {1,2,3,4}:

+-------------+----------+------+
| key         | Time     | Value|
+-------------+----------+------+
|  1          |        1 | 1    | 
|  1          |        2 | 2    |
|  1          |        4 | 3    |
|  2          |        2 | 4    |
|  2          |        3 | 5    | 
+-------------+----------+------+

And what I want is:

+-------------+----------+------+
| key         | Time     | Value|  
+-------------+----------+------+
|  1          |        1 | 1    | 
|  1          |        2 | 2    | 
|  1          |        3 | null | 
|  1          |        4 | 3    | 
|  2          |        1 | null |
|  2          |        2 | 4    |
|  2          |        3 | 5    | 
|  2          |        4 | null | 
+-------------+----------+------+

Is there a way to get this result?

Assume df1 is our main table:

+---+----+-----+
|key|Time|Value|
+---+----+-----+
|1  |1   |1    |
|1  |2   |2    |
|1  |4   |3    |
|2  |2   |4    |
|2  |3   |5    |
+---+----+-----+
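
To reproduce the table above locally, df1 can be built from an in-memory sequence. This is only a minimal sketch: the object name `BuildDf1`, the helper `buildDf1`, the local master setting and the choice of Int for all three columns are assumptions made for illustration, not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object BuildDf1 {
  // Hypothetical helper: builds the sample table with columns key, Time, Value.
  def buildDf1(spark: SparkSession): DataFrame = {
    import spark.implicits._ // brings toDF into scope for local Seqs
    Seq(
      (1, 1, 1),
      (1, 2, 2),
      (1, 4, 3),
      (2, 2, 4),
      (2, 3, 5)
    ).toDF("key", "Time", "Value")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this small example.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("build-df1")
      .getOrCreate()

    buildDf1(spark).show(false)
    spark.stop()
  }
}
```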

We can use the following transformation:

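import org.apache.spark.sql.functions.{col, explode, lit, sequence}

val data = df1
  // group by 'key' and build the sequence 1..4 (the number of days) once per key
  .groupBy("key")
  .agg(sequence(lit(1), lit(4)).as("Time"))
  // explode the sequence into one row per 'Time' for each 'key'
  .withColumn("Time", explode(col("Time")))
  // left-join back onto the main table on 'key' and 'Time'; missing pairs get a null 'Value'
  .join(df1, Seq("key", "Time"), "left")
  // sort so the rows appear in the order shown below (a join does not guarantee row order)
  .orderBy("key", "Time")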

This gives the following output:

+---+----+-----+
|key|Time|Value|
+---+----+-----+
|1  |1   |1    |
|1  |2   |2    |
|1  |3   |null |
|1  |4   |3    |
|2  |1   |null |
|2  |2   |4    |
|2  |3   |5    |
|2  |4   |null |
+---+----+-----+
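
If the day range should not be hard-coded, one possible variant is to derive the bounds from the data and cross-join the distinct keys with every `Time` in that range. The sketch below is an illustration rather than the answer's method: the names `FillMissingTimes` and `fillMissingTimes` are hypothetical, and it assumes that the smallest and largest `Time` present in the table mark the full range (true for the sample data, where they are 1 and 4).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, max, min, sequence}

object FillMissingTimes {
  // Hypothetical helper: returns one row per (key, Time) for every Time between
  // the smallest and largest Time observed in df; absent pairs get a null Value.
  def fillMissingTimes(df: DataFrame): DataFrame = {
    // Assumption: the observed min and max of Time bound the full range of days.
    val bounds = df.agg(min("Time").as("lo"), max("Time").as("hi"))

    // One row per Time value in [lo, hi].
    val allTimes = bounds.select(explode(sequence(col("lo"), col("hi"))).as("Time"))

    df.select("key").distinct()
      .crossJoin(allTimes)                 // every key paired with every Time
      .join(df, Seq("key", "Time"), "left") // bring back Value where it exists
      .orderBy("key", "Time")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fill-missing-times")
      .getOrCreate()
    import spark.implicits._

    // Same sample rows as df1 in the answer.
    val df1 = Seq((1, 1, 1), (1, 2, 2), (1, 4, 3), (2, 2, 4), (2, 3, 5))
      .toDF("key", "Time", "Value")

    fillMissingTimes(df1).show(false)
    spark.stop()
  }
}
```

The cross join grows as (number of keys) × (number of days), so this is only reasonable when both are modest. If the first or last day may be missing for every key, passing the bounds explicitly, as the answer does with `lit(1)` and `lit(4)`, is the safer choice.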

This should be what you are looking for. Good luck!

