Add values to a dataframe against some particular ID in Spark Scala
This is my current dataframe. Assume there are 4 days in total, {1,2,3,4}:
+---+----+-----+
|key|Time|Value|
+---+----+-----+
|1  |1   |1    |
|1  |2   |2    |
|1  |4   |3    |
|2  |2   |4    |
|2  |3   |5    |
+---+----+-----+
What I want is one row per key for every day, with a null Value where no row exists:
+---+----+-----+
|key|Time|Value|
+---+----+-----+
|1  |1   |1    |
|1  |2   |2    |
|1  |3   |null |
|1  |4   |3    |
|2  |1   |null |
|2  |2   |4    |
|2  |3   |5    |
|2  |4   |null |
+---+----+-----+
Is there a way to get this?
Assume df1 is our main table:
+---+----+-----+
|key|Time|Value|
+---+----+-----+
|1 |1 |1 |
|1 |2 |2 |
|1 |4 |3 |
|2 |2 |4 |
|2 |3 |5 |
+---+----+-----+
We can use the following transformation:
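import org.apache.spark.sql.functions.{col, explode, lit, sequence}

val data = df1
  // group by 'key' and attach the full sequence of days 1 to 4 (your number of days) to each key
  .groupBy("key")
  .agg(sequence(lit(1), lit(4)).as("Time"))
  // explode the sequence, creating one row per 'Time' for every 'key'
  .withColumn("Time", explode(col("Time")))
  // finally, left join with the main table on 'key' and 'Time'; pairs missing from df1 get a null 'Value'
  .join(df1, Seq("key", "Time"), "left")
  // sort for display, since a join does not guarantee row order
  .orderBy("key", "Time")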
This produces the following output:
+---+----+-----+
|key|Time|Value|
+---+----+-----+
|1 |1 |1 |
|1 |2 |2 |
|1 |3 |null |
|1 |4 |3 |
|2 |1 |null |
|2 |2 |4 |
|2 |3 |5 |
|2 |4 |null |
+---+----+-----+
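For anyone who wants to try this end to end, here is a minimal self-contained sketch. It is a variation on the snippet above, not the only way to do it: it builds the key/day grid from the distinct keys instead of a groupBy, and takes the day range as parameters. The object name FillMissingTimes, the helper fillMissingTimes and its firstDay/lastDay parameters are hypothetical names chosen for illustration, and a local SparkSession with Spark 2.4 or later (needed for sequence) is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{explode, lit, sequence}

object FillMissingTimes {
  // Hypothetical helper: returns one row per (key, Time) for Time in firstDay..lastDay,
  // leaving Value null wherever df has no matching row.
  def fillMissingTimes(df: DataFrame, firstDay: Int, lastDay: Int): DataFrame = {
    // Every distinct key paired with every day in the range
    val grid = df
      .select("key")
      .distinct()
      .withColumn("Time", explode(sequence(lit(firstDay), lit(lastDay))))

    // Left join keeps all grid rows; unmatched pairs get a null Value
    grid
      .join(df, Seq("key", "Time"), "left")
      .orderBy("key", "Time")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust master/appName for a real cluster
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fill-missing-times")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as df1 in the answer
    val df1 = Seq((1, 1, 1), (1, 2, 2), (1, 4, 3), (2, 2, 4), (2, 3, 5))
      .toDF("key", "Time", "Value")

    fillMissingTimes(df1, 1, 4).show()

    spark.stop()
  }
}
```

With the sample data this should give the same eight rows as the table above. If the day range is not known up front, one option is to compute the minimum and maximum of Time from the data first and pass those in.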
This should be what you are looking for. Good luck!