Scala DataFrame: value of a column into another column
I want to assign the value of the previous id to the immediately following id.
For example, "id1" has the value "ab" and "id2" has the value "ac".
I want the output rows for "id2" to carry both "ac" and the previous value "ab".
I have a dataframe df as below:
id value1
id1 ab
id1 ab
id2 ac
id2 ac
id3 abc
id3 abc
id3 abc
Desired output:
id value1 value2
id1 ab
id1 ab
id2 ac ab
id2 ac ab
id3 abc ac
id3 abc ac
id3 abc ac
I used the following script:
val w1 = Window.orderBy("id")
val snDF = df.withColumn("value2", lag($"value1", 2).over(w1))
But it gives me:
id value1 value2
id1 ab
id1 ab
id2 ac ab
id2 ac ab
id3 abc ac
id3 abc ac
id3 abc abc
This is not the correct output. How can I get the desired result?
Thanks
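lag counts rows, not ids, so a fixed offset of 2 only works while every id has exactly two rows. Collapsing the data to one row per id, applying lag with an offset of 1, and then exploding back to the original rows should work for you: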
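import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
// `$` needs spark.implicits._ (already imported in spark-shell)

val w1 = Window.orderBy("id")

df.groupBy("id", "value1")                           // one row per id
  .agg(collect_list("value1").as("temp"))            // keep the duplicates in a list
  .withColumn("value2", lag($"value1", 1).over(w1))  // value1 of the previous id
  .withColumn("temp", explode(col("temp")))          // restore the original row count
  .drop("temp")
  .show(false)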
You would get the following dataframe:
+---+------+------+
|id |value1|value2|
+---+------+------+
|id1|ab |null |
|id1|ab |null |
|id2|ac |ab |
|id2|ac |ab |
|id3|abc |ac |
|id3|abc |ac |
|id3|abc |ac |
+---+------+------+
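If you prefer to avoid collect_list and explode, a join should give the same result. This is only a sketch: the local SparkSession, the app name, and the names previous and result are my own assumptions and not part of the question, and it assumes each id has exactly one distinct value1.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("prev-id-value")
  .getOrCreate()
import spark.implicits._

// Sample data copied from the question.
val df = Seq(
  ("id1", "ab"), ("id1", "ab"),
  ("id2", "ac"), ("id2", "ac"),
  ("id3", "abc"), ("id3", "abc"), ("id3", "abc")
).toDF("id", "value1")

// One row per id, so an offset of 1 always points at the previous id.
// Note: a window without partitionBy pulls all rows into a single partition.
val w = Window.orderBy("id")
val previous = df
  .select("id", "value1")
  .distinct()
  .withColumn("value2", lag(col("value1"), 1).over(w))
  .select("id", "value2")

// Attach value2 to every original row, duplicates included.
val result = df.join(previous, Seq("id"), "left").orderBy("id")
result.show(false)
```

The lookup table has one row per id, so it stays small even when df has many duplicates per id.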