
Scala dataframe value of column into another column

I want to assign the value of the previous id to the immediately following id.

For example, "id1" has the value "ab" and "id2" has the value "ac".

In the output, "id2" should carry both values: its own "ac" in value1 and the previous id's "ab" in value2.

I have a dataframe df as below:

id   value1
id1  ab
id1  ab
id2  ac
id2  ac
id3  abc
id3  abc
id3  abc

Desired output:

id   value1  value2
id1  ab
id1  ab
id2  ac      ab
id2  ac      ab
id3  abc     ac
id3  abc     ac
id3  abc     ac

I used the following script:

val w1 = Window.orderBy("id")
val snDF = df.withColumn("value2", lag($"value1", 2).over(w1))

But it gives me:

id   value1  value2
id1  ab
id1  ab
id2  ac      ab
id2  ac      ab
id3  abc     ac
id3  abc     ac
id3  abc     abc

This is not the correct output. How can I get the desired one?

Thanks

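The following should work for you. It first collapses the dataframe to one row per (id, value1), so that lag with an offset of 1 refers to the previous id rather than the previous row, and then explodes the result back to the original number of rows.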

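import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// No partitionBy here, so Spark moves all rows to a single partition for this window.
val w1 = Window.orderBy("id")

df.groupBy("id", "value1")                               // one row per (id, value1)
  .agg(collect_list("value1").as("temp"))                // keep one list element per original row
  .withColumn("value2", lag(col("value1"), 1).over(w1))  // value1 of the previous id, null for the first
  .withColumn("temp", explode(col("temp")))              // restore the original row count
  .drop("temp")
  .show(false)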

You would get the dataframe as:

+---+------+------+
|id |value1|value2|
+---+------+------+
|id1|ab    |null  |
|id1|ab    |null  |
|id2|ac    |ab    |
|id2|ac    |ab    |
|id3|abc   |ac    |
|id3|abc   |ac    |
|id3|abc   |ac    |
+---+------+------+
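
The reason this works, while lag with an offset of 2 over the raw rows does not, is that lag counts rows rather than distinct ids: the third "id3" row looks two rows back and lands inside its own group. The same collapse, shift, expand logic can be sketched on plain Scala collections without Spark. This is only an illustration: the names PreviousIdValue and withPrevious are hypothetical, and it assumes each id carries a single value1, as in the sample data.

```scala
// Plain-Scala illustration (no Spark); all names here are hypothetical.
object PreviousIdValue {

  // Pairs each (id, value1) row with the value1 of the previous distinct id.
  // Assumes every id maps to exactly one value1, as in the sample data.
  def withPrevious(rows: Seq[(String, String)]): Seq[(String, String, Option[String])] = {
    // Collapse to one entry per id, ordered by id (the groupBy step).
    val distinctIds: Seq[(String, String)] = rows.distinct.sortBy(_._1)

    // Shift the values by one position (the lag(1) step);
    // the first id has no predecessor, so it gets None.
    val shifted: Seq[Option[String]] =
      Option.empty[String] +: distinctIds.map(p => Option(p._2))
    val previous: Map[String, Option[String]] =
      distinctIds.map(_._1).zip(shifted).toMap

    // Expand back to the original rows (the explode step).
    rows.map { case (id, value1) => (id, value1, previous(id)) }
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      "id1" -> "ab", "id1" -> "ab",
      "id2" -> "ac", "id2" -> "ac",
      "id3" -> "abc", "id3" -> "abc", "id3" -> "abc"
    )
    withPrevious(rows).foreach(println)
  }
}
```

Here None plays the role of the null that Spark shows for the first id, and every "id3" row receives "ac" regardless of how many rows the group contains.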
