简体   繁体   中英

Scala dataframe value of column into another column

I want to assign value of previous id into immediate next id.

For example, "id1" has value is "ab" and "id2" has value "ac".

I want to get the output "id2" has value "ab" "ac".

I have dataframe df as below:

id  value1 
id1  ab   
id1  ab
id2  ac     
id2  ac    
id3  abc    
id3  abc    
id3  abc   

desired output

id  value1 value2
id1  ab   
id1  ab
id2  ac     ab
id2  ac     ab
id3  abc    ac
id3  abc    ac
id3  abc    ac

I used the following script

val w1 = Window.orderBy("id")
val snDF = df.withColumn("value2", lag($"value1", 2).over(w1))

But it gives me:

id  value1 value2
id1  ab   
id1  ab
id2  ac     ab
id2  ac     ab
id3  abc    ac
id3  abc    ac
id3  abc    abc

It is not the correct ouput. How can I get it ?

Thanks

Doing the following should work for you

import org.apache.spark.sql.expressions._
val w1 = Window.orderBy("id")

import org.apache.spark.sql.functions._
df.groupBy("id", "value1")
    .agg(collect_list("value1").as("temp"))
    .withColumn("value2", lag($"value1", 1).over(w1))
    .withColumn("temp", explode(col("temp")))
    .drop("temp")
  .show(false)

You would get dataframe as

+---+------+------+
|id |value1|value2|
+---+------+------+
|id1|ab    |null  |
|id1|ab    |null  |
|id2|ac    |ab    |
|id2|ac    |ab    |
|id3|abc   |ac    |
|id3|abc   |ac    |
|id3|abc   |ac    |
+---+------+------+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM