Scala dataframe value of column into another column

Question

I want to assign value of previous id into immediate next id.

For example, "id1" has value is "ab" and "id2" has value "ac".

I want to get the output "id2" has value "ab" "ac".

I have dataframe df as below:

id  value1 
id1  ab   
id1  ab
id2  ac     
id2  ac    
id3  abc    
id3  abc    
id3  abc

desired output

id  value1 value2
id1  ab   
id1  ab
id2  ac     ab
id2  ac     ab
id3  abc    ac
id3  abc    ac
id3  abc    ac

I used the following script

val w1 = Window.orderBy("id")
val snDF = df.withColumn("value2", lag($"value1", 2).over(w1))

But it gives me:

id  value1 value2
id1  ab   
id1  ab
id2  ac     ab
id2  ac     ab
id3  abc    ac
id3  abc    ac
id3  abc    abc

It is not the correct ouput. How can I get it ?

Thanks

Answer 1

Doing the following should work for you

import org.apache.spark.sql.expressions._
val w1 = Window.orderBy("id")

import org.apache.spark.sql.functions._
df.groupBy("id", "value1")
    .agg(collect_list("value1").as("temp"))
    .withColumn("value2", lag($"value1", 1).over(w1))
    .withColumn("temp", explode(col("temp")))
    .drop("temp")
  .show(false)

You would get dataframe as

+---+------+------+
|id |value1|value2|
+---+------+------+
|id1|ab    |null  |
|id1|ab    |null  |
|id2|ac    |ab    |
|id2|ac    |ab    |
|id3|abc   |ac    |
|id3|abc   |ac    |
|id3|abc   |ac    |
+---+------+------+

Scala dataframe value of column into another column

Question

1 answers

solution1
0 ACCPTED 2018-04-05 15:57:04

Scala dataframe value of column into another column

Question

1 answers

solution1 0 ACCPTED 2018-04-05 15:57:04

solution1
0 ACCPTED 2018-04-05 15:57:04