Scala DataFrame: copy a column's value from the previous id into another column

Question

I want to assign the value of the previous id to the immediately following id.

For example, "id1" has the value "ab" and "id2" has the value "ac".

In the output, each "id2" row should carry both values: its own "ac" and the previous id's "ab".

I have a DataFrame df as below:

id   value1
id1  ab
id1  ab
id2  ac
id2  ac
id3  abc
id3  abc
id3  abc

Desired output:

id   value1  value2
id1  ab
id1  ab
id2  ac      ab
id2  ac      ab
id3  abc     ac
id3  abc     ac
id3  abc     ac
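To pin down the semantics the desired output implies, here is a minimal plain-Scala sketch using ordinary collections rather than Spark. The object `PrevIdValue` and the case class `Rec` are hypothetical names, and the sketch assumes each id carries a single `value1`, as in the sample data. The key point is that the shift is by one *distinct id*, not by one row.

```scala
// Plain-Scala sketch (no Spark) of the desired result: every row gets, as value2,
// the value1 of the previous distinct id in sorted order (None for the first id).
// Names are hypothetical; assumes each id has exactly one distinct value1.
object PrevIdValue {
  final case class Rec(id: String, value1: String)

  def withPrevious(rows: Seq[Rec]): Seq[(String, String, Option[String])] = {
    // One (id, value1) pair per id, sorted by id (like Window.orderBy("id")).
    val perId: Seq[(String, String)] =
      rows.groupBy(_.id).map { case (id, rs) => id -> rs.head.value1 }.toSeq.sortBy(_._1)
    // Shift the values down by one id, not by one row.
    val shifted: Seq[Option[String]] =
      None +: perId.dropRight(1).map { case (_, v) => Option(v) }
    val prev: Map[String, Option[String]] = perId.map(_._1).zip(shifted).toMap
    // Attach the shifted value to every original row.
    rows.map(r => (r.id, r.value1, prev(r.id)))
  }

  def main(args: Array[String]): Unit = {
    val df = Seq(
      Rec("id1", "ab"), Rec("id1", "ab"),
      Rec("id2", "ac"), Rec("id2", "ac"),
      Rec("id3", "abc"), Rec("id3", "abc"), Rec("id3", "abc"))
    withPrevious(df).foreach(println)
  }
}
```

Running `main` prints one tuple per input row; every id3 row comes out as `(id3,abc,Some(ac))`.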

I used the following script:

val w1 = Window.orderBy("id")
val snDF = df.withColumn("value2", lag($"value1", 2).over(w1))

But it gives me:

id   value1  value2
id1  ab
id1  ab
id2  ac      ab
id2  ac      ab
id3  abc     ac
id3  abc     ac
id3  abc     abc

This is not the correct output: the last "id3" row gets "abc" instead of "ac". How can I get the desired output?

Thanks
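Why `lag($"value1", 2)` fails: over `Window.orderBy("id")`, `lag` looks back a fixed number of *rows*, regardless of which id those rows belong to. Because the ids have different row counts (2, 2 and 3), no single offset works: the first id2 row needs an offset of exactly 2, while the last id3 row needs at least 3. The sketch below models this in plain Scala; `rowLag` is a hypothetical helper written for illustration, not a Spark API.

```scala
// Plain-Scala model of what lag(col, n) does over Window.orderBy("id"):
// it looks back n rows, regardless of which id those rows belong to.
// rowLag is a hypothetical helper for illustration only.
object RowLagDemo {
  def rowLag[A](xs: Seq[A], n: Int): Seq[Option[A]] =
    xs.indices.map(i => if (i >= n) Some(xs(i - n)) else None)

  def main(args: Array[String]): Unit = {
    // The value1 column of df, already sorted by id.
    val value1 = Seq("ab", "ab", "ac", "ac", "abc", "abc", "abc")
    // With n = 2, the last id3 row looks back two rows and lands on another id3 row.
    rowLag(value1, 2).foreach(println)
  }
}
```

With `n = 2` this reproduces the wrong output from the question exactly, ending in `Some(abc)` for the last row.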

Answer 1

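The following should work for you. It groups the rows so that each id appears once, applies `lag` with an offset of 1 over those grouped rows, and then explodes the collected list to restore the original row count: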

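import org.apache.spark.sql.expressions._
import org.apache.spark.sql.functions._
import spark.implicits._ // for the $"value1" syntax (already imported in spark-shell)

val w1 = Window.orderBy("id")

df.groupBy("id", "value1")
    .agg(collect_list("value1").as("temp"))           // one row per id; duplicates kept in an array
    .withColumn("value2", lag($"value1", 1).over(w1)) // value1 of the previous id
    .withColumn("temp", explode(col("temp")))         // one row per array element again
    .drop("temp")
    .show(false)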

You would get the following DataFrame:

+---+------+------+
|id |value1|value2|
+---+------+------+
|id1|ab    |null  |
|id1|ab    |null  |
|id2|ac    |ab    |
|id2|ac    |ab    |
|id3|abc   |ac    |
|id3|abc   |ac    |
|id3|abc   |ac    |
+---+------+------+
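An alternative sketch of the same idea, offered as one possible variant rather than the answer's method: compute `value2` on one row per id, then join it back to the original rows instead of using `collect_list`/`explode`. It assumes each id has a single `value1`; `PrevIdValueJoin` is a hypothetical name. As with any window that has no `partitionBy`, Spark may warn that all rows are moved to a single partition; here only the distinct rows go through the window.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag}

// Alternative sketch: compute value2 on one row per id, then join it back.
// Assumes each id has a single value1; PrevIdValueJoin is a hypothetical name.
object PrevIdValueJoin {
  def withPrevious(df: DataFrame): DataFrame = {
    // Collapse duplicates so that lag(..., 1) steps back exactly one id.
    val perId = df.select("id", "value1").distinct()
      .withColumn("value2", lag(col("value1"), 1).over(Window.orderBy("id")))
      .select("id", "value2")
    // A left join restores every original row; the first id gets null.
    df.join(perId, Seq("id"), "left")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("prev-id-value").getOrCreate()
    import spark.implicits._
    val df = Seq("id1" -> "ab", "id1" -> "ab", "id2" -> "ac", "id2" -> "ac",
      "id3" -> "abc", "id3" -> "abc", "id3" -> "abc").toDF("id", "value1")
    withPrevious(df).orderBy("id").show(false)
    spark.stop()
  }
}
```

The join keeps the columns in the order `id`, `value1`, `value2`, matching the table above.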

Answer 1 (accepted; score 0; 2018-04-05 15:57:04)
