Fetching a value from a different row in a Spark DataFrame

I have the DF below:

+------+------+----+
|  Year|    PY| VAL|
+------+------+----+
|202005|201905|2005|
|202006|201906|2006|
|202007|201907|2007|
|201905|201805|1905|
|201906|201806|1906|
|201907|201807|1907|
|201805|201705|1805|
|201806|201706|1806|
|201807|201707|1807|
+------+------+----+

It is created with:

import spark.implicits._  // needed for toDF; assumes a SparkSession named `spark`, as in spark-shell

val df1 = Seq(
  ("202005","201905","2005"),
  ("202006","201906","2006"),
  ("202007","201907","2007"),
  ("201905","201805","1905"),
  ("201906","201806","1906"),
  ("201907","201807","1907"),
  ("201805","201705","1805"),
  ("201806","201706","1806"),
  ("201807","201707","1807")
).toDF("Year","PY","VAL")

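I want to populate the prior year's value in a separate column (VAL_PY). That value actually sits in a different row of the same DF: the row whose Year equals the current row's PY.

I also want to do this in a distributed way, because my DF is large (> 10 million records).

Expected output: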

+------+------+----+-------+
|  Year|    PY| VAL| VAL_PY|
+------+------+----+-------+
|202005|201905|2005|1905   |
|202006|201906|2006|1906   |
|202007|201907|2007|1907   |
|201905|201805|1905|1805   |
|201906|201806|1906|1806   |
|201907|201807|1907|1807   |
|201805|201705|1805|null   |
|201806|201706|1806|null   |
|201807|201707|1807|null   |
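+------+------+----+-------+

import spark.implicits._  // needed for toDF; assumes a SparkSession named `spark`

val df1 = Seq(
  ("202005","201905","2005"), ("202006","201906","2006"), ("202007","201907","2007"),
  ("201905","201805","1905"), ("201906","201806","1906"), ("201907","201807","1907"),
  ("201805","201705","1805"), ("201806","201706","1806"), ("201807","201707","1807")
).toDF("Year","PY","VAL")

// Lookup copy of df1: its Year becomes the join key PY, and its VAL becomes VAL_PY
val df2 = df1
  .drop("PY")
  .withColumnRenamed("VAL","VAL_PY")
  .withColumnRenamed("Year","PY")

// The left join keeps every row of df1; rows with no matching prior year get null
df1.join(df2, Seq("PY"), "left")
  .select("Year","PY","VAL","VAL_PY").show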

Output:

+------+------+----+------+
|  Year|    PY| VAL|VAL_PY|
+------+------+----+------+
|202005|201905|2005|  1905|
|202006|201906|2006|  1906|
|202007|201907|2007|  1907|
|201905|201805|1905|  1805|
|201906|201806|1906|  1806|
|201907|201807|1907|  1807|
|201805|201705|1805|  null|
|201806|201706|1806|  null|
|201807|201707|1807|  null|
+------+------+----+------+

This looks like a left self-join. Let me know if I am missing anything.
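
As a variation on the left self-join above, here is a minimal sketch that aliases the two sides of the join instead of building a renamed copy. The object name PriorYearSketch, the helper withPriorYear, and the local[*] session are hypothetical and exist only for illustration. It assumes the column names Year, PY and VAL from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object PriorYearSketch {
  // Hypothetical helper: adds VAL_PY by left-joining the frame to itself
  // on "current row's PY equals the other row's Year".
  def withPriorYear(df: DataFrame): DataFrame = {
    val cur  = df.as("cur")   // the row we are enriching
    val prev = df.as("prev")  // the row holding the prior year's value
    cur
      .join(prev, col("cur.PY") === col("prev.Year"), "left")
      .select(
        col("cur.Year"),
        col("cur.PY"),
        col("cur.VAL"),
        col("prev.VAL").as("VAL_PY")  // null when no prior-year row exists
      )
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; on a cluster the session is usually provided.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("prior-year-sketch")
      .getOrCreate()
    import spark.implicits._

    // A three-row subset of the question's data
    val df1 = Seq(
      ("202005", "201905", "2005"),
      ("201905", "201805", "1905"),
      ("201805", "201705", "1805")
    ).toDF("Year", "PY", "VAL")

    withPriorYear(df1).orderBy(col("Year").desc).show()
    spark.stop()
  }
}
```

Either form runs as an ordinary distributed join, so no rows are collected to the driver. Note that if Year is not unique in the DF, the join returns one output row per matching prior-year row, so it may be worth deduplicating on Year first.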
