How to merge two DataFrames in PySpark when one column is an array and the other column is a string?
df1:
+---+------+
| id|  code|
+---+------+
|  1|[A, F]|
|  2|   [G]|
|  3|   [A]|
+---+------+
df2:
+--------+----+
|    col1|col2|
+--------+----+
|   Apple|   A|
|  Google|   G|
|Facebook|   F|
+--------+----+
I want df3, built from df1 and df2, to look like this:
+---+------+-----------------+
| id|  code|          changed|
+---+------+-----------------+
|  1|[A, F]|[Apple, Facebook]|
|  2|   [G]|         [Google]|
|  3|   [A]|          [Apple]|
+---+------+-----------------+
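I know this can be achieved if the code column is not an array. What I don't know is how to iterate over the code array for this purpose.
Try: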
import pyspark.sql.functions as f

res = (
    df1
    # One row per (id, single code element)
    .select(f.col("id"), f.explode(f.col("code")).alias("code"))
    # Inner join: look up the name (col1) for each code (col2)
    .join(df2, f.col("code") == df2.col2)
    # Collapse back to one row per id; collect_list does not guarantee order
    .groupBy("id")
    .agg(
        f.collect_list(f.col("code")).alias("code"),
        f.collect_list(f.col("col1")).alias("changed"),
    )
)
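To make the explode / join / collect steps above easier to follow, here is a minimal plain-Python sketch of the same logic on the sample data. It is only a model of what the pipeline computes, not Spark code: `df1_rows`, `df2_rows`, `lookup`, and `map_codes` are hypothetical names introduced for illustration, and it assumes each code appears at most once in df2.

```python
# Sample data from the question, as plain Python rows (hypothetical names).
df1_rows = [(1, ["A", "F"]), (2, ["G"]), (3, ["A"])]
df2_rows = [("Apple", "A"), ("Google", "G"), ("Facebook", "F")]

# Assumption: each code (col2) is unique in df2, so a dict can stand in for the join.
lookup = {code: name for name, code in df2_rows}


def map_codes(rows, lookup):
    """Model of explode -> inner join -> groupBy/collect_list."""
    result = []
    for row_id, codes in rows:
        # "explode" + inner join: keep only the codes that have a match.
        matched = [c for c in codes if c in lookup]
        if not matched:
            # An id with no matching code produces no rows after the
            # inner join, so it disappears from the grouped result.
            continue
        names = [lookup[c] for c in matched]
        result.append((row_id, matched, names))
    return result


df3_rows = map_codes(df1_rows, lookup)
for row in df3_rows:
    print(row)
```

Two things this model makes visible: an id whose codes have no match in df2 is dropped entirely (a left join would be needed to keep it), and while the Python lists keep their original order, Spark's `collect_list` gives no such ordering guarantee after a join.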