pyspark dataframes join by iterable column
I would like to join two pyspark dataframes based on multiple columns.
tab1:
id    name (string, size=3)  val (long int)
6725  fnc                    5219
8576  fnc                    829
9192  sct                    72912
782   sct                    1022
tab2:
name (string, size=6)  val (array of long int)
fnceda                 [11, 25, 5219]
fncytfd                [71, 829, 320]
sctvbd                 [357, 72912, 508]
sctgsd                 [796, 52, 67]
I need to get a new table such that
the "name" in "tab1" matches the first 3 letters of the "name" in "tab2",
and the "val" in "tab1" appears in the "val" array of "tab2".
All other rows that do not satisfy both conditions should be removed.
id    name (string, size=3)  val (long int)
6725  fnc                    5219
8576  fnc                    829
9192  sct                    72912
My code:
tab1.join(tab2,
    tab1["name"] == F.substring(tab2["name"], 1, 3)
    & F.array_contains(tab2["val"], tab1["val"]),
    "inner"
)
Got error:
Column is not iterable
It seems that an array column cannot be used as a join condition?
Thanks
This can be accomplished in 3 steps.
Step 1: Create a new column in tab2 holding the first 3 characters of name.
from pyspark.sql.functions import substring, explode
# Spark's substring uses 1-based positions, so start at 1
tab2_df = tab2_df.withColumn('new_name', substring('name', 1, 3))
Step 2: Explode tab2.val so that each row holds a single long value instead of an array of longs.
tab2_df = tab2_df.withColumn('value', explode('val'))
Step 3: Perform a join between tab1 and tab2 by comparing name with new_name, and val with value.
tab3_df = tab1_df.join(tab2_df, [(tab1_df.name == tab2_df.new_name) & (tab1_df.val == tab2_df.value)], how="inner")
display(tab3_df)
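To see why exploding helps, here is a rough plain-Python model of the three steps, using the rows from the question. It is only a sketch and needs no Spark; the names exploded and matched are made up for illustration. One simplification to be aware of: it collects the exploded pairs in a set, so it behaves like a semi-join, whereas a real inner join would return one output row per matching tab2 element.

```python
# Plain-Python model of the three steps (no Spark needed); rows copied from the question.
tab1 = [
    {"id": 6725, "name": "fnc", "val": 5219},
    {"id": 8576, "name": "fnc", "val": 829},
    {"id": 9192, "name": "sct", "val": 72912},
    {"id": 782, "name": "sct", "val": 1022},
]
tab2 = [
    {"name": "fnceda", "val": [11, 25, 5219]},
    {"name": "fncytfd", "val": [71, 829, 320]},
    {"name": "sctvbd", "val": [357, 72912, 508]},
    {"name": "sctgsd", "val": [796, 52, 67]},
]

# Steps 1 and 2: one (new_name, value) pair per array element.
exploded = {(row["name"][:3], v) for row in tab2 for v in row["val"]}

# Step 3: array membership is now a plain equality lookup on both keys.
matched = [row for row in tab1 if (row["name"], row["val"]) in exploded]

print([row["id"] for row in matched])  # [6725, 8576, 9192]
```

Row 782 drops out because 1022 does not appear in any array whose name starts with sct.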
You need to wrap your first condition in parentheses, because in Python & binds more tightly than ==; then you'll be fine.
from pyspark.sql import functions as F
# df is tab1 and df2 is tab2 from the question
df.join(df2, (df['name']==F.substring(df2['name'], 1, 3)) & F.array_contains(df2['val'], df['val']), 'inner').show()
+----+----+-----+-------+-----------------+
|  id|name|  val|   name|              val|
+----+----+-----+-------+-----------------+
|6725| fnc| 5219| fnceda|   [11, 25, 5219]|
|8576| fnc|  829|fncytfd|   [71, 829, 320]|
|9192| sct|72912| sctvbd|[357, 72912, 508]|
+----+----+-----+-------+-----------------+
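The effect of the parentheses can be checked without Spark. The sketch below uses a small hypothetical class, Expr, that is not part of PySpark; it only overloads == and & the way a Column does and records how Python groups the operands.

```python
class Expr:
    """Hypothetical stand-in for a Column: records how operators group."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return Expr(f"({self.text} == {other.text})")

    def __and__(self, other):
        return Expr(f"({self.text} & {other.text})")


name, prefix, contains = Expr("name"), Expr("prefix"), Expr("contains")

# Without parentheses, & is applied first, so the comparison swallows both terms.
unparenthesized = name == prefix & contains
# With parentheses, the equality is built first and then combined with &.
parenthesized = (name == prefix) & contains

print(unparenthesized.text)  # (name == (prefix & contains))
print(parenthesized.text)    # ((name == prefix) & contains)
```

In the join from the question, the unparenthesized form therefore compares name against the result of substring(...) & array_contains(...), which is not the intended condition.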