Python: how to join/merge Pandas dataframes on matching columns with specific values from different dataframes
I have two different datasets, a and b. I want to left join b to a, but only where b['ColA'] and b['ColC'] match a['ColA'] and a['ColC'] == 1.
Something like this (pseudocode, not valid pandas): expected_table = pd.merge(a, b, left_on=['ColA', ['ColC']==1], right_on=['ColA', ['ColC']==0])
import pandas as pd

a = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                  "ColB": [5, 6, 7],
                  "ColC": [1, 1, 0]})
b = pd.DataFrame({"ColA": ["num 1", "num 2", "num 4"],
                  "Colx": [10, 16, 71],
                  "Coly": [0, 0, 0]})
Every value in Coly is 0.
expected = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                         "ColB": [5, 6, 7],
                         "ColC": [1, 1, 0],
                         "Colx": [10, 16, None]})
I solved it by creating a new column on table b that holds the same value as a['colx'].
But I wonder whether there is a way to use conditions in the merge/join itself, as in SQL.
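The helper-column workaround mentioned in the question might look roughly like the sketch below. It is an interpretation, not the asker's actual code: it assumes the helper column is named ColC and filled with the constant 1, so that a two-key left join only matches rows of a where ColC == 1. The names b_keyed and workaround are hypothetical.

```python
import pandas as pd

a = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                  "ColB": [5, 6, 7],
                  "ColC": [1, 1, 0]})
b = pd.DataFrame({"ColA": ["num 1", "num 2", "num 4"],
                  "Colx": [10, 16, 71],
                  "Coly": [0, 0, 0]})

# Assumption: give every row of b a ColC of 1, the value a['ColC'] must have to match.
b_keyed = b.assign(ColC=1)

# Left join on both keys, then drop b's Coly, which is not part of the expected output.
workaround = (pd.merge(a, b_keyed, on=["ColA", "ColC"], how="left")
                .drop(columns="Coly"))
print(workaround)
```

Rows of a with ColC == 0 find no partner in b_keyed, so their Colx stays NaN.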
Pandas has no feature for using conditions directly in the merge/join process the way SQL does. However, we can simulate this by chaining the Pandas .merge() function with .query(), whose filter syntax resembles a SQL WHERE clause.
To do this, do a left join of a and b on matching ColA, and set indicator=True so that we can tell whether each merged row comes from a only or from both a and b.
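To see what the indicator column holds before any filtering, here is a minimal sketch using the question's sample frames. The variable name merged is only illustrative.

```python
import pandas as pd

a = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                  "ColB": [5, 6, 7],
                  "ColC": [1, 1, 0]})
b = pd.DataFrame({"ColA": ["num 1", "num 2", "num 4"],
                  "Colx": [10, 16, 71],
                  "Coly": [0, 0, 0]})

# indicator=True adds a _merge column recording where each row came from.
merged = pd.merge(a, b, on="ColA", how="left", indicator=True)

print(merged["_merge"].tolist())  # ['both', 'both', 'left_only']
```

Because this is a left join, "num 4" from b never appears, and "num 3" is tagged left_only since it has no partner in b.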
Then use .query() to filter on the required condition: if a row was merged from both frames, it must satisfy ColC == 1 and Coly == 0. Otherwise, if it comes only from a, we keep the row.
df_out = (pd.merge(a, b, left_on='ColA', right_on ='ColA', how='left', indicator=True)
.query('(_merge == "left_only") | ((ColC == 1) & (Coly == 0))')
)
Result:
print(df_out)
ColA ColB ColC Colx Coly _merge
0 num 1 5 1 10.0 0.0 both
1 num 2 6 1 16.0 0.0 both
2 num 3 7 0 NaN NaN left_only
Then we can drop the unwanted columns with .drop(), as follows:
df_out = df_out.drop(['Coly', '_merge'], axis=1)
Result:
print(df_out)
ColA ColB ColC Colx
0 num 1 5 1 10.0
1 num 2 6 1 16.0
2 num 3 7 0 NaN
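One caveat, if the goal is to mimic SQL's LEFT JOIN ... ON a.ColA = b.ColA AND a.ColC = 1: the filter above removes a row of a entirely when it finds a partner in b but fails the condition, whereas SQL would keep it with NULLs. The sample data does not trigger this. The sketch below uses a modified frame, b2, which contains "num 3" (an assumption made purely for illustration), and shows a different technique: masking b's columns with .where() instead of filtering rows. The names a2, b2, filtered, keep and result are hypothetical.

```python
import pandas as pd

a2 = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                   "ColB": [5, 6, 7],
                   "ColC": [1, 1, 0]})
# Unlike the original b, this variant also has a row for "num 3".
b2 = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                   "Colx": [10, 16, 71],
                   "Coly": [0, 0, 0]})

# Row-filtering approach: "num 3" matches on ColA but has ColC == 0, so it is dropped.
filtered = (pd.merge(a2, b2, on="ColA", how="left", indicator=True)
              .query('(_merge == "left_only") | ((ColC == 1) & (Coly == 0))'))
print(len(filtered))  # 2

# Masking approach: keep every row of a2, but blank out Colx where the condition fails.
merged = pd.merge(a2, b2, on="ColA", how="left")
keep = (merged["ColC"] == 1) & (merged["Coly"] == 0)
merged["Colx"] = merged["Colx"].where(keep)
result = merged.drop(columns="Coly")
print(result)
```

Which behavior is wanted depends on the use case; for the data in the question both approaches give the same table.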