Python: how to join/merge Pandas DataFrames on a matching column with specific values from different DataFrames

Question

I have two different datasets, a and b. I want to left join b to a, but only where b['ColA'] and b['ColC'] match a['ColA'] and a['ColC'] == 1.

Something like this (pseudocode, not valid pandas): expected_table = pd.merge(a, b, left_on=['ColA', ['ColC']==1], right_on=['ColA', ['ColC']==0])

import pandas as pd

a = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                  "ColB": [5, 6, 7],
                  "ColC": [1, 1, 0]})

b = pd.DataFrame({"ColA": ["num 1", "num 2", "num 4"],
                  "Colx": [10, 16, 71],
                  "Coly": [0, 0, 0]})

All values of Coly are 0.

expected = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                         "ColB": [5, 6, 7],
                         "ColC": [1, 1, 0],
                         "Colx": [10, 16, None]})

I solved it by creating a new column on table b whose values match those of a['colx'].

But I wonder whether there is a way to use conditions in the merge/join step itself, as in SQL.

Answer 1

Pandas has no feature for using conditions directly in the merge/join step the way SQL does. We can simulate it, however, by chaining .merge() with .query(), whose syntax resembles an SQL WHERE clause.

To do this, left join a and b on the matching ColA and set indicator=True, so that we can tell whether each merged row comes from a only or from both a and b.

Then use .query() to apply the required condition: rows merged from both tables must satisfy ColC == 1 and Coly == 0. Rows that come only from a are kept.

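# Left join on ColA; indicator=True adds a _merge column ("both" / "left_only")
df_out = (pd.merge(a, b, on='ColA', how='left', indicator=True)
            # keep unmatched rows of a, plus matched rows that meet both conditions
            .query('(_merge == "left_only") | ((ColC == 1) & (Coly == 0))')
         )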

Result:

print(df_out)


    ColA  ColB  ColC  Colx  Coly     _merge
0  num 1     5     1  10.0   0.0       both
1  num 2     6     1  16.0   0.0       both
2  num 3     7     0   NaN   NaN  left_only

Then we can drop the unwanted columns with .drop, as follows:

df_out = df_out.drop(['Coly', '_merge'], axis=1)

Result:

print(df_out)

    ColA  ColB  ColC  Colx
0  num 1     5     1  10.0
1  num 2     6     1  16.0
2  num 3     7     0   NaN
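
One caveat, offered as an aside rather than as part of the original answer: because the filter runs after the merge, a row of a that does find a match in b but fails the condition would be removed entirely instead of being kept with NaN, which is not how a SQL LEFT JOIN with the condition in the ON clause behaves. A minimal sketch of an alternative, along the lines of the helper-column workaround mentioned in the question, might look like this. The modified b (with "num 3" added) and the names b_keyed and df_alt are assumptions made up for illustration.

```python
import pandas as pd

a = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                  "ColB": [5, 6, 7],
                  "ColC": [1, 1, 0]})

# Hypothetical variant of b: "num 3" now also appears in b,
# so it matches a on ColA even though a's ColC is 0 for that row.
b = pd.DataFrame({"ColA": ["num 1", "num 2", "num 3"],
                  "Colx": [10, 16, 71],
                  "Coly": [0, 0, 0]})

# Keep only the rows of b that satisfy the right-hand condition, and add
# a helper key holding the ColC value a row of a must have to match.
b_keyed = b.loc[b["Coly"] == 0, ["ColA", "Colx"]].assign(ColC=1)

# Merging on both keys keeps every row of a; rows of a whose ColC is not 1
# find no partner in b_keyed and therefore get NaN in Colx.
df_alt = a.merge(b_keyed, on=["ColA", "ColC"], how="left")

print(df_alt)
```

With this sketch all three rows of a survive, and the "num 3" row has NaN in Colx. No _merge or Coly column needs to be dropped afterwards, since only ColA and Colx are taken from b.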

Answer 1 · 2 · 2021-09-30 16:01:30
