How to do a "(df1 & not df2)" dataframe merge in pandas?

Question

I have two pandas dataframes, df1 and df2, with common columns/keys (x, y).

I want to do a "(df1 & not df2)" kind of merge on the keys (x, y), meaning I want my code to return a dataframe containing only the rows whose (x, y) appears in df1 and not in df2.

SAS has equivalent functionality:

data final;
merge df1(in=a) df2(in=b);
by x y;
if a & not b;
run;

How can I replicate the same functionality in pandas elegantly? It would be great if we could specify how="left-right" in merge().

Answer 1

I just upgraded to version 0.17.0 RC1, which was released 10 days ago. I just found out that pd.merge() has a new argument in this release, indicator=True, which achieves this the pandas way!

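# Outer-merge on the keys; indicator=True adds a '_merge' column naming each row's source.
df = pd.merge(df1, df2, on=['x', 'y'], how="outer", indicator=True)
# Keep only the rows whose (x, y) key appears in df1 but not in df2.
df = df[df['_merge'] == 'left_only']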

indicator: Add a column to the output DataFrame called _merge with information on the source of each row. _merge is Categorical-type and takes on a value of left_only for observations whose merge key only appears in the 'left' DataFrame, right_only for observations whose merge key only appears in the 'right' DataFrame, and both if the observation's merge key is found in both.

http://pandas-docs.github.io/pandas-docs-travis/merging.html#database-style-dataframe-joining-merging
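For anyone who wants to try this end to end, here is a minimal sketch on made-up data. The sample values and the columns v1 and v2 are hypothetical; only the keys x and y come from the question. It uses how="left" instead of how="outer", on the assumption that rows found only in df2 are never needed, and it merges against the de-duplicated key columns of df2 so that no df2 columns or repeated rows leak into the result.

```python
import pandas as pd

# Hypothetical sample data; only the key columns x and y come from the question.
df1 = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"], "v1": [10, 20, 30]})
df2 = pd.DataFrame({"x": [2, 3, 4], "y": ["b", "z", "d"], "v2": [200, 300, 400]})

# Only the keys of df2 matter here, so drop its other columns and any repeated keys.
df2_keys = df2[["x", "y"]].drop_duplicates()

# A left merge keeps every df1 row; indicator=True tags each row's source in '_merge'.
merged = pd.merge(df1, df2_keys, on=["x", "y"], how="left", indicator=True)

# Rows tagged 'left_only' have an (x, y) key that does not occur in df2.
only_in_df1 = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_df1)
```

With this data, the rows kept are the ones keyed (1, "a") and (3, "c"). Note that (3, "c") survives even though x=3 exists in df2, because the match is on the (x, y) pair.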

Answer 2

Another way to do this is to use the index.

If i1 and i2 are indices (sets of index labels), then i1.difference(i2) represents the labels that are in i1 and not in i2. If df is a dataframe indexed by the same index type, for instance i1 = df.index, then pd.DataFrame(index=i1.difference(i2)).join(df) contains those entries of df whose index is not in i2.
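Applied to the question's setup, the index approach might look like the sketch below. It assumes the keys x and y are first moved into a MultiIndex with set_index; the sample data and the columns v1 and v2 are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the key columns x and y come from the question.
df1 = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"], "v1": [10, 20, 30]})
df2 = pd.DataFrame({"x": [2, 3, 4], "y": ["b", "z", "d"], "v2": [200, 300, 400]})

# Move the key columns into a MultiIndex so the keys can be compared as sets.
left = df1.set_index(["x", "y"])
right = df2.set_index(["x", "y"])

# Index.difference returns the keys present in left but absent from right.
keys_only_in_left = left.index.difference(right.index)

# Join an empty frame carrying those keys back to left, then restore x and y as columns.
result = pd.DataFrame(index=keys_only_in_left).join(left).reset_index()
print(result)
```

One thing to be aware of: difference() returns the keys sorted, so the result may not preserve the original row order of df1.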
