How to do "(df1 & not df2)" dataframe merge in pandas?
I have 2 pandas dataframes, df1 and df2, with common columns/keys (x, y).
I want to do a "(df1 & not df2)" kind of merge on the keys (x, y), meaning I want my code to return a dataframe containing only the rows whose (x, y) appears in df1 and not in df2.
SAS has equivalent functionality:
data final;
merge df1(in=a) df2(in=b);
by x y;
if a & not b;
run;
How can I replicate the same functionality in pandas elegantly? It would be great if we could specify how="left-right" in merge().
I just upgraded to version 0.17.0 RC1, which was released 10 days ago, and just found out that pd.merge() has a new argument in this release, indicator=True, that achieves this in a pandonic way!
df=pd.merge(df1,df2,on=['x','y'],how="outer",indicator=True)
df=df[df['_merge']=='left_only']
indicator: Add a column to the output DataFrame called _merge with information on the source of each row. _merge is Categorical-type and takes on a value of left_only for observations whose merge key only appears in 'left' DataFrame, right_only for observations whose merge key only appears in 'right' DataFrame, and both if the observation's merge key is found in both.
http://pandas-docs.github.io/pandas-docs-travis/merging.html#database-style-dataframe-joining-merging
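For illustration, here is a small self-contained sketch of the indicator approach. The sample data and the value columns v1 and v2 are made up for this example. It uses how="left" instead of how="outer", on the assumption that only the left_only rows are wanted, and it merges against just the de-duplicated key columns of df2 so that df2's other columns do not show up in the result.

```python
import pandas as pd

# Hypothetical sample data; only the key columns x and y come from the question.
df1 = pd.DataFrame({"x": [1, 1, 2, 3], "y": [10, 20, 30, 40], "v1": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"x": [1, 3, 5], "y": [10, 40, 50], "v2": ["p", "q", "r"]})

# Merge against the unique keys of df2 only, and record where each row came from.
keys2 = df2[["x", "y"]].drop_duplicates()
merged = pd.merge(df1, keys2, on=["x", "y"], how="left", indicator=True)

# Keep rows whose (x, y) exists only in df1, then drop the helper column.
only_df1 = merged.loc[merged["_merge"] == "left_only", df1.columns].reset_index(drop=True)
print(only_df1)
```

With how="outer", as in the answer above, the filter on _merge gives the same rows, but the intermediate frame also carries the right_only rows and any extra columns of df2.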
Another way to do this is to use the index.
If i1 and i2 are indices (sets of index labels), then i1.difference(i2) represents those labels that are in i1 and not in i2. Then, if df is a dataframe indexed by the same index type, for instance i1=df.index, pd.DataFrame(index=i1.difference(i2)).join(df) gives those entries in df whose index is not in the index i2.
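A minimal sketch of the index-based idea, applied to the (x, y) keys from the question. The sample data is made up, and this variant selects the rows with .loc rather than the join shown above. It assumes the (x, y) pairs are unique within df1.

```python
import pandas as pd

# Hypothetical sample data, indexed by the key columns x and y.
df1 = pd.DataFrame({"x": [1, 1, 2, 3], "y": [10, 20, 30, 40], "v1": ["a", "b", "c", "d"]}).set_index(["x", "y"])
df2 = pd.DataFrame({"x": [1, 3, 5], "y": [10, 40, 50], "v2": ["p", "q", "r"]}).set_index(["x", "y"])

# Keys present in df1's index but absent from df2's index.
keys_only_in_df1 = df1.index.difference(df2.index)

# Select those rows from df1 and turn the keys back into ordinary columns.
result = df1.loc[keys_only_in_df1].reset_index()
print(result)
```

If the keys can repeat in df1, a boolean mask such as df1[~df1.index.isin(df2.index)] may be the safer choice, since it keeps every matching row in its original order.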