
How to do "(df1 & not df2)" dataframe merge in pandas?

I have 2 pandas dataframes, df1 and df2, with common columns/keys (x,y).

I want to do a "(df1 & not df2)" kind of merge on keys (x,y), meaning I want my code to return a dataframe containing only the rows whose (x,y) is in df1 and not in df2.

SAS has equivalent functionality:

data final;
merge df1(in=a) df2(in=b);
by x y;
if a & not b;
run;

How can I replicate the same functionality in pandas elegantly? It would be great if we could specify how="left-right" in merge().

I just upgraded to version 0.17.0 RC1, which was released 10 days ago. I just found out that pd.merge() has a new argument in this release, indicator=True, which achieves this in a pandas-idiomatic way!

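# indicator=True adds a '_merge' column recording where each row's key was found
df = pd.merge(df1, df2, on=['x', 'y'], how="outer", indicator=True)
# keep only the rows whose (x, y) key appears in df1 but not in df2
df = df[df['_merge'] == 'left_only']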

indicator: Add a column to the output DataFrame called _merge with information on the source of each row. _merge is Categorical-type and takes on a value of left_only for observations whose merge key only appears in the 'left' DataFrame, right_only for observations whose merge key only appears in the 'right' DataFrame, and both if the observation's merge key is found in both.

http://pandas-docs.github.io/pandas-docs-travis/merging.html#database-style-dataframe-joining-merging
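For illustration, here is a minimal sketch of the indicator approach on made-up data. The sample frames, the value columns v1 and v2, and the helper name left_anti_join are all hypothetical. It also swaps the how="outer" merge shown above for a how="left" merge against df2's de-duplicated key columns, which should give the same left_only rows without carrying over df2's value columns.

```python
import pandas as pd

# Hypothetical sample data; only the key columns x and y come from the question.
df1 = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"], "v1": [10, 20, 30]})
df2 = pd.DataFrame({"x": [2, 4], "y": ["b", "d"], "v2": [200, 400]})


def left_anti_join(left, right, keys):
    """Return the rows of `left` whose key tuple does not appear in `right`.

    Hypothetical helper, not part of pandas.
    """
    # Merge against the de-duplicated key columns only, so no right-hand
    # value columns are added and left rows are never multiplied.
    right_keys = right[keys].drop_duplicates()
    merged = left.merge(right_keys, on=keys, how="left", indicator=True)
    # Keep the rows that found no match, then drop the helper column.
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


result = left_anti_join(df1, df2, ["x", "y"])
print(result)
```

With this data, the (2, "b") row is the only key shared by both frames, so it is the only row of df1 that is filtered out.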

Another way to do this is to use the index.

If i1 and i2 are indices (sets of indices), then i1.difference(i2) represents those indices in i1 and not in i2. Then, if df is a dataframe indexed by the same index type, for instance i1=df.index, then pd.DataFrame(index=i1.difference(i2)).join(df) gives those entries in df whose index is not in the index i2.
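Applied to the (x,y) keys from the question, the index approach might look like the sketch below. The sample data and variable names are hypothetical, and it assumes the key columns are first moved into a MultiIndex with set_index. The boolean-mask variant at the end is an alternative that is not from the answer; it keeps the original row order and any duplicate keys, whereas Index.difference returns a de-duplicated (and, where possible, sorted) index.

```python
import pandas as pd

# Hypothetical sample data; only the key columns x and y come from the question.
df1 = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"], "v1": [10, 20, 30]})
df2 = pd.DataFrame({"x": [2, 4], "y": ["b", "d"], "v2": [200, 400]})

# Move the key columns into the index so both frames share the same index type.
left = df1.set_index(["x", "y"])
right = df2.set_index(["x", "y"])

# Keys present in df1 but not in df2.
only_in_left = left.index.difference(right.index)

# Join the data back onto an empty frame that carries just those keys.
by_join = pd.DataFrame(index=only_in_left).join(left).reset_index()

# Alternative (an assumption, not from the answer): filter with a boolean mask.
by_mask = left[~left.index.isin(right.index)].reset_index()

print(by_join)
```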
