Check whether column pairs in one dataframe exist in another?

Question

import pandas as pd

d1 = {'id': ['a','b','c'], 'ref': ['apple','orange','banana']}
df1 = pd.DataFrame(d1)

d2 = {'id': ['a','b','d'], 'ref': ['apple','orange','banana']}
df2 = pd.DataFrame(d2)

I want to check whether each (id, ref) pair also appears as a pair in the other dataframe, and record the result as a boolean column in df2.

Desired output:

d3 = {'id': ['a','b','d'], 'ref': ['apple','orange','banana'], 'check':[True,True,False]}
df2 = pd.DataFrame(d3)

I have tried the following, as well as plain assignment and isin:

df2['check'] = df2[['id','ref']].isin(df1[['id','ref']].values.ravel()).any(axis=1)

df2['check'] = df2.apply(lambda x: x.isin(df1.stack())).any(axis=1)
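Both attempts look up individual cells rather than (id, ref) pairs, so the pairing between the two columns is lost. The short sketch below reproduces the first attempt on the sample data to show the effect: the row ('d', 'banana') still comes out True because 'banana' on its own appears somewhere in df1.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c'], 'ref': ['apple', 'orange', 'banana']})
df2 = pd.DataFrame({'id': ['a', 'b', 'd'], 'ref': ['apple', 'orange', 'banana']})

# Flatten df1 into a single array of cells: ['a', 'apple', 'b', 'orange', 'c', 'banana'].
flat = df1[['id', 'ref']].values.ravel()

# Each cell of df2 is looked up on its own, then any(axis=1) ORs the results per row.
attempt = df2[['id', 'ref']].isin(flat).any(axis=1)

# 'd' is not in df1, but 'banana' is, so the third row is True instead of False.
print(attempt.tolist())  # [True, True, True]
```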

How can I do this without a merge?

Answer 1

I'm not sure why you want to avoid a merge, but you can use isin with tuples:

df2['check'] = df2[['id','ref']].apply(tuple, axis=1)\
                  .isin(df1[['id','ref']].apply(tuple, axis=1))

Output:

  id     ref  check
0  a   apple   True
1  b  orange   True
2  d  banana  False
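The same tuple-wise comparison can be sketched with a MultiIndex instead of apply(tuple, axis=1), which avoids calling a Python function per row. This is an alternative, not part of the answer above; it assumes pandas 0.24 or later, where MultiIndex.from_frame is available.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c'], 'ref': ['apple', 'orange', 'banana']})
df2 = pd.DataFrame({'id': ['a', 'b', 'd'], 'ref': ['apple', 'orange', 'banana']})

# Each MultiIndex entry is one (id, ref) tuple taken from a row of the frame.
left = pd.MultiIndex.from_frame(df2[['id', 'ref']])
right = pd.MultiIndex.from_frame(df1[['id', 'ref']])

# MultiIndex.isin compares whole tuples, so a row matches only when
# both id and ref match the same row of df1. It returns a boolean NumPy array.
df2['check'] = left.isin(right)
print(df2)
```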

Answer 2

I think this is what you're looking for:

d1 = {'id': ['a','b','c'], 'ref': ['apple','orange','banana']}
df1 = pd.DataFrame(d1)

d2 = {'id': ['a','b','d'], 'ref': ['apple','orange','banana']}
df2 = pd.DataFrame(d2)

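# keep df1 rows whose id appears in df2 and whose ref appears in df2
# (the two columns are checked independently, not as a pair)
result = df1.loc[df1.id.isin(df2.id) & df1.ref.isin(df2.ref)]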
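Checking each column with its own isin can report a match for a pair that never occurs together. The sketch below uses small hypothetical frames (not the question's data) where every id and every ref of df2 appears in df1, but the pair ('a', 'orange') does not, and compares the column-wise check with a pairwise check over a set of tuples.

```python
import pandas as pd

# Hypothetical data: 'a' and 'orange' both exist in df1, but never in the same row.
df1 = pd.DataFrame({'id': ['a', 'b'], 'ref': ['apple', 'orange']})
df2 = pd.DataFrame({'id': ['a'], 'ref': ['orange']})

# Column-by-column membership: each column is tested on its own.
independent = (df2['id'].isin(df1['id']) & df2['ref'].isin(df1['ref'])).tolist()

# Pairwise membership: build a set of (id, ref) tuples from df1 and look up each row of df2.
pairs = set(zip(df1['id'], df1['ref']))
pairwise = [pair in pairs for pair in zip(df2['id'], df2['ref'])]

print(independent)  # [True]
print(pairwise)     # [False]
```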

A merge, however, would almost certainly be more efficient:

# create a compound key from id + ref
df1["key"] = df1["id"] + "_" + df1["ref"]
df2["key"] = df2["id"] + "_" + df2["ref"]
# merge df1 with df2 on the compound key
df3 = df1.merge(df2, on="key")
# look up the matched ids in df1
result = df1.set_index("id").loc[df3.id_x]
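If a merge is acceptable, the boolean column the question asks for can also be produced directly with a left merge and indicator=True, merging on both columns instead of a concatenated key. This is a sketch of that alternative, assuming duplicate (id, ref) pairs in df1 should be collapsed first so the merge cannot add rows to df2.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c'], 'ref': ['apple', 'orange', 'banana']})
df2 = pd.DataFrame({'id': ['a', 'b', 'd'], 'ref': ['apple', 'orange', 'banana']})

# Drop duplicate pairs so each row of df2 matches at most one row of keys.
keys = df1[['id', 'ref']].drop_duplicates()

# how='left' keeps every row of df2 in its original order; indicator=True adds
# a '_merge' column that is 'both' when the (id, ref) pair was found in df1.
merged = df2.merge(keys, on=['id', 'ref'], how='left', indicator=True)

# merged has a fresh RangeIndex, so assign plain values instead of aligning on index.
df2['check'] = (merged['_merge'] == 'both').to_numpy()
print(df2)
```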


Question description

2 solutions

Solution 1
0 2021-12-14 20:43:03

Solution 2
0 2021-12-14 20:50:34