How to compare a dataframe with another df and return the new rows from the first df using pandas

Question

I am trying to determine which values in a test dataframe (df2) do not appear in another dataframe (df1). The two dataframes are as follows:

df1 created from the following source:

field1	field2
AG	Agree
SA	Somewhat Agree
DG	Disagree
SD	Somewhat Disagree
NO	None

df2 created from the following source:

field1	field2
CA	California
TX	Texas
NO	None
NY	New York

Using Method 1 (see below), I get the expected result.

Method 1

diff_df = df2[~(df2['field1'].isin(df1['field1']) & df2['field2'].isin(df1['field2']))].reset_index(drop=True)

This gives me the following expected result:

  field1               field2
0     CA           California
1     TX                Texas
2     NY             New York

Note: The duplicate value in df2 (NO: None) gets dropped, too.
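One caveat worth checking with Method 1: each `isin` call tests a single column on its own, so a row can be dropped even when its combination of values never appears as a row in df1. A minimal sketch of the difference, assuming the values are plain strings and using a hypothetical extra test row ("AG", "None") that is not in the data above:

```python
import pandas as pd

df1 = pd.DataFrame({
    "field1": ["AG", "SA", "DG", "SD", "NO"],
    "field2": ["Agree", "Somewhat Agree", "Disagree", "Somewhat Disagree", "None"],
})
# Hypothetical test data: ("AG", "None") is not a row of df1,
# but "AG" occurs in df1.field1 and "None" occurs in df1.field2.
df2 = pd.DataFrame({
    "field1": ["CA", "NO", "AG"],
    "field2": ["California", "None", "None"],
})

# Column-wise check, as in Method 1: each column is tested independently.
col_wise = df2["field1"].isin(df1["field1"]) & df2["field2"].isin(df1["field2"])
diff_col_wise = df2[~col_wise].reset_index(drop=True)

# Row-wise check: compare whole (field1, field2) tuples.
cols = ["field1", "field2"]
row_wise = pd.MultiIndex.from_frame(df2[cols]).isin(pd.MultiIndex.from_frame(df1[cols]))
diff_row_wise = df2[~row_wise].reset_index(drop=True)

print(diff_col_wise)  # the ("AG", "None") row is missing here
print(diff_row_wise)  # the ("AG", "None") row is kept here
```

Because `cols` is an ordinary list, the row-wise variant also works unchanged when a third field is added.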

However, I am facing one problem: the set of fields to compare can differ from case to case (e.g. there may be a third field, field3, in the comparison).

The number of fields can vary greatly from case to case, and the user has no control over it.

My problem: how do I modify my query so that comparing the two dataframes still gives the expected result?

What would be a suitable approach in this situation?

Answer 1

Try:

import pandas as pd

# Creating the dataframe
df1_field1 = ['a', 'b', 'c', 'd'] 
df1_field2 = ['a', 'b', 'c', 'd']
df1_field3 = ['a', 'b', 'c', 'd']
df1 = pd.DataFrame(
    {'field1':df1_field1,
     'field2':df1_field2,
     'field3':df1_field3,     
    })

df2_field1 = ['a', 'c', 'd', 'e'] 
df2_field2 = ['a', 'c', 'd', 'e']
df2_field3 = ['a', 'c', 'd', 'e']
df2 = pd.DataFrame(
    {'field1':df2_field1,
     'field2':df2_field2,
     'field3':df2_field3,     
    })

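print(df1)
print(df2)

# Merge on all compared fields; the _merge indicator column records where each row came from
df_all = df2.merge(df1.drop_duplicates(), on=['field1','field2','field3'], 
                   how='outer', indicator=True)
# Keep only the rows that exist in df2 but not in df1
df_all[df_all['_merge'] == 'left_only']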

It yields:

  field1 field2 field3
0      a      a      a
1      b      b      b
2      c      c      c
3      d      d      d
  field1 field2 field3
0      a      a      a
1      c      c      c
2      d      d      d
3      e      e      e

  field1 field2 field3     _merge
3      e      e      e  left_only

As you can see, it works, and it is just an adaptation of another answer on the page you linked in the description.
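Since the question says the number of fields varies from case to case, the merge can be wrapped so that the column list is a parameter instead of being hard-coded. The following is only a sketch under some assumptions: `rows_only_in_left` is a hypothetical helper name, it defaults to comparing on every column the two frames share, and it uses `how='left'` rather than `how='outer'`, because only the `left_only` rows are needed.

```python
import pandas as pd

def rows_only_in_left(left, right, cols=None):
    """Return rows of `left` whose values in `cols` do not appear as a row of `right`."""
    if cols is None:
        # Assumption: compare on every column the two frames have in common.
        cols = [c for c in left.columns if c in right.columns]
    merged = left.merge(
        right[cols].drop_duplicates(),  # de-duplicate so left rows are not multiplied
        on=cols,
        how="left",
        indicator=True,                 # adds the _merge column
    )
    only_left = merged[merged["_merge"] == "left_only"]
    return only_left.drop(columns="_merge").reset_index(drop=True)

# Same sample data as above, built more compactly.
values1 = ["a", "b", "c", "d"]
values2 = ["a", "c", "d", "e"]
df1 = pd.DataFrame({"field1": values1, "field2": values1, "field3": values1})
df2 = pd.DataFrame({"field1": values2, "field2": values2, "field3": values2})

new_rows = rows_only_in_left(df2, df1)
print(new_rows)
```

To restrict the comparison to a subset of fields, pass them explicitly, for example `rows_only_in_left(df2, df1, cols=['field1', 'field2'])`.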

Solution 1
2 2022-12-11 08:32:12
