
How to compare a dataframe with another df and return new rows from the first df using pandas

I am trying to find the rows of a test dataframe (df2) that do not appear in another dataframe (df1). These are the two dataframes:

df1 is created from the following source:

field1  field2
AG      Agree
SA      Somewhat Agree
DG      Disagree
SD      Somewhat Disagree
NO      None

df2 is created from the following source:

field1  field2
CA      California
TX      Texas
NO      None
NY      New York

Method 1 (below) gives me the result I expect.

Method 1

diff_df = df2[~(df2['field1'].isin(df1['field1']) & df2['field2'].isin(df1['field2']))].reset_index(drop=True)

This gives me the following expected result:

  field1               field2
0     CA           California
1     TX                Texas
2     NY             New York

Note: The value in df2 that duplicates a df1 row (NO: None) gets dropped, too.
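
One caveat with Method 1 is worth a quick sketch. Each `isin` call tests a single column on its own, so a df2 row can be treated as already present in df1 even though its combination of values never occurs there as a row. The df2 below is hypothetical (the pair AG / Disagree is invented for illustration), and the tuple-based row test is just one possible way to compare whole rows:

```python
import pandas as pd

df1 = pd.DataFrame({
    "field1": ["AG", "SA", "DG", "SD", "NO"],
    "field2": ["Agree", "Somewhat Agree", "Disagree", "Somewhat Disagree", "None"],
})

# Hypothetical df2: ("AG", "Disagree") is not a row of df1, although both
# values appear somewhere in df1's columns.
df2 = pd.DataFrame({
    "field1": ["CA", "AG", "NO"],
    "field2": ["California", "Disagree", "None"],
})

# Method 1: column-by-column membership test.
per_column = df2[~(df2["field1"].isin(df1["field1"])
                   & df2["field2"].isin(df1["field2"]))].reset_index(drop=True)

# Row-wise test: compare complete (field1, field2) tuples instead.
df1_rows = set(df1.itertuples(index=False, name=None))
row_mask = [row not in df1_rows for row in df2.itertuples(index=False, name=None)]
per_row = df2[row_mask].reset_index(drop=True)

print(per_column)  # the column-wise test keeps only the CA row
print(per_row)     # the row-wise test also keeps the AG / Disagree row
```

With the original df1 and df2 both tests return the same three rows; they only diverge when values are shared across different rows.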

However, I am facing one problem: the set of fields that needs to be compared can differ from case to case (e.g. there may be a third field, field3, in the equation).

The number of fields can vary greatly from case to case, and the user has no control over it.

My problem: How do I modify my query so that comparing the two dataframes still gives the expected result?

In the situation described, what would be a possible approach?

Try:

import pandas as pd

# Creating the dataframe
df1_field1 = ['a', 'b', 'c', 'd'] 
df1_field2 = ['a', 'b', 'c', 'd']
df1_field3 = ['a', 'b', 'c', 'd']
df1 = pd.DataFrame(
    {'field1':df1_field1,
     'field2':df1_field2,
     'field3':df1_field3,     
    })

df2_field1 = ['a', 'c', 'd', 'e'] 
df2_field2 = ['a', 'c', 'd', 'e']
df2_field3 = ['a', 'c', 'd', 'e']
df2 = pd.DataFrame(
    {'field1':df2_field1,
     'field2':df2_field2,
     'field3':df2_field3,     
    })

print(df1)
print(df2)

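# Merge df2 (left) with the de-duplicated df1 (right) on the compared columns;
# indicator=True adds a '_merge' column saying where each row was found.
df_all = df2.merge(df1.drop_duplicates(), on=['field1','field2','field3'],
                   how='outer', indicator=True)
# Keep only the rows that exist in df2 but not in df1.
df_all[df_all['_merge'] == 'left_only']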

It yields:

  field1 field2 field3
0      a      a      a
1      b      b      b
2      c      c      c
3      d      d      d
  field1 field2 field3
0      a      a      a
1      c      c      c
2      d      d      d
3      e      e      e

  field1 field2 field3     _merge
3      e      e      e  left_only

As you can see, it works; it is just an adaptation of another answer on the page you linked in the description.
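
Since the number of fields is not known in advance, the same merge idea can be wrapped so that the column list is worked out at run time. The following is only a sketch: the helper name `rows_only_in_left` and its `columns` parameter are made up for illustration, and it assumes the comparison should cover every column the two frames share unless told otherwise. It uses `how='left'` instead of `how='outer'`, which is enough here because only `left_only` rows are kept.

```python
import pandas as pd

def rows_only_in_left(left, right, columns=None):
    """Return rows of `left` whose values in `columns` have no match in `right`."""
    if columns is None:
        # Assumption: compare on every column present in both frames.
        columns = [c for c in left.columns if c in right.columns]
    merged = left.merge(
        right[columns].drop_duplicates(),  # de-duplicate so left rows are not multiplied
        on=columns,
        how="left",
        indicator=True,                    # adds the '_merge' column
    )
    return (merged[merged["_merge"] == "left_only"]
            .drop(columns="_merge")
            .reset_index(drop=True))

df1 = pd.DataFrame({"field1": list("abcd"), "field2": list("abcd"), "field3": list("abcd")})
df2 = pd.DataFrame({"field1": list("acde"), "field2": list("acde"), "field3": list("acde")})

diff_df = rows_only_in_left(df2, df1)
print(diff_df)
```

Passing an explicit list, for example `rows_only_in_left(df2, df1, columns=['field1', 'field2'])`, restricts the comparison to those fields.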


