Check whether row values in a dataframe exist in another dataframe, using a loop for reconciliation

Question

I am looking to develop some generic logic that will allow me to perform reconciliation between 2 datasets.

I have 2 dataframes and I want to loop through every row in df1 and check whether it exists in df2. If it does exist, I want to create a new column 'Match' in df1 with the value 'Yes'; if it does not exist, I want to append the missing values to a separate df, which I will write to CSV.

Example datasets:

df1:

ID   Name     Age
1    Adam     45  
2    Bill     44   
3    Claire   23

df2:

ID   Name     Age
1    Adam     45 
2    Bill     44 
3    Claire   23
4    Bob      40
5    Chris    21

The column names in the 2 dataframes I've used here are just for reference. But essentially I want to check if the row (1, Adam, 45) in df1 exists in df2.
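As an illustration of the row-level check described here, a minimal loop-based sketch follows. It rebuilds the example data, turns each row of df2 into a plain tuple, and tests each row of df1 for membership. The names `cols`, `df2_rows` and `row_found` are made up for this sketch, and it assumes both frames share the same columns in the same order and contain no missing values (NaN never compares equal to itself, so such rows would not match).

```python
import pandas as pd

# Example data from the question.
df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "Name": ["Adam", "Bill", "Claire"],
                    "Age": [45, 44, 23]})
df2 = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                    "Name": ["Adam", "Bill", "Claire", "Bob", "Chris"],
                    "Age": [45, 44, 23, 40, 21]})

# Assumption: both frames have identical columns in the same order.
cols = list(df1.columns)

# Store every row of df2 as a plain tuple so membership tests are cheap.
df2_rows = set(df2[cols].itertuples(index=False, name=None))

# Loop over df1 and record whether each complete row appears in df2.
row_found = [row in df2_rows
             for row in df1[cols].itertuples(index=False, name=None)]
print(row_found)
```

Because the check works on whole-row tuples, it does not depend on particular column names, which is the "any dataset" property asked for.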

The output for df3 would look like this:

ID   Name     Age
4    Bob      40  
5    Chris    21

The updated df1 would look like this:

ID   Name     Age  Match
1    Adam     45    Yes  
2    Bill     44    Yes  
3    Claire   23    Yes

To be clear, I understand that this can be done using a merge or isin, but I would like a flexible solution that can be used for any dataset.

I appreciate this might be a big ask as I haven't provided much guidance, but any help with this would be great!!

Thanks!!

Answer 1

You need to use merge here and utilize the indicator=True feature:

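# Outer-merge on ID; the added _merge column records where each row came from
df_all = df1.merge(df2, on=['ID'], how='outer', indicator=True)
# Rows found only in df2 (missing from df1); keep the df2-side Name and Age
df3 = df_all[df_all['_merge'] == 'right_only'].drop(columns=['Name_x', 'Age_x']).rename(columns={'Name_y': 'Name', 'Age_y': 'Age'})[['ID', 'Name', 'Age']]
# Rows whose ID appears in both frames (note: this overwrites df2)
df2 = df_all[df_all['_merge'] == 'both'].drop(columns=['Name_x', 'Age_x']).rename(columns={'Name_y': 'Name', 'Age_y': 'Age'})[['ID', 'Name', 'Age']]
print(df3)
print(df2)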

df3:

   ID   Name  Age
3   4    Bob   40
4   5  Chris   21

df2:

   ID    Name  Age
0   1    Adam   45
1   2    Bill   44
2   3  Claire   23
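The snippet above matches on ID only and hard-codes the Name and Age columns. Since the question asks for something reusable on any dataset that also adds a 'Match' column to df1, one possible generalisation is sketched below. The helper name `reconcile`, its `on` parameter and the 'No' label for unmatched rows are assumptions made for this sketch, not part of the original answer. It assumes the key columns have compatible dtypes in both frames and that neither frame already has a column called `_merge`.

```python
import pandas as pd

def reconcile(left, right, on=None):
    """Hypothetical helper: flag rows of `left` that exist in `right` and
    return the rows of `right` that are missing from `left`."""
    if on is None:
        # Default: compare on every column the two frames have in common.
        keys = [c for c in left.columns if c in right.columns]
    else:
        keys = [on] if isinstance(on, str) else list(on)

    # De-duplicate the lookup keys so the left merge cannot multiply rows.
    right_keys = right[keys].drop_duplicates()
    flagged = left.merge(right_keys, on=keys, how="left", indicator=True)
    result = left.copy()
    # Assign positionally so the original index of `left` is preserved.
    result["Match"] = (flagged["_merge"] == "both").map(
        {True: "Yes", False: "No"}).to_numpy()

    # Anti-join: rows of `right` with no counterpart in `left`.
    left_keys = left[keys].drop_duplicates()
    anti = right.merge(left_keys, on=keys, how="left", indicator=True)
    missing = right[(anti["_merge"] == "left_only").to_numpy()]
    return result, missing

# Example data from the question.
df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "Name": ["Adam", "Bill", "Claire"],
                    "Age": [45, 44, 23]})
df2 = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                    "Name": ["Adam", "Bill", "Claire", "Bob", "Chris"],
                    "Age": [45, 44, 23, 40, 21]})

df1_flagged, df3 = reconcile(df1, df2)
print(df1_flagged)
print(df3)
# df3.to_csv("missing_rows.csv", index=False)  # hypothetical file name
```

Passing `on="ID"` would reproduce the ID-only comparison from the answer, while the default compares full rows, so a record with the same ID but a different Name or Age is reported as unmatched.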

Answer 1
0 2020-05-27 23:55:37
