Check if row value in a dataframe exists in another dataframe using loop for reconciliation
I am looking to develop some generic logic that will allow me to perform reconciliation between 2 datasets.
I have 2 dataframes and I want to loop through every row in df1 and check if it exists in df2. If it does exist, I want to create a new column 'Match' in df1 with the value 'Yes'. If it does not exist, I want to append the missing rows to a separate df, which I will write out to a CSV file.
Example datasets:
df1:
ID Name Age
1 Adam 45
2 Bill 44
3 Claire 23
df2:
ID Name Age
1 Adam 45
2 Bill 44
3 Claire 23
4 Bob 40
5 Chris 21
The column names in the 2 dataframes I've used here are just for reference. But essentially I want to check if the row (1, Adam, 45) in df1 exists in df2.
The output for df3 would look like this:
ID Name Age
4 Bob 40
5 Chris 21
The updated df1 would look like this:
ID Name Age Match
1 Adam 45 Yes
2 Bill 44 Yes
3 Claire 23 Yes
To be clear, I understand that this can be done using a merge or isin, but I would like a flexible solution that can be used for any dataset.
I appreciate this might be a big ask as I haven't provided much guidance, but any help with this would be great!
Thanks!
You need to use merge here and utilize the indicator=True feature:
# Outer merge on ID; the '_merge' column records where each row came from
df_all = df1.merge(df2, on=['ID'], how='outer', indicator=True)
# Rows found only in df2 (missing from df1)
df3 = df_all[df_all['_merge'] == 'right_only'].drop(columns=['Name_x', 'Age_x']).rename(columns={'Name_y': 'Name', 'Age_y': 'Age'})[['ID', 'Name', 'Age']]
# Rows whose ID is present in both dataframes
df2 = df_all[df_all['_merge'] == 'both'].drop(columns=['Name_x', 'Age_x']).rename(columns={'Name_y': 'Name', 'Age_y': 'Age'})[['ID', 'Name', 'Age']]
print(df3)
print(df2)
df3:
ID Name Age
3 4 Bob 40
4 5 Chris 21
df2:
ID Name Age
0 1 Adam 45
1 2 Bill 44
2 3 Claire 23
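If you want this to work for arbitrary column names and also produce the 'Match' column from the question, one possible sketch is below. It is only an illustration under some assumptions: the helper name reconcile and its keys parameter are made up for this example, rows are compared on every column the two frames share unless keys is given, and the shared columns are assumed to have compatible dtypes in both frames.

```python
import pandas as pd


def reconcile(df1, df2, keys=None):
    """Hypothetical helper: return (df1 plus a 'Match' column, df2 rows absent from df1)."""
    # Assumption: compare on all shared columns unless the caller names the keys.
    if keys is None:
        keys = [c for c in df1.columns if c in df2.columns]

    # Left-merge df1 against the distinct key combinations of df2.
    # Deduplicating the right side keeps exactly one output row per df1 row.
    right_keys = df2[keys].drop_duplicates()
    flagged = df1.merge(right_keys, on=keys, how="left", indicator=True)
    matched = df1.copy()
    is_match = (flagged["_merge"] == "both").map({True: "Yes", False: "No"})
    matched["Match"] = is_match.to_numpy()  # positional assignment, ignores index labels

    # Same idea in the other direction: df2 rows whose keys never occur in df1.
    left_keys = df1[keys].drop_duplicates()
    probe = df2.merge(left_keys, on=keys, how="left", indicator=True)
    missing = df2[(probe["_merge"] == "left_only").to_numpy()].reset_index(drop=True)
    return matched, missing


df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "Name": ["Adam", "Bill", "Claire"],
                    "Age": [45, 44, 23]})
df2 = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                    "Name": ["Adam", "Bill", "Claire", "Bob", "Chris"],
                    "Age": [45, 44, 23, 40, 21]})

df1_checked, df3 = reconcile(df1, df2)
print(df1_checked)
print(df3)
# To save the unmatched rows (the file name here is just a placeholder):
# df3.to_csv("missing_rows.csv", index=False)
```

Passing keys=['ID'] would compare on ID only, as in the merge above. Rows of df1 that have no counterpart in df2 get 'No' in the 'Match' column rather than being dropped.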