Check if row value in a dataframe exists in another dataframe using loop for reconciliation

Question

I am looking to develop some generic logic that will allow me to perform reconciliation between 2 datasets.

I have 2 dataframes and want to loop through every row in df1 and check whether it exists in df2. If it exists, I want to add a new column 'Match' to df1 with the value 'Yes'; if it does not, I want to append the missing values to a separate dataframe, which I will write to CSV.

Example datasets:

df1:

ID   Name     Age
1    Adam     45  
2    Bill     44   
3    Claire   23

df2:

ID   Name     Age
1    Adam     45 
2    Bill     44 
3    Claire   23
4    Bob      40
5    Chris    21

The column names in the 2 dataframes I've used here are just for reference. But essentially I want to check if the row (1, Adam, 45) in df1 exists in df2.

The output, df3, would look like this:

ID   Name     Age
4    Bob      40  
5    Chris    21

The updated df1 would look like this:

ID   Name     Age  Match
1    Adam     45    Yes  
2    Bill     44    Yes  
3    Claire   23    Yes

To be clear, I understand that this can be done using a merge or isin, but would like a fluid solution that can be used for any dataset.

I appreciate this might be a big ask as I haven't provided much guidance, but any help with this would be great!!

Thanks!!

Answer 1

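You can use merge with indicator=True, which adds a _merge column recording whether each row came from df1 only ('left_only'), from df2 only ('right_only'), or from both ('both'). Note that this example matches on ID only and hardcodes the Name and Age columns:

# Outer merge on ID; overlapping columns get _x (df1) and _y (df2) suffixes
df_all = df1.merge(df2, on=['ID'], how='outer', indicator=True)
# Keep the df2 copies of Name and Age under their original names
cleaned = df_all.drop(columns=['Name_x', 'Age_x']).rename(columns={'Name_y': 'Name', 'Age_y': 'Age'})
# Rows present only in df2
df3 = cleaned.loc[cleaned['_merge'] == 'right_only', ['ID', 'Name', 'Age']]
# Rows present in both frames (this overwrites the original df2)
df2 = cleaned.loc[cleaned['_merge'] == 'both', ['ID', 'Name', 'Age']]
print(df3)
print(df2)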

df3:

   ID   Name  Age
3   4    Bob   40
4   5  Chris   21

df2:

   ID    Name  Age
0   1    Adam   45
1   2    Bill   44
2   3  Claire   23
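
Since the question asks for something that works on any dataset, here is a minimal sketch of the same indicator=True idea without hardcoded column names. The helper name reconcile and the 'No' label for unmatched rows are my own assumptions, not part of the original answer. It also assumes the two frames share at least one column, that shared columns have compatible dtypes, and that neither frame already has a column called _merge. A row counts as a match only if every shared column is equal.

```python
import pandas as pd


def reconcile(left: pd.DataFrame, right: pd.DataFrame):
    """Hypothetical helper: compare rows on every column the two frames share."""
    keys = [c for c in left.columns if c in right.columns]

    # A left merge keeps one output row per row of `left`, in the same order.
    # drop_duplicates stops repeated rows in `right` from multiplying them.
    flagged = left.merge(right[keys].drop_duplicates(), on=keys,
                         how='left', indicator=True)
    result = left.copy()
    result['Match'] = (flagged['_merge'] == 'both').map(
        {True: 'Yes', False: 'No'}).to_numpy()

    # Anti-join: rows of `right` that have no counterpart in `left`.
    back = right.merge(left[keys].drop_duplicates(), on=keys,
                       how='left', indicator=True)
    missing = right[(back['_merge'] == 'left_only').to_numpy()]
    return result, missing


df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'Name': ['Adam', 'Bill', 'Claire'],
                    'Age': [45, 44, 23]})
df2 = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                    'Name': ['Adam', 'Bill', 'Claire', 'Bob', 'Chris'],
                    'Age': [45, 44, 23, 40, 21]})

df1_checked, df3 = reconcile(df1, df2)
print(df1_checked)
print(df3)
```

With the example data, df1_checked should carry Match = 'Yes' on all three rows and df3 should hold the Bob and Chris rows. From there df3.to_csv(...) writes the unmatched rows to a file.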

solution1
0 2020-05-27 23:55:37
