How to compare each row in a dataframe against every row in another dataframe and see the differences between values?

Question

I have two dataframes:

df1

     Code     Number
0   ABC123      1
1   DEF456      2
2   GHI789      3
3   DEA456      4

df2

     Code 
0   ABD123
1   DEA458
2   GHI789

df1 acts like a dictionary: I can look up the number for each item by its code. There are, however, unregistered codes, and when I find one I'm supposed to look for the registered code that most resembles it. So the outcome should be:

ABD123 = 1 (because it differs from ABC123 by 1 character)

DEA458 = 4 (because it differs from DEA456 by 1 character and from DEF456 by 2, so it chooses the closest one)

GHI789 = 3 (because it has an exact match in df1)

I know how to check the differences between two individual codes and save the number of characters that differ, but I don't know how to apply this, because I don't know how to compare each row from df2 against all rows from df1. Is there a way?

Answer 1

"don't know how to compare each row from df2 against all rows from df1."

Nested loops will work. If you had a function named compare, it would look like this:

for index2, row2 in df2.iterrows():
    for index1, row1 in df1.iterrows():
        difference = compare(row2,row1)
        #do something with the difference.
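
To make the loop above concrete, here is a minimal sketch using the sample data from the question. It assumes that compare counts the positions at which two codes differ and that the codes have equal length; the helper's body and the results dictionary are illustrative choices, not something specified in the question.

```python
import pandas as pd

df1 = pd.DataFrame({"Code": ["ABC123", "DEF456", "GHI789", "DEA456"],
                    "Number": [1, 2, 3, 4]})
df2 = pd.DataFrame({"Code": ["ABD123", "DEA458", "GHI789"]})


def compare(code_a, code_b):
    """Count positions where two codes differ (hypothetical helper).

    Assumes equal-length codes; any length difference is added to the count.
    """
    mismatches = sum(a != b for a, b in zip(code_a, code_b))
    return mismatches + abs(len(code_a) - len(code_b))


results = {}
for _, row2 in df2.iterrows():
    best_number, best_diff = None, None
    for _, row1 in df1.iterrows():
        diff = compare(row2["Code"], row1["Code"])
        # Keep the first df1 row with the smallest difference seen so far.
        if best_diff is None or diff < best_diff:
            best_number, best_diff = row1["Number"], diff
    results[row2["Code"]] = int(best_number)

print(results)
```

On ties, this keeps the first df1 row that reaches the minimum difference; a different tie-breaking rule may suit real data better.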

Nested loops are usually not ideal when working with Pandas or Numpy, but they do work. There may be better solutions.
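
One possible alternative, sketched here under assumptions rather than taken from the answer, is to build every df2/df1 pair with a cross join and then keep the closest match per code. It assumes pandas 1.2 or newer (for how="cross"), equal-length codes, and a position-wise character count as the distance; the column names Code_df1 and diff are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Code": ["ABC123", "DEF456", "GHI789", "DEA456"],
                    "Number": [1, 2, 3, 4]})
df2 = pd.DataFrame({"Code": ["ABD123", "DEA458", "GHI789"]})

# Pair every df2 row with every df1 row (how="cross" needs pandas 1.2 or newer).
pairs = df2.merge(df1, how="cross", suffixes=("", "_df1"))

# Position-wise character differences for each pair (assumes equal-length codes).
pairs["diff"] = [
    sum(a != b for a, b in zip(left, right))
    for left, right in zip(pairs["Code"], pairs["Code_df1"])
]

# Keep the df1 row with the smallest difference for each df2 code.
closest = pairs.loc[pairs.groupby("Code")["diff"].idxmin()].reset_index(drop=True)
print(closest[["Code", "Code_df1", "Number", "diff"]])
```

The cross join materializes len(df1) * len(df2) rows, so this trades memory for avoiding explicit row loops and may not suit very large frames.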

DataFrame.iterrows()

Answer 2

This should work too:

import pandas as pd

# Match on the numeric part of each code; merge_asof expects both frames sorted on the key.
df1['Code_e'] = df1['Code'].str.extract(r'(\d+)').astype(int)
df2['Code_e'] = df2['Code'].str.extract(r'(\d+)').astype(int)
final_df = pd.merge_asof(df2,df1.sort_values(by='Code_e'),on='Code_e',suffixes=('','_right')).drop(['Code_e','Code_right'],axis=1)


Answer 1
0 Accepted 2021-04-22 14:27:06

Answer 2
0 2021-04-22 16:57:45
