How can I compare each row from a dataframe against every row from another dataframe and see the difference between values?
I have two dataframes:

df1:
     Code  Number
0  ABC123       1
1  DEF456       2
2  GHI789       3
3  DEA456       4

df2:
     Code
0  ABD123
1  DEA458
2  GHI789
df1 acts like a dictionary: I can get the number for each item by looking up its code. There are, however, unregistered codes, and when I find one, I'm supposed to look for the registered code that looks the most like it. So the outcome should be:
ABD123 = 1 (because it differs from ABC123 by 1 character)
DEA458 = 4 (because it differs from DEA456 by 1 character and from DEF456 by 2, so it chooses the closest one)
GHI789 = 3 (because it has an exact match in df1)
I know how to check the differences between two codes individually and save the number of characters that differ, but I don't know how to apply this, because I don't know how to compare each row from df2 against all rows from df1. Is there a way?
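For reference, a minimal sketch of such a per-pair check, assuming "difference" means the number of positions at which two codes have different characters (a Hamming-style count). The name count_differences is hypothetical, and counting any length difference as extra mismatches is an added assumption.

```python
def count_differences(code_a: str, code_b: str) -> int:
    """Hypothetical helper: count positions where two codes differ.

    Characters are compared position by position; if one code is longer,
    each extra character counts as one more difference.
    """
    mismatches = sum(1 for a, b in zip(code_a, code_b) if a != b)
    return mismatches + abs(len(code_a) - len(code_b))


print(count_differences("ABD123", "ABC123"))  # 1
print(count_differences("DEA458", "DEA456"))  # 1
print(count_differences("DEA458", "DEF456"))  # 2
```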
"...don't know how to compare each row from df2 against all rows from df1."
Nested loops will work. If you had a function named compare, it would look like this:
for index2, row2 in df2.iterrows():
    for index1, row1 in df1.iterrows():
        # compare() is your own function, e.g. one that counts differing characters
        difference = compare(row2, row1)
        # do something with the difference
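A runnable version of that loop, as a sketch under stated assumptions: compare here is a hypothetical stand-in that counts differing characters position by position, and "do something with the difference" is taken to mean keeping the df1 row with the fewest differences (on ties, the first df1 row encountered wins).

```python
import pandas as pd

df1 = pd.DataFrame({"Code": ["ABC123", "DEF456", "GHI789", "DEA456"],
                    "Number": [1, 2, 3, 4]})
df2 = pd.DataFrame({"Code": ["ABD123", "DEA458", "GHI789"]})


def compare(row2, row1):
    """Hypothetical compare(): number of positions where the two codes differ."""
    a, b = row2["Code"], row1["Code"]
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


numbers = []
for index2, row2 in df2.iterrows():
    best_number, best_difference = None, None
    for index1, row1 in df1.iterrows():
        difference = compare(row2, row1)
        # Keep the df1 row with the fewest differing characters seen so far
        if best_difference is None or difference < best_difference:
            best_number, best_difference = row1["Number"], difference
    numbers.append(best_number)

df2["Number"] = numbers
print(df2)
```

With the sample data this gives 1, 4 and 3 for ABD123, DEA458 and GHI789, matching the expected outcome in the question. An exact match simply has a difference of 0, so registered codes need no special case.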
Nested loops are usually not ideal when working with pandas or NumPy, but they do work. There may be better solutions.
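One possible loop-free alternative, sketched under the assumption that every code has the same length (six characters in the example). NumPy broadcasting compares every df2 code against every df1 code at once, giving a difference matrix whose row-wise argmin picks the closest df1 row. This technique is not taken from either answer; the column name Differences is illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Code": ["ABC123", "DEF456", "GHI789", "DEA456"],
                    "Number": [1, 2, 3, 4]})
df2 = pd.DataFrame({"Code": ["ABD123", "DEA458", "GHI789"]})

# Turn each code into a row of single characters: shapes (n1, L) and (n2, L).
# Assumes all codes share the same length L.
chars1 = np.array([list(code) for code in df1["Code"]])
chars2 = np.array([list(code) for code in df2["Code"]])

# Broadcast to shape (n2, n1, L), then count mismatching positions -> (n2, n1)
differences = (chars2[:, None, :] != chars1[None, :, :]).sum(axis=2)

# For each df2 row, take the df1 row with the fewest differences
# (argmin returns the first minimum on ties)
closest = differences.argmin(axis=1)
df2["Number"] = df1["Number"].to_numpy()[closest]
df2["Differences"] = differences.min(axis=1)
print(df2)
```

The intermediate boolean array has n2 × n1 × L elements, so for very large tables you may want to process df2 in chunks.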
This should work too, although it only compares the numeric part of each code:
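import pandas as pd
# Match on the digits only (letters are ignored): extract them as integers
df1['Code_e'] = df1['Code'].str.extract(r'(\d+)', expand=False).astype(int)
df2['Code_e'] = df2['Code'].str.extract(r'(\d+)', expand=False).astype(int)
# merge_asof needs both frames sorted on the key; by default it takes the closest df1 value <= each df2 value
final_df = pd.merge_asof(df2.sort_values(by='Code_e'), df1.sort_values(by='Code_e'), on='Code_e', suffixes=('', '_right')).drop(['Code_e', 'Code_right'], axis=1)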