How can I compare each row from a dataframe against every row from another dataframe and see the difference between values?

I have two dataframes:

df1

     Code     Number
0   ABC123      1
1   DEF456      2
2   GHI789      3
3   DEA456      4

df2

     Code 
0   ABD123
1   DEA458
2   GHI789

df1 acts like a dictionary, from which I can get the respective number for each item by checking its code. There are, however, unregistered codes, and when I find an unregistered code, I'm supposed to look for the registered code that looks the most like it. So, the outcome should be:

ABD123 = 1 (because it has 1 different character from ABC123)

DEA458 = 4 (because it has 1 different character from DEA456, and 2 from DEF456, so it chooses the closest one)

GHI789 = 3 (because it has an exact equivalent in df1)

I know how to check for the differences of each code individually and save the "length" of characters that differ, but I don't know how to apply this code as I don't know how to compare each row from df2 against all rows from df1. Is there a way?

"don't know how to compare each row from df2 against all rows from df1."

Nested loops will work. If you had a function named compare, it would look like this:

for index2, row2 in df2.iterrows():
    for index1, row1 in df1.iterrows():
        difference = compare(row2,row1)
        #do something with the difference.
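
To make the loop above concrete, here is a minimal sketch using the sample data from the question. The `compare` helper shown here is an assumption: it counts the positions where two codes differ and treats any extra characters in the longer code as differences. Ties are resolved in favour of the first matching row in df1, which is also an assumption.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"Code": ["ABC123", "DEF456", "GHI789", "DEA456"],
                    "Number": [1, 2, 3, 4]})
df2 = pd.DataFrame({"Code": ["ABD123", "DEA458", "GHI789"]})


def compare(code_a: str, code_b: str) -> int:
    """Hypothetical helper: count the positions where two codes differ."""
    # Characters beyond the shorter code's length count as differences.
    length_gap = abs(len(code_a) - len(code_b))
    return sum(a != b for a, b in zip(code_a, code_b)) + length_gap


numbers = []
for _, row2 in df2.iterrows():
    best_number, best_diff = None, None
    for _, row1 in df1.iterrows():
        difference = compare(row2["Code"], row1["Code"])
        # Keep the first df1 row with the smallest difference seen so far.
        if best_diff is None or difference < best_diff:
            best_number, best_diff = row1["Number"], difference
    numbers.append(best_number)

# Attach the number of the closest registered code to each df2 row.
result = df2.assign(Number=numbers)
print(result)
```

An exact match gives a difference of 0, so registered codes resolve to their own number without any special handling.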

Nested loops are usually not ideal when working with pandas or NumPy, but they do work. There may be better solutions.
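
One possible alternative, sketched here under assumptions, is to let pandas build every (df2 row, df1 row) pair with a cross join and then keep the closest pair per df2 row. `how="cross"` requires pandas 1.2 or later. The `char_diff` helper and the `df2_row`, `Code_df1`, and `Diff` column names are illustrative, not part of the original answer. The per-pair distance is still computed in Python, so this mainly removes the explicit `iterrows()` loops rather than making the comparison itself vectorized.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"Code": ["ABC123", "DEF456", "GHI789", "DEA456"],
                    "Number": [1, 2, 3, 4]})
df2 = pd.DataFrame({"Code": ["ABD123", "DEA458", "GHI789"]})


def char_diff(code_a: str, code_b: str) -> int:
    """Hypothetical helper: count the positions where two codes differ."""
    length_gap = abs(len(code_a) - len(code_b))
    return sum(a != b for a, b in zip(code_a, code_b)) + length_gap


# Keep the original df2 row label so duplicate codes in df2 stay separate.
left = df2.rename_axis("df2_row").reset_index()

# Pair every df2 row with every df1 row.
pairs = left.merge(df1, how="cross", suffixes=("", "_df1"))
pairs["Diff"] = [char_diff(a, b)
                 for a, b in zip(pairs["Code"], pairs["Code_df1"])]

# For each df2 row, keep the pair with the smallest difference
# (idxmin returns the first such pair when there is a tie).
best_rows = pairs.groupby("df2_row", sort=False)["Diff"].idxmin()
closest = (pairs.loc[best_rows, ["Code", "Code_df1", "Number", "Diff"]]
           .reset_index(drop=True))
print(closest)
```

The cross join materializes len(df1) × len(df2) rows, so this trades memory for simpler code and may not suit very large frames.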


DataFrame.iterrows()

This should work too. Note that it matches on the numeric part of each code rather than on the number of differing characters:

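import pandas as pd
# Extract the numeric part of each code (df1 and df2 are the dataframes from the question).
df1['Code_e'] = df1['Code'].str.extract(r'(\d+)', expand=False).astype(int)
df2['Code_e'] = df2['Code'].str.extract(r'(\d+)', expand=False).astype(int)
# merge_asof needs both sides sorted on the key; by default it takes the last df1 row whose key is <= the df2 key.
final_df = pd.merge_asof(df2.sort_values(by='Code_e'), df1.sort_values(by='Code_e'), on='Code_e', suffixes=('', '_right')).drop(['Code_e', 'Code_right'], axis=1)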

