
Compare two dataframes' columns with tolerances

I have two dataframes and I want to compare them row by row, storing the results in a new one.

What I want is a conditional comparison: if the value in df1 differs from the value in df2 by more than the tolerance, copy the value from df2 into the new dataframe. If the values are the same (or within the tolerance), return null in the new dataframe.

Each column has its own tolerance: 3 for age, 500 for salary, and 100 for bonus.

import pandas as pd

df1 = pd.DataFrame({'Age':[22,55,35],'salary':[1500,2000,1000],'bonus':[500,222,124]})
df2 = pd.DataFrame({'Age':[23,55,65],'salary':[1400,1000,3000],'bonus':[100,222,500]})

In [3]: df1

   Age  salary  bonus
0   22    1500    500
1   55    2000    222
2   35    1000    124

In [3]: df2

   Age  salary  bonus
0   23    1400    100
1   55    1000    222
2   65    3000    500

The output should look like this:

In [4]: df3

   Age  salary   bonus
0                100
1        1000   
2   65   3000    500

What I tried was using the isclose function to compare the values from both dataframes with a specific tolerance. It works, but it only returns a boolean True or False.

df3 = np.isclose(df1["Age"], df2["Age"], atol=3)

I want to use an if-else statement: if the condition is True, return null, and if it is False, return the value in df2.
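For a single column, the if-else described here can probably be expressed without an explicit loop by passing the boolean array from np.isclose to Series.mask. The following is only a minimal sketch for the Age column. The names close and age_diff are made up for illustration, and rtol=0 is an assumption added so that the check is purely absolute, because np.isclose also applies a small relative tolerance by default.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Age": [22, 55, 35]})
df2 = pd.DataFrame({"Age": [23, 55, 65]})

# True where the two ages differ by at most 3.
# rtol=0 is an assumption: it turns off the default relative tolerance (1e-05).
close = np.isclose(df1["Age"], df2["Age"], atol=3, rtol=0)

# Series.mask replaces values where the condition is True (NaN by default)
# and keeps the df2 value where it is False.
age_diff = df2["Age"].mask(close)
print(age_diff)
```

Only the last row differs by more than 3, so only that row keeps its df2 value; the other two become NaN.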

You could subtract one dataframe from the other, take the absolute value, and compare the result against the list [3,500,100] to see column-wise which values exceed the tolerances. Then use DataFrame.where to replace the values where the condition is False with an empty string:

df2.where(df2.sub(df1).abs().gt([3,500,100]), '')

  Age salary bonus
0              100
1       1000      
2  65   3000   500
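A variation on the same idea, sketched here under a few assumptions: the tolerances are kept in a Series keyed by column name (the names tolerances and exceeds are hypothetical), so they are matched by label rather than by position in a list, and the replacement value is left at the default NaN instead of an empty string. The column is spelled bonus here, as in the printed tables.

```python
import pandas as pd

df1 = pd.DataFrame({"Age": [22, 55, 35], "salary": [1500, 2000, 1000], "bonus": [500, 222, 124]})
df2 = pd.DataFrame({"Age": [23, 55, 65], "salary": [1400, 1000, 3000], "bonus": [100, 222, 500]})

# Hypothetical helper: per-column tolerances keyed by column label.
tolerances = pd.Series({"Age": 3, "salary": 500, "bonus": 100})

# True where the absolute difference is strictly greater than that column's tolerance.
exceeds = df2.sub(df1).abs().gt(tolerances, axis="columns")

# Keep df2's value where the tolerance is exceeded; NaN everywhere else.
df3 = df2.where(exceeds)
print(df3)
```

Leaving the gaps as NaN keeps the columns numeric, whereas filling them with '' turns them into object columns. If the blank display is wanted, df3.fillna('') can be applied at the end.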
