Compare two dataframes' columns with tolerances

I have two dataframes and I want to compare them row by row and store the results in a new one.

What I want is a conditional comparison: if the value in df1 differs from the value in df2 by more than the tolerance, copy the value from df2 into the new dataframe; if the values are the same (or within the tolerance), put null in the new dataframe.

Each column has its own tolerance: 3 for age, 500 for salary, and 100 for bonus.

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Age':[22,55,35],'salary':[1500,2000,1000],'bouns':[500,222,124]})
df2 = pd.DataFrame({'Age':[23,55,65],'salary':[1400,1000,3000],'bouns':[100,222,500]})

In [3]: df1 

   Age  salary   bonus
0  22    1500    500
1  55    2000    222
2  35    1000    124

In [3]: df2

   Age  salary   bonus
0   23   1400    100
1   55   1000    222
2   65   3000    500

The output should be like this:

In [4]: df3

   Age  salary   bonus
0                100
1        1000   
2   65   3000    500

What I tried: I used NumPy's isclose function to compare the values from both dataframes with a specific tolerance. That works, but it returns booleans (True or False) rather than the values I need.

df3 = np.isclose(df1["Age"], df2["Age"], atol=3)

I want an if-else: if the condition is True, return null; if it is False, return the value from df2.
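The if-else described above can be expressed without a Python loop over rows by feeding the isclose mask to np.where, one column at a time. This is a minimal sketch, not the accepted answer's method; the `tolerances` dict is a hypothetical helper keyed by the column labels from the question. Note that np.isclose also applies a relative tolerance (rtol=1e-05 by default), so rtol=0 is passed to make only the absolute tolerance count.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Age': [22, 55, 35], 'salary': [1500, 2000, 1000], 'bouns': [500, 222, 124]})
df2 = pd.DataFrame({'Age': [23, 55, 65], 'salary': [1400, 1000, 3000], 'bouns': [100, 222, 500]})

# Hypothetical per-column tolerances, keyed by the column labels above.
tolerances = {'Age': 3, 'salary': 500, 'bouns': 100}

df3 = pd.DataFrame(index=df2.index)
for col, tol in tolerances.items():
    # rtol=0 so only the absolute tolerance is used; isclose treats |a - b| <= atol as close.
    close = np.isclose(df1[col], df2[col], rtol=0, atol=tol)
    # The if-else: NaN where the values are close, otherwise the value from df2.
    df3[col] = np.where(close, np.nan, df2[col])

print(df3)
```

Because NaN is a float, the resulting columns are float64 (65 appears as 65.0, and so on).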

You could subtract one dataframe from the other, take the absolute value, and compare the result column-wise against the list [3,500,100] to see which values exceed the tolerances. Then use DataFrame.where to replace the values where the condition is False with an empty string:

df2.where(df2.sub(df1).abs().gt([3,500,100]), '')

  Age salary bouns
0              100
1       1000      
2  65   3000   500
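Since the question asks for null rather than an empty string, a variant of the same approach is to call DataFrame.where without a replacement value, which fills NaN. This sketch also assumes the tolerances are given as a Series indexed by column label (`tol` is a name introduced here); the answer's plain list is matched to the columns by position instead, so it silently depends on column order.

```python
import pandas as pd

df1 = pd.DataFrame({'Age': [22, 55, 35], 'salary': [1500, 2000, 1000], 'bouns': [500, 222, 124]})
df2 = pd.DataFrame({'Age': [23, 55, 65], 'salary': [1400, 1000, 3000], 'bouns': [100, 222, 500]})

# Tolerances keyed by column label (an assumption; the answer uses a positional list).
tol = pd.Series({'Age': 3, 'salary': 500, 'bouns': 100})

# True where df2 differs from df1 by strictly more than that column's tolerance.
differs = df2.sub(df1).abs().gt(tol, axis='columns')

# Keep df2's value where it differs; everything else becomes NaN (a real null).
df3 = df2.where(differs)
print(df3)
```

Using NaN keeps the columns numeric (float64) and works with isna/dropna, whereas filling with '' turns them into object columns. If integer columns are preferred, `df3.astype('Int64')` converts them to pandas' nullable integer dtype.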
