
How to compare each cell of one column to each cell of another column in a CSV file with Python?

I have a program that uses the Python pandas library to sum two columns individually, compare the difference with the sum of a third column, and print the result. Here it is:

import pandas as pd

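# Read the CSV; skipinitialspace=True ignores spaces after each comma
df = pd.read_csv(r'xl1.csv', skipinitialspace=True, sep=',')
# Total of each column
sum1 = df['Gross_Salary'].sum()
sum2 = df['Deduction'].sum()
diff = sum1 - sum2

# Compares the column totals only, not the individual rows
if diff == df['Net_Salary'].sum():
    print("Pass")
else:
    print("Fail")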

It's working as required. However, I now need to compare row by row: subtract each Deduction cell from the Gross_Salary cell in the same row and compare the result with that row's Net_Salary cell. If the values match, the result should be "Pass"; otherwise it should be "Fail".
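Why the column-total check is not enough: errors in different rows can cancel out. Below is a sketch with hypothetical values (not from the question) where both rows are wrong but the totals still agree.

```python
import pandas as pd

# Hypothetical data (not from the question): row 0 is 10 too low, row 1 is 10 too high
df = pd.DataFrame({
    "Gross_Salary": [100, 200],
    "Deduction": [20, 50],
    "Net_Salary": [70, 160],
})

# Column-total check as in the question: 300 - 70 == 230, so it reports a pass
totals_ok = df["Gross_Salary"].sum() - df["Deduction"].sum() == df["Net_Salary"].sum()

# Per-row check: each row's difference is compared with its own Net_Salary
rows_ok = (df["Gross_Salary"] - df["Deduction"]) == df["Net_Salary"]

print(totals_ok)         # True
print(rows_ok.tolist())  # [False, False]
```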

Below is the CSV data:

Gross_Salary  Deduction  Net_Salary
         100         20          80
        2000        200        1500
         300          0         300

The 2nd row contains an intentional mismatch (2000 - 200 = 1800, not 1500).
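To try this without xl1.csv, here is a sketch that reads the sample rows from an in-memory string. It assumes the real file is comma-separated (as sep=',' in the code implies) with spaces after the commas; io.StringIO simply stands in for the file path.

```python
import io
import pandas as pd

# Assumed file contents: the sample rows above, written comma-separated
# with a space after each comma
csv_text = """Gross_Salary, Deduction, Net_Salary
100, 20, 80
2000, 200, 1500
300, 0, 300
"""

# skipinitialspace=True drops the blanks after each comma, header included
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, sep=",")
print(df)
```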

I understand that I need a for loop to go over each row. I tried starting the loop like this:

for i in pd.read_csv(r'xl1.csv', skipinitialspace=True, sep=',')

However, I'm not able to apply the logic beyond that.
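For reference, one way the row-by-row loop could be completed is sketched below. Note that iterating over a DataFrame directly, as in the attempt above, yields the column labels rather than the rows; DataFrame.itertuples yields one row at a time. The sample data is typed in via io.StringIO as a stand-in for xl1.csv.

```python
import io
import pandas as pd

# The question's sample data; replace io.StringIO(...) with r'xl1.csv' to read the real file
csv_text = "Gross_Salary,Deduction,Net_Salary\n100,20,80\n2000,200,1500\n300,0,300\n"
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, sep=",")

results = []
# itertuples yields one namedtuple per row; column names become attributes
for row in df.itertuples(index=False):
    if row.Gross_Salary - row.Deduction == row.Net_Salary:
        results.append("Pass")
    else:
        results.append("Fail")

df["Result"] = results
print(df)
```

This works, but for larger files the vectorized approach is usually faster and shorter.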

Please help,

Thank you

You can create a new column that stores the test result using a vectorized implementation, with no explicit loop:

df['Result'] = ((df['Gross_Salary'] - df['Deduction']) == df['Net_Salary']).astype(int)
df['Result'] = df['Result'].map({1: 'Pass', 0: 'Fail'})

Or, similarly, if you also have NumPy as a dependency:

import numpy as np

df['Result'] = np.where(df['Gross_Salary'] - df['Deduction'] == df['Net_Salary'],
                        'Pass', 'Fail')

Pandas implementation

  • df['Gross_Salary'] - df['Deduction'] computes the elementwise difference of the two columns. Note that pandas aligns elements by index label, so values from the same row are subtracted from each other.
  • Once we have the difference, we compare it elementwise with df['Net_Salary'] using the == operator. This yields a Series (column) of boolean values.
  • I convert it to int so that True -> 1 and False -> 0.
  • Finally, I use Series.map to encode the desired format, mapping 1 to Pass and 0 to Fail.
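The steps in the list can be checked one at a time. Here is a sketch with the question's sample values typed in by hand; it also shows that Series.map accepts the booleans directly, so the astype(int) step is optional rather than required.

```python
import pandas as pd

df = pd.DataFrame({
    "Gross_Salary": [100, 2000, 300],
    "Deduction": [20, 200, 0],
    "Net_Salary": [80, 1500, 300],
})

diff = df["Gross_Salary"] - df["Deduction"]  # 80, 1800, 300
matches = diff == df["Net_Salary"]            # True, False, True
# Mapping the booleans directly gives the same labels without astype(int)
df["Result"] = matches.map({True: "Pass", False: "Fail"})
print(df)
```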

NumPy implementation

  • np.where works element by element: it returns the second argument where the condition (the first argument) is True, and the third argument where it is False.
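Because np.where accepts any boolean array as its condition, the exact == test can be swapped for a tolerance-based one if the salary columns hold floats, where rounding can make exact equality fail. A sketch with hypothetical values chosen to trigger such rounding (0.3 - 0.1 evaluates to 0.19999999999999998 in float64); the atol of half a cent is an assumption to adjust for your data.

```python
import numpy as np
import pandas as pd

# Hypothetical float data: row 0 is correct, row 1 is a genuine mismatch
df = pd.DataFrame({
    "Gross_Salary": [0.3, 2000.0],
    "Deduction": [0.1, 200.0],
    "Net_Salary": [0.2, 1500.0],
})

expected = df["Gross_Salary"] - df["Deduction"]

# Exact comparison: row 0 fails only because of float rounding
exact = np.where(expected == df["Net_Salary"], "Pass", "Fail")

# Tolerant comparison: np.isclose with an absolute tolerance of 0.005 (assumed)
tolerant = np.where(np.isclose(expected, df["Net_Salary"], rtol=0, atol=0.005),
                    "Pass", "Fail")

print(exact.tolist())     # ['Fail', 'Fail']
print(tolerant.tolist())  # ['Pass', 'Fail']
```

The genuine mismatch in row 1 still fails, so the tolerance only absorbs rounding noise.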

Applying one of these to your example:

df
    Gross_Salary  Deduction  Net_Salary Result
0           100         20          80   Pass
1          2000        200        1500   Fail
2           300          0         300   Pass
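If you still want a single Pass/Fail for the whole file, as the original script printed, you can derive it from the per-row results. Here is a sketch using the sample values typed in by hand; failed is just an illustrative variable name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Gross_Salary": [100, 2000, 300],
    "Deduction": [20, 200, 0],
    "Net_Salary": [80, 1500, 300],
})
df["Result"] = np.where(df["Gross_Salary"] - df["Deduction"] == df["Net_Salary"],
                        "Pass", "Fail")

# Rows that failed the per-row check (here only the row with index 1)
failed = df[df["Result"] == "Fail"]
print(failed)

# One overall verdict: Pass only if every row passed
print("Pass" if failed.empty else "Fail")
```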

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.