How to compare each cell of one column with each cell of another column in a CSV file using Python?

Question

I have a program that uses the Python pandas library to sum two columns individually, subtract the sums, and compare the difference with the sum of a third column. Here it is:

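import pandas as pd

# Read the CSV; skipinitialspace=True ignores spaces after each comma.
df = pd.read_csv(r'xl1.csv', skipinitialspace=True, sep=',')
sum1 = df['Gross_Salary'].sum()   # total of the Gross_Salary column
sum2 = df['Deduction'].sum()      # total of the Deduction column
diff = sum1 - sum2

# One verdict for the whole file: compares the totals, not individual rows.
if diff == df['Net_Salary'].sum():
    print("Pass")
else:
    print("Fail")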

It works as required. However, my requirement is to compare the columns cell by cell: for each row, subtract Deduction from Gross_Salary and compare the result with Net_Salary in the same row. If the values match, the result is "Pass"; otherwise, it is "Fail".

Below is the CSV data:

Gross_Salary  Deduction  Net_Salary
100           20         80
2000          200        1500
300           0          300

The second row contains an intentional mismatch: 2000 - 200 = 1800, not 1500.

I understand that I need a for loop to go over each row. I tried starting the loop as below:

for i in pd.read_csv(r'xl1.csv', skipinitialspace=True, sep=','):
    print(i)  # iterating a DataFrame directly yields its column labels, not rows

However, I was not able to apply the logic beyond that.

Please help.

Thank you

Answer 1

You can create a new column that stores the test result, using a vectorized implementation:

df['Result'] = ((df['Gross_Salary'] - df['Deduction']) == df['Net_Salary']).astype(int)
df['Result'] = df['Result'].map({1: 'Pass', 0: 'Fail'})

Or, similarly, if you also have NumPy as a dependency:

import numpy as np

df['Result'] = np.where(df['Gross_Salary'] - df['Deduction'] == df['Net_Salary'],
                        'Pass', 'Fail')
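A self-contained sketch of the whole flow, assuming xl1.csv is comma-separated with the three headers shown in the question. The inline string read through io.StringIO is a stand-in for the file so the example runs on its own; it uses the Series.map variant, and swapping in the np.where line should give the same Result column.

```python
import io

import pandas as pd

# Stand-in for xl1.csv (assumed comma-separated); replace with the real path.
CSV_TEXT = """Gross_Salary,Deduction,Net_Salary
100,20,80
2000,200,1500
300,0,300
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True, sep=',')

# Row-wise check: True where Gross_Salary - Deduction equals Net_Salary.
matches = (df['Gross_Salary'] - df['Deduction']) == df['Net_Salary']

# Encode booleans as 1/0, then map them to the labels the question asks for.
df['Result'] = matches.astype(int).map({1: 'Pass', 0: 'Fail'})
print(df)
```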

Pandas implementation

df['Gross_Salary'] - df['Deduction'] computes the elementwise difference of the two columns. Note that pandas automatically aligns elements that share the same index label.
Once we have the difference, we compare it elementwise with df['Net_Salary'] using the == operator. This yields a Series (column) of boolean values.
I convert it to int so that True -> 1 and False -> 0.
Finally, I use Series.map to encode the desired format, mapping 1 to Pass and 0 to Fail.
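To see each step on its own, this sketch builds the question's three rows directly as a DataFrame and keeps every intermediate Series. The last part illustrates the index-alignment remark with two small Series, `a` and `b`, which are hypothetical and exist only for illustration: their labels are listed in different orders, yet subtraction pairs values by label.

```python
import pandas as pd

df = pd.DataFrame({
    'Gross_Salary': [100, 2000, 300],
    'Deduction': [20, 200, 0],
    'Net_Salary': [80, 1500, 300],
})

diff = df['Gross_Salary'] - df['Deduction']   # elementwise difference per row
is_match = diff == df['Net_Salary']           # boolean Series
as_int = is_match.astype(int)                 # True -> 1, False -> 0
result = as_int.map({1: 'Pass', 0: 'Fail'})   # 1 -> 'Pass', 0 -> 'Fail'
print(diff.tolist(), is_match.tolist(), as_int.tolist(), result.tolist())

# Index alignment: values are paired by label, not by position.
a = pd.Series([10, 20], index=['x', 'y'])
b = pd.Series([2, 1], index=['y', 'x'])
print((a - b).to_dict())  # x: 10 - 1, y: 20 - 2
```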

NumPy implementation

np.where works elementwise: it takes the second argument wherever the condition (the first argument) is True, and the third argument wherever it is False.
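A small illustration of that elementwise choice, using a hand-written condition array rather than the salary data. With scalars as the second and third arguments every element gets one of two labels; with arrays, each position picks from the corresponding position of the chosen array.

```python
import numpy as np

condition = np.array([True, False, True])

# Scalars: 'Pass' where condition is True, 'Fail' elsewhere.
labels = np.where(condition, 'Pass', 'Fail')
print(labels.tolist())  # ['Pass', 'Fail', 'Pass']

# Arrays: element i comes from the first array if condition[i], else the second.
picked = np.where(condition, [1, 2, 3], [10, 20, 30])
print(picked.tolist())  # [1, 20, 3]
```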

Applying either of these to your example gives:

df
    Gross_Salary  Deduction  Net_Salary Result
0           100         20          80   Pass
1          2000        200        1500   Fail
2           300          0         300   Pass
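Since the question started from a for loop, here is a row-by-row sketch for comparison. It uses DataFrame.itertuples, which yields one namedtuple per row with fields named after the columns; the DataFrame is built inline from the question's data as a stand-in for reading xl1.csv. The vectorized versions above are generally faster on large files, so this loop is mainly useful when the per-row logic gets too complex to express with column arithmetic.

```python
import pandas as pd

df = pd.DataFrame({
    'Gross_Salary': [100, 2000, 300],
    'Deduction': [20, 200, 0],
    'Net_Salary': [80, 1500, 300],
})

results = []
# itertuples(index=False) yields one namedtuple per row without the row label.
for row in df.itertuples(index=False):
    expected = row.Gross_Salary - row.Deduction
    results.append('Pass' if expected == row.Net_Salary else 'Fail')

df['Result'] = results
print(df)
```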

Answer 1
2 votes, accepted, 2020-03-20 08:55:06