Python: count the differing values between two columns

Question

If I have a DataFrame:

A                         B         C
0.0285714285714285        4         0.11428571
0.107142857142857         4         0.42857143
0.007142857142857         6         0.04285714
1.2                       4         5.5
1.5                       3         3

The desired output is:

A*B                    C              Difference
0.114285714285714      0.11428571      0.000000004285714
0.428571428571428      0.42857143     -0.000000001428572
0.042857142857142      0.04285714      0.000000002857142
4.8                    5.5            -0.7
4.5                    3               1.5

Count: 2

I want to ignore the first three rows because their differences are very small. Only the first digit after the decimal point should be taken into account.

Could you please help me with this?

Answer 1

Use np.where to check whether the difference is large enough to keep:

import numpy as np

diff = df["A"] * df["B"] - df["C"]  # compute the difference once
df["difference"] = np.where(diff.abs() >= 0.1, diff, 0)  # zero out small differences

print(df)

#
          A  B         C  difference
0  0.028571  4  0.114286         0.0
1  0.107143  4  0.428571         0.0
2  0.007143  6  0.042857         0.0
3  1.200000  4  5.500000        -0.7
4  1.500000  3  3.000000         1.5
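
For reference, here is a self-contained sketch of the approach above. It rebuilds the sample frame from the question by hand (how the data is actually loaded is an assumption) and also derives the requested count, which the snippet above does not print. Using abs() is just a shorthand for the two-sided comparison.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": [0.0285714285714285, 0.107142857142857, 0.007142857142857, 1.2, 1.5],
    "B": [4, 4, 6, 4, 3],
    "C": [0.11428571, 0.42857143, 0.04285714, 5.5, 3.0],
})

diff = df["A"] * df["B"] - df["C"]
# Keep the difference only where its magnitude reaches the 0.1 threshold
df["difference"] = np.where(diff.abs() >= 0.1, diff, 0)

# Number of rows whose difference survived the threshold
count = int((df["difference"] != 0).sum())
print(f"Count: {count}")
```

With the question's data this should report the two rows the asker expects.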

Answer 2

EDIT:

Because the values in column A have object dtype (evidently strings), convert them to float first:

df['A'] = df['A'].astype(float)

If that fails because of bad values (e.g. some non-numeric strings), use pd.to_numeric with errors='coerce', which replaces the bad values with NaN:

df['A'] = pd.to_numeric(df['A'], errors='coerce')
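
To illustrate the difference between the two conversions, here is a small sketch with a made-up column containing one non-numeric entry. The sample values are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical object column: two numeric strings and one bad value
s = pd.Series(["1.2", "1.5", "abc"])

# s.astype(float) would raise ValueError on "abc";
# to_numeric with errors="coerce" turns it into NaN instead
converted = pd.to_numeric(s, errors="coerce")

print(converted.isna().sum())  # number of values that could not be parsed
```

Any NaN produced this way propagates into A*B and the difference, so it may be worth checking how many values were coerced before continuing.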

Use Series.mask to set the new column by a condition built with Series.between:

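# multiply the columns
df['A*B'] = df["A"] * df["B"]
# subtract to get a Series of differences
diff = df['A*B'] - df['C']
# boolean mask: True where the difference lies within [-0.1, 0.1]
mask = diff.between(-0.1, 0.1)

# replace the masked (small) differences with 0
df["difference"] = diff.mask(mask, 0)
print(df)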
          A  B         C       A*B  difference
0  0.028571  4  0.114286  0.114286         0.0
1  0.107143  4  0.428571  0.428571         0.0
2  0.007143  6  0.042857  0.042857         0.0
3  1.200000  4  5.500000  4.800000        -0.7
4  1.500000  3  3.000000  4.500000         1.5

print (f'Count: {(~mask).sum()}')
Count: 2
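
One detail worth noting: Series.between includes both endpoints by default, so a difference of exactly 0.1 or -0.1 is masked to 0 here, whereas the np.where condition in Answer 1 keeps it. A minimal sketch with made-up boundary values (not from the question) puts the two conditions side by side:

```python
import pandas as pd

# Hypothetical differences, including values exactly on the threshold
diff = pd.Series([0.1, -0.1, 0.05, 0.2])

# between() is inclusive at both ends by default: +/-0.1 count as "small"
small_inclusive = diff.between(-0.1, 0.1)

# Strict comparison: +/-0.1 count as significant, as in the np.where version
small_strict = diff.abs() < 0.1

print(small_inclusive.tolist())
print(small_strict.tolist())
```

For the data in the question the two variants agree, since no difference falls exactly on the boundary.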

If the column order matters, use DataFrame.insert together with DataFrame.pop, which extracts the columns and removes them from the frame:

df.insert(0, 'A*B',  df.pop("A")*df.pop("B"))
diff = df['A*B'] - df['C']
mask = diff.between(-0.1, 0.1)

df["difference"] = diff.mask(mask, 0)
print (df)
        A*B         C  difference
0  0.114286  0.114286         0.0
1  0.428571  0.428571         0.0
2  0.042857  0.042857         0.0
3  4.800000  5.500000        -0.7
4  4.500000  3.000000         1.5


print (f'Count: {(~mask).sum()}')
Count: 2
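
As a side note on why A and B no longer appear in that output: DataFrame.pop returns the requested column and removes it from the frame in place. A tiny sketch using two rows from the question:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.2, 1.5], "B": [4, 3], "C": [5.5, 3.0]})

# pop() hands back column A as a Series and drops it from df
a = df.pop("A")

print(a.tolist())
print(list(df.columns))
```

If the original columns are still needed afterwards, work on a copy of the frame or use the non-destructive version shown earlier.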
