Treat NaN as zero in arithmetic operations?

Question

Here's a simple example of the sort of thing I'm wrestling with:

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: test = pd.DataFrame(np.random.randn(4,4),columns=list('ABCD'))
In [4]: for i in range(4):
  ....:    test.iloc[i,i] = np.nan

In [5]: test
Out[5]:
           A         B         C         D
0        NaN  0.136841 -0.854138 -1.890888
1  -1.261724       NaN  0.875647  1.312823
2   1.130999 -0.208402       NaN  0.256644
3  -0.158458 -0.305250  0.902756       NaN

Now, if I use sum to sum the rows, all the NaN values are treated as zeros:

In [6]: test['Sum'] = test.loc[:,'A':'D'].sum(axis=1)

In [7]: test
Out[7]: 
          A         B         C         D       Sum
0       NaN  0.136841 -0.854138 -1.890888 -2.608185
1 -1.261724       NaN  0.875647  1.312823  0.926745
2  1.130999 -0.208402       NaN  0.256644  1.179241
3 -0.158458 -0.305250  0.902756       NaN  0.439048

But in my case, I may need to do a bit of work on the values first, for example scaling them:

In [8]: test['Sum2'] = test.A + test.B/2 - test.C/3 + test.D

In [9]: test
Out[9]: 
          A         B         C         D       Sum  Sum2
0       NaN  0.136841 -0.854138 -1.890888 -2.608185   NaN
1 -1.261724       NaN  0.875647  1.312823  0.926745   NaN
2  1.130999 -0.208402       NaN  0.256644  1.179241   NaN
3 -0.158458 -0.305250  0.902756       NaN  0.439048   NaN

As you can see, the NaN values carry across into the arithmetic to produce NaN output, which is what you'd expect.

Now, I don't want to replace all NaN values in my dataframe with zeros: it is helpful to me to distinguish between zero and NaN. I could replace NaN with something else: I'm dealing with large volumes of student grades, and I need to distinguish between a grade of zero and a NaN, which at the moment I'm using to indicate that the particular assessment task was not attempted. (It takes the place of what would be a blank cell in a traditional spreadsheet.) But whatever I replace the NaN values with, it needs to be something that can be treated as zero in the operations I may perform. What are my options here?

Answer 1

Use the fillna function:

test['Sum2'] = test.A.fillna(0) + test.B.fillna(0)/2 - test.C.fillna(0)/3 + test.D.fillna(0)
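
To check that this leaves the original NaN markers in place, here is a minimal sketch on a small hand-made frame (the values are made up for illustration and are not the asker's data). Each fillna(0) call returns a new Series, so only the intermediate values used in the arithmetic are zero-filled:

```python
import numpy as np
import pandas as pd

# Hypothetical grades; NaN marks "not attempted".
test = pd.DataFrame({
    "A": [np.nan, 2.0, 4.0],
    "B": [4.0, np.nan, 2.0],
    "C": [3.0, 6.0, np.nan],
    "D": [1.0, 1.0, 1.0],
})

# fillna(0) returns a copy of each column, so `test` itself keeps its NaNs.
test["Sum2"] = (
    test.A.fillna(0)
    + test.B.fillna(0) / 2
    - test.C.fillna(0) / 3
    + test.D.fillna(0)
)

print(test)
# The source columns still contain their three NaN markers.
print(test[["A", "B", "C", "D"]].isna().sum().sum())
```

Sum2 is fully populated, while A, B and C still show NaN wherever a task was not attempted.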

Answer 2

If the dataframe is not huge you can try:

test["Sum"] = test.sum(axis=1)
test2 = test.fillna(0)
test["Sum2"] = test2.A + test2.B/2 - test2.C/3 + test2.D
del test2

It would be interesting to know whether there is a way to do the second sum in one line only.
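
One possible one-liner, offered as a sketch and not as the answerer's method, is to zero-fill a temporary copy and evaluate the expression with DataFrame.eval. The data below is made up for illustration; no timing was done for this variant:

```python
import numpy as np
import pandas as pd

# Hypothetical data; NaN marks "not attempted".
test = pd.DataFrame({
    "A": [np.nan, 2.0, 4.0],
    "B": [4.0, np.nan, 2.0],
    "C": [3.0, 6.0, np.nan],
    "D": [1.0, 1.0, 1.0],
})

# fillna(0) builds a temporary zero-filled frame; eval computes the
# expression on it, and `test` itself keeps its NaN values.
test["Sum2"] = test.fillna(0).eval("A + B / 2 - C / 3 + D")

print(test)
```

The temporary frame is discarded as soon as the line finishes, so there is nothing to delete afterwards.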

Update

If you have 1e5 rows or fewer, the method I suggested is slightly faster than the one suggested by kmcodes; beyond that, things change.

n = int(1e5)
test = pd.DataFrame(np.random.randn(n,4),columns=list('ABCD'))
for i in range(4):
    test.iloc[i,i] = np.nan

%%timeit
test2 = test.fillna(0)
test["Sum2"] = test2.A + test2.B/2 - test2.C/3 + test2.D
del test2
3.95 ms ± 51.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
test['Sum2'] = test.A.fillna(0) + test.B.fillna(0)/2 - test.C.fillna(0)/3 + test.D.fillna(0)
4.12 ms ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Update 2

I found this.

In your case you can just do:

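weights = [1, 1/2, -1/3, 1]
# select A..D explicitly: with extra columns (e.g. Sum) the 4 weights would not match
test["Sum2"] = test.loc[:, 'A':'D'].fillna(0).mul(weights).sum(axis=1)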

Keep in mind that this seems to be consistently slower than the other two.
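
As a sanity check, the following sketch compares the explicit expression with the weighted version on a small random frame. The seed, the frame size and the names `filled`, `by_expression`, `by_weights` and `by_dot` are illustrative choices, and the dot-product variant is an extra that was not part of the timings above:

```python
import numpy as np
import pandas as pd

cols = list("ABCD")
rng = np.random.default_rng(0)
test = pd.DataFrame(rng.standard_normal((6, 4)), columns=cols)
for i in range(4):
    test.iloc[i, i] = np.nan  # same diagonal NaN pattern as in the question

filled = test[cols].fillna(0)

# Explicit expression on the zero-filled copy.
by_expression = filled.A + filled.B / 2 - filled.C / 3 + filled.D

# Weights as a labelled Series, so they align on column names, not position.
weights = pd.Series([1, 1 / 2, -1 / 3, 1], index=cols)
by_weights = filled.mul(weights).sum(axis=1)

# The same weighted sum written as a matrix-vector product.
by_dot = filled.dot(weights)

same = np.allclose(by_expression, by_weights) and np.allclose(by_expression, by_dot)
print(same)
```

Giving the weights an index makes the pairing of weight and column explicit, which helps once the frame has more than a handful of columns.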

Answer 3

You can also concat the scaled columns and take the sum, which gives you the NaN handling offered by sum(), i.e.:

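# pass axis by keyword: recent pandas versions no longer accept it positionally in concat
test['Sum2'] = pd.concat([test.A, test.B/2, test.C/(-3), test.D], axis=1).sum(axis=1)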

          A         B         C         D      Sum2
0       NaN  0.181923 -0.526074  1.084549  1.350869
1  0.999836       NaN -0.862583 -0.473933  0.813431
2  1.043463  0.252743       NaN -0.863199  0.306635
3 -0.047286  1.432500  0.100041       NaN  0.635616
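
This works because sum() skips NaN by default (skipna=True). One consequence for the grades use case: a row in which every task was not attempted sums to 0, not NaN. The sketch below uses made-up data and shows the min_count parameter as one way to keep such rows as NaN; whether that is wanted depends on how "not attempted" should be reported:

```python
import numpy as np
import pandas as pd

# Hypothetical grades; the last student attempted nothing.
test = pd.DataFrame({
    "A": [np.nan, 2.0, np.nan],
    "B": [4.0, np.nan, np.nan],
    "C": [3.0, 6.0, np.nan],
    "D": [1.0, 1.0, np.nan],
})

scaled = pd.concat([test.A, test.B / 2, test.C / -3, test.D], axis=1)

# Default: NaN entries are skipped, so an all-NaN row sums to 0.0.
total = scaled.sum(axis=1)

# min_count=1 requires at least one non-NaN value, else the result is NaN.
total_strict = scaled.sum(axis=1, min_count=1)

print(pd.DataFrame({"total": total, "total_strict": total_strict}))
```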
