Treating NaN as zero in arithmetic operations?

Here's a simple example of the sort of thing I'm wrestling with:

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: test = pd.DataFrame(np.random.randn(4,4),columns=list('ABCD'))
In [4]: for i in range(4):
  ....:    test.iloc[i,i] = np.nan

In [5]: test
Out[5]:
           A         B         C         D
0        NaN  0.136841 -0.854138 -1.890888
1  -1.261724       NaN  0.875647  1.312823
2   1.130999 -0.208402       NaN  0.256644
3  -0.158458 -0.305250  0.902756       NaN 

Now, if I use sum to sum the rows, all the NaN values are treated as zeros:

In [6]: test['Sum'] = test.loc[:,'A':'D'].sum(axis=1)

In [7]: test
Out[7]: 
          A         B         C         D       Sum
0       NaN  0.136841 -0.854138 -1.890888 -2.608185
1 -1.261724       NaN  0.875647  1.312823  0.926745
2  1.130999 -0.208402       NaN  0.256644  1.179241
3 -0.158458 -0.305250  0.902756       NaN  0.439048    

But in my case, I may need to do a bit of work on the values first; for example, scaling them:

In [8]: test['Sum2'] = test.A + test.B/2 - test.C/3 + test.D

In [9]: test
Out[9]: 
          A         B         C         D       Sum  Sum2
0       NaN  0.136841 -0.854138 -1.890888 -2.608185   NaN
1 -1.261724       NaN  0.875647  1.312823  0.926745   NaN
2  1.130999 -0.208402       NaN  0.256644  1.179241   NaN
3 -0.158458 -0.305250  0.902756       NaN  0.439048   NaN

As you see, the NaN values carry across into the arithmetic to produce NaN output, which is what you'd expect.

Now, I don't want to replace all NaN values in my dataframe with zeros: it is helpful to me to distinguish between zero and NaN. I could replace NaN with something else: I'm dealing with large volumes of student grades, and I need to distinguish between a grade of zero and a NaN, which at the moment I'm using to indicate that the particular assessment task was not attempted. (It takes the place of what would be a blank cell in a traditional spreadsheet.) But whatever I replace the NaN values with, it needs to be something that can be treated as zero in the operations I may perform. What are my options here?

Use the fillna function:

test['Sum2'] = test.A.fillna(0) + test.B.fillna(0)/2 - test.C.fillna(0)/3 + test.D.fillna(0)
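
A minimal sketch of why this keeps the zero/NaN distinction the question asks for: fillna returns a filled copy and leaves the original columns alone, so the NaN markers in test survive. The seeded random data is an assumption for illustration only.

```python
import numpy as np
import pandas as pd

# Illustrative data only: seeded so the run is repeatable.
rng = np.random.default_rng(0)
test = pd.DataFrame(rng.standard_normal((4, 4)), columns=list("ABCD"))
for i in range(4):
    test.iloc[i, i] = np.nan  # NaN marks "not attempted"

# fillna(0) returns a filled copy; the columns of `test` are untouched.
test["Sum2"] = (
    test.A.fillna(0) + test.B.fillna(0) / 2 - test.C.fillna(0) / 3 + test.D.fillna(0)
)

print(test[list("ABCD")].isna().sum().sum())  # the four NaN markers are still there
print(test["Sum2"].isna().any())              # no NaN reached the result
```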

If the dataframe is not huge you can try:

test["Sum"] = test.sum(axis=1)
test2 = test.fillna(0)
test["Sum2"] = test2.A + test2.B/2 - test2.C/3 + test2.D
del test2

It would be interesting to know if there is a way to do the second sum in one line only.
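
One possible way to get that into a single statement, sketched here with DataFrame.pipe: the filled copy only exists inside the call, so there is no test2 to delete afterwards. The lambda and the seeded data are my own illustration, not something from the answers.

```python
import numpy as np
import pandas as pd

# Illustrative data only.
rng = np.random.default_rng(1)
test = pd.DataFrame(rng.standard_normal((4, 4)), columns=list("ABCD"))
for i in range(4):
    test.iloc[i, i] = np.nan

# One statement, no named temporary: the filled copy lives only inside pipe().
test["Sum2"] = test.fillna(0).pipe(lambda d: d.A + d.B / 2 - d.C / 3 + d.D)

# Cross-check against the two-step version with an explicit temporary.
filled = test[list("ABCD")].fillna(0)
expected = filled.A + filled.B / 2 - filled.C / 3 + filled.D
print(np.allclose(test["Sum2"], expected))
```

It still builds a full filled copy of the frame, so I would not expect it to be faster than the two-step version, only shorter.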

Update

If you have 1e5 rows or fewer, the method I suggested is slightly faster than the one suggested by kmcodes; with more rows, things change.

n = int(1e5)
test = pd.DataFrame(np.random.randn(n,4),columns=list('ABCD'))
for i in range(4):
    test.iloc[i,i] = np.nan

%%timeit
test2 = test.fillna(0)
test["Sum2"] = test2.A + test2.B/2 - test2.C/3 + test2.D
del test2
3.95 ms ± 51.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
test['Sum2'] = test.A.fillna(0) + test.B.fillna(0)/2 - test.C.fillna(0)/3 + test.D.fillna(0)
4.12 ms ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
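
%%timeit only works inside IPython/Jupyter. If you want to repeat the comparison from a plain script, something like the sketch below with the standard timeit module should do. The function names are hypothetical, and unlike the cells above it times only the computation, not the assignment back into the frame, so the numbers will not match exactly.

```python
import timeit

import numpy as np
import pandas as pd

n = int(1e5)
test = pd.DataFrame(np.random.randn(n, 4), columns=list("ABCD"))
for i in range(4):
    test.iloc[i, i] = np.nan


def fill_frame_once(df):
    # Fill the whole frame first, then do the arithmetic.
    filled = df.fillna(0)
    return filled.A + filled.B / 2 - filled.C / 3 + filled.D


def fill_each_column(df):
    # Fill each column separately inside the expression.
    return df.A.fillna(0) + df.B.fillna(0) / 2 - df.C.fillna(0) / 3 + df.D.fillna(0)


for func in (fill_frame_once, fill_each_column):
    # Best of 5 repeats, 20 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(lambda: func(test), number=20, repeat=5)) / 20
    print(f"{func.__name__}: {best * 1e3:.2f} ms per call")
```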

Update 2

I found this

In your case you can just do:

weights = [1, 1/2, -1/3, 1]
test["Sum2"] = test.fillna(0).mul(weights).sum(axis=1)

Keep in mind that this seems to be consistently slower than the other two.
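
Note that a plain list of weights has to match the number of columns, so it breaks as soon as the frame carries extra columns such as Sum. A variant that might be more robust, sketched under the assumption that you key the weights by column name, is a Series combined with DataFrame.dot:

```python
import numpy as np
import pandas as pd

# Illustrative data only.
rng = np.random.default_rng(2)
test = pd.DataFrame(rng.standard_normal((4, 4)), columns=list("ABCD"))
for i in range(4):
    test.iloc[i, i] = np.nan
test["Sum"] = test.sum(axis=1)  # an extra column that must not be weighted

# Weights keyed by column name; only these columns take part in the sum.
weights = pd.Series({"A": 1, "B": 1 / 2, "C": -1 / 3, "D": 1})

# Select the weighted columns, treat NaN as zero, then take the weighted sum.
test["Sum2"] = test[weights.index].fillna(0).dot(weights)
print(test)
```

I have not timed this one against the others.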

You can also concat the scaled columns and take the sum, which gives you the features offered by sum() (such as skipping NaN), i.e.:

test['Sum2'] = pd.concat([test.A, test.B/2, test.C/(-3), test.D], axis=1).sum(axis=1)

          A         B         C         D      Sum2
0       NaN  0.181923 -0.526074  1.084549  1.350869
1  0.999836       NaN -0.862583 -0.473933  0.813431
2  1.043463  0.252743       NaN -0.863199  0.306635
3 -0.047286  1.432500  0.100041       NaN  0.635616
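
One of those sum() features may matter for the grades use case: with min_count=1, a row where every value is NaN stays NaN instead of becoming 0, so "attempted nothing" is still distinguishable from "scored zero". A small sketch with made-up grades (the data is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical grades: row 1 scored real zeros, row 2 attempted nothing.
test = pd.DataFrame(
    {
        "A": [1.0, 0.0, np.nan],
        "B": [np.nan, 0.0, np.nan],
        "C": [3.0, 0.0, np.nan],
        "D": [2.0, 0.0, np.nan],
    }
)

scaled = pd.concat([test.A, test.B / 2, test.C / (-3), test.D], axis=1)

# min_count=1 requires at least one non-NaN value per row; otherwise the sum is NaN.
test["Sum2"] = scaled.sum(axis=1, min_count=1)
print(test)
```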
