Numpy inconsistent results with Pandas and missing values
Why does NumPy return different results with missing values when np.sum is called on a pandas Series rather than on the Series' underlying values, as in the following:
import pandas as pd
import numpy as np
data = pd.DataFrame(dict(a=[1, 2, 3, np.nan, np.nan, 6]))
np.sum(data['a'])
#12.0
np.sum(data['a'].values)
#nan
Calling np.sum on a pandas Series delegates to Series.sum, which by default ignores NaNs when computing the sum (its skipna parameter defaults to True):
data['a'].sum()
# 12.0
np.sum(data['a'])
# 12.0
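The "by default" part can be checked directly. A minimal sketch reusing the question's data: passing skipna=False to Series.sum makes pandas propagate the NaN the way plain np.sum does, while np.nansum on the raw ndarray skips NaNs the way pandas does.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(dict(a=[1, 2, 3, np.nan, np.nan, 6]))

# pandas skips NaN by default (skipna=True) ...
pandas_default = data['a'].sum()
# ... but propagates it when told not to skip.
pandas_no_skip = data['a'].sum(skipna=False)

# On the raw ndarray, np.sum propagates NaN, while np.nansum ignores it.
numpy_plain = np.sum(data['a'].values)
numpy_nan_aware = np.nansum(data['a'].values)

print(pandas_default, pandas_no_skip, numpy_plain, numpy_nan_aware)
# prints: 12.0 nan nan 12.0
```

So the two libraries agree once the NaN-handling policy is made explicit on both sides.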
You can see this from the source code of np.sum:
np.sum??
def sum(a, axis=None, dtype=None, out=None, keepdims=np._NoValue, initial=np._NoValue):
    ...
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
Looking at the source code of _wrapreduction, we see:
np.core.fromnumeric._wrapreduction??  # np._core.fromnumeric in NumPy >= 2.0
def _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs):
    ...
    if type(obj) is not mu.ndarray:
        try:
            reduction = getattr(obj, method)  # method is 'sum', so this gets a reference to Series.sum
reduction is then called at the end of the function:
return reduction(axis=axis, out=out, **passkwargs)
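The same duck-typed dispatch can be observed with any non-ndarray object that has a sum method. The sketch below uses a hypothetical toy class, SumRecorder (not part of NumPy or pandas), that records the keyword arguments np.sum forwards to it. It relies on the _wrapreduction behaviour shown above, which is a NumPy implementation detail rather than a documented guarantee.

```python
import numpy as np
import pandas as pd


class SumRecorder:
    """Hypothetical toy class: not an ndarray, but exposes a .sum() method."""

    def __init__(self, values):
        self.values = list(values)
        self.calls = []  # keyword arguments that np.sum forwarded to .sum()

    def sum(self, axis=None, out=None, **kwargs):
        self.calls.append(dict(axis=axis, out=out, **kwargs))
        # Skip None entries, loosely mimicking a "missing-value-aware" sum.
        return sum(v for v in self.values if v is not None)


rec = SumRecorder([1, 2, None, 6])
print(np.sum(rec))  # 9, computed by SumRecorder.sum, not by np.add.reduce
print(rec.calls)

s = pd.Series([1, 2, 3, np.nan, np.nan, 6])
print(type(s) is np.ndarray)         # False: np.sum delegates to Series.sum
print(type(s.values) is np.ndarray)  # True: np.sum falls through to np.add.reduce
```

The exact `type(obj) is not mu.ndarray` check is what separates the two cases in the question: a Series fails it and gets delegated, while `.values` is a plain ndarray and goes through the ufunc reduction, which propagates NaN.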