
How do I deal with a Pandas Series that contains NaN?

What happens when `max()` and `min()` are used on a `pandas.core.series.Series` that contains NaN? Is this a bug? See below:


```python
# %matplotlib inline  (IPython/Jupyter magic; leave commented out in a plain script)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

mydata = pd.DataFrame(np.random.standard_normal((100,1)), columns=['No NaN'])
mydata['Has NaN'] = mydata['No NaN'] / mydata['No NaN'].shift(1)

# Both return NaN!
print(min(mydata['Has NaN']), max(mydata['Has NaN']))
# Still, why False? Isn't float('nan') a singleton like None?
print(min(mydata['Has NaN']) == max(mydata['Has NaN']))
# But this time it works well!
print(min([1, 2, 3, float('nan')]))

print('\n')

# When a Series that has NaN bumps into min() and max(), what should
#  I do? E.g.,
try:
    n, bins, patches = plt.hist(mydata['Has NaN'], 10)
except ValueError as e:
    print(e, '\nSeems "range" argument in hist() has problem!')
```

You should use Pandas or NumPy functions instead of the plain Python built-ins:

```
In [7]: mydata['Has NaN'].min(), mydata['Has NaN'].max()
Out[7]: (-46.00309057827485, 62.430829637766671)

In [8]: min(mydata['Has NaN']), max(mydata['Has NaN'])
Out[8]: (nan, nan)

In [125]: mydata.plot.hist(alpha=0.5)
Out[125]: <matplotlib.axes._subplots.AxesSubplot at 0x1a784588>
```

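To make the difference between the built-ins and the pandas/NumPy reductions concrete, here is a minimal sketch on a small made-up Series whose first value is NaN (mimicking the `'Has NaN'` column after `.shift(1)`). It contrasts the built-in `min`/`max`, the pandas methods (which skip NaN by default), plain `np.min`/`np.max` (which propagate NaN), and the NaN-aware `np.nanmin`/`np.nanmax`. The Series values are illustrative assumptions, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a leading NaN, like mydata['Has NaN'] after .shift(1)
s = pd.Series([np.nan, 2.5, -1.0, 4.0])
arr = s.to_numpy()

# Built-ins compare element by element; every comparison with NaN is False,
# so a leading NaN is never replaced.
builtin_result = (min(s), max(s))

# pandas reductions skip NaN by default (skipna=True).
pandas_result = (s.min(), s.max())

# Plain NumPy reductions propagate NaN; the nan* variants ignore it.
numpy_plain = (np.min(arr), np.max(arr))
numpy_nan_aware = (np.nanmin(arr), np.nanmax(arr))

print("built-in:", *builtin_result)            # built-in: nan nan
print("pandas:", *pandas_result)               # pandas: -1.0 4.0
print("numpy (plain):", *numpy_plain)          # numpy (plain): nan nan
print("numpy (nan-aware):", *numpy_nan_aware)  # numpy (nan-aware): -1.0 4.0
```

If you want pandas to behave like the built-ins and surface the NaN instead, pass `skipna=False` to `Series.min`/`Series.max`.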

First of all, you shouldn't use Python's built-in `max` and `min` when working with pandas or numpy, especially when NaNs are involved.

Since `nan` is the first item of `mydata['Has NaN']`, it is never replaced while `max` or `min` scans the remaining values, because (as stated in the docs):

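The not-a-number values `float('NaN')` and `Decimal('NaN')` are special. They are identical to themselves (`x is x` is true) but are not equal to themselves (`x == x` is false). Additionally, comparing any number to a not-a-number value will return `False`. For example, both `3 < float('NaN')` and `float('NaN') < 3` will return `False`.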
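The quoted rules are easy to check in plain Python. The sketch below shows that a given NaN object is identical to itself but not equal to itself, that two separate `float('nan')` calls produce two distinct objects (so NaN is not a singleton like `None`), and that the result of the built-in `min` depends on where the NaN sits in the input. The last few lines show one possible NaN-safe pattern for plain lists, filtering with `math.isnan`; it is an illustrative choice, not the only option.

```python
import math

nan = float("nan")

# Identity vs. equality for a single NaN object
print(nan is nan)                        # True
print(nan == nan)                        # False
print(3 < nan, nan < 3)                  # False False

# Each float("nan") call builds a new object, so NaN is not a singleton
print(float("nan") is float("nan"))      # False

# min() keeps the current best unless a later item compares smaller;
# comparisons with NaN are always False, so position matters.
print(min([nan, 1, 2, 3]))               # nan
print(min([1, 2, 3, nan]))               # 1

# A NaN-safe alternative for plain Python lists: drop NaN first.
values = [nan, 1.0, 2.0, 3.0]
clean = [v for v in values if not math.isnan(v)]
print(min(clean), max(clean))            # 1.0 3.0
```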

Instead, use the pandas `min` and `max` methods:

```
In [4]: mydata['Has NaN'].min()
Out[4]: -176.9844930355774

In [5]: mydata['Has NaN'].max()
Out[5]: 12.684033138603787
```
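As a sketch of what those methods do with the question's data, the block below rebuilds a frame the same way the question does (with a seeded generator, an assumption added only for reproducibility) and shows that `Series.min`/`Series.max` skip the single leading NaN by default, that `skipna=False` brings the NaN back, and that dropping NaN explicitly with `dropna()` gives the same extremes.

```python
import numpy as np
import pandas as pd

# Same construction as the question; the seed is an arbitrary assumption.
rng = np.random.default_rng(0)
mydata = pd.DataFrame(rng.standard_normal((100, 1)), columns=["No NaN"])
mydata["Has NaN"] = mydata["No NaN"] / mydata["No NaN"].shift(1)

col = mydata["Has NaN"]

print(col.isna().sum())                  # 1  (only the first row, from shift(1))
print(col.min(), col.max())              # NaN skipped: skipna=True is the default
print(col.min(skipna=False))             # nan  (opt back into NaN propagation)
print(col.dropna().agg(["min", "max"]))  # same extremes, NaN removed explicitly
```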

As for the histogram, this seems to be a known issue with `plt.hist`; see here and here.

For now, though, it is simple to work around:

```python
# Keep only the non-NaN values before handing them to plt.hist
n, bins, patches = plt.hist(mydata['Has NaN'][~mydata['Has NaN'].isnull()], 10)
```

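As far as I know, matplotlib's `hist` bins its data with NumPy's `np.histogram`, so the failure and the workaround can be reproduced with NumPy alone. The sketch below uses a small made-up Series with a leading NaN: without an explicit range, `np.histogram` derives the range from the data's minimum and maximum, which are NaN here, and raises `ValueError`. Dropping NaN first (`s.dropna()` is equivalent to `s[~s.isnull()]`) lets the binning succeed. The exact error message varies between NumPy versions, so it is printed rather than asserted.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a leading NaN, like mydata['Has NaN']
s = pd.Series([np.nan, -1.5, 0.2, 0.7, 3.1, 2.2])

# Without an explicit range, the bin range comes from the data's min/max,
# which are NaN here, so NumPy refuses to build the bins.
raised = False
try:
    np.histogram(s, bins=10)
except ValueError as e:
    raised = True
    print("ValueError:", e)

# Removing NaN first avoids the problem.
counts, edges = np.histogram(s.dropna(), bins=10)
print(counts.sum())  # 5: one count per non-NaN value
print(len(edges))    # 11: ten bins need eleven edges
```

With matplotlib, the same idea is `plt.hist(mydata['Has NaN'].dropna(), 10)`.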


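Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.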

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM