How do I deal with Pandas Series data type that has NaN?
What happens when max() and min() are used on a pandas.core.series.Series that contains NaN? Is this a bug? See below:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
mydata = pd.DataFrame(np.random.standard_normal((100,1)), columns=['No NaN'])
mydata['Has NaN'] = mydata['No NaN'] / mydata['No NaN'].shift(1)
# Both return NaN!
print(min(mydata['Has NaN']), max(mydata['Has NaN']))
# Why is this still False? Isn't float('nan') a singleton like None?
print(min(mydata['Has NaN']) == max(mydata['Has NaN']))
# But this time it works!
print(min([1, 2, 3, float('nan')]))
print('\n')
# When a Series that contains NaN runs into min() and max(), what should
# I do? E.g.,
try:
    n, bins, patches = plt.hist(mydata['Has NaN'], 10)
except ValueError as e:
    print(e, '\nSeems "range" argument in hist() has problem!')
You should use the pandas or NumPy functions rather than the plain Python built-ins:
In [7]: mydata['Has NaN'].min(), mydata['Has NaN'].max()
Out[7]: (-46.00309057827485, 62.430829637766671)
In [8]: min(mydata['Has NaN']), max(mydata['Has NaN'])
Out[8]: (nan, nan)
In [125]: mydata.plot.hist(alpha=0.5)
Out[125]: <matplotlib.axes._subplots.AxesSubplot at 0x1a784588>
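To make the difference concrete, here is a minimal sketch on a small hand-made Series (the values and the variable names are illustrative assumptions, not the question's random data). It relies on the fact that Series.min() and Series.max() skip NaN by default (skipna=True), and that NumPy has NaN-aware counterparts np.nanmin and np.nanmax:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the leading NaN mimics what shift(1) produces.
s = pd.Series([np.nan, 2.0, -1.0, 5.0], name='Has NaN')

# pandas reductions skip NaN by default (skipna=True).
pandas_min, pandas_max = s.min(), s.max()
print(pandas_min, pandas_max)

# Passing skipna=False makes NaN propagate instead.
strict_min = s.min(skipna=False)
print(strict_min)

# NumPy: np.nanmin/np.nanmax ignore NaN, while plain np.min propagates it.
values = s.to_numpy()
numpy_nanmin, numpy_nanmax = np.nanmin(values), np.nanmax(values)
numpy_plain_min = np.min(values)
print(numpy_nanmin, numpy_nanmax, numpy_plain_min)
```

Note that plain np.min still returns nan here, so on the NumPy side it is the nan-prefixed functions that you want.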
First, when working with pandas or numpy, and especially with nan, you should not use Python's built-in max or min.
Since nan is the first item in mydata['Has NaN'], it is never replaced as the running result in either max or min, because (as noted in the docs):
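The not-a-number values float('NaN') and Decimal('NaN') are special. They are identical to themselves (x is x is true) but are not equal to themselves (x == x is false). Additionally, comparing any number to a not-a-number value will return False. For example, both 3 < float('NaN') and float('NaN') < 3 will return False.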
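A short pure-Python sketch of what that means for the built-ins (the list contents and variable names are illustrative assumptions). min() and max() keep the first item as the running result and only swap it out when a comparison returns True, so the position of the NaN decides the outcome:

```python
import math

nan = float('nan')

# Every ordering comparison involving NaN is False, and NaN != NaN.
comparisons = (3 < nan, nan < 3, nan == nan)
print(comparisons)

nan_first = [nan, 1.0, 2.0, 3.0]
nan_last = [1.0, 2.0, 3.0, nan]

# NaN first: nothing ever compares "smaller" or "larger", so NaN is kept.
min_nan_first, max_nan_first = min(nan_first), max(nan_first)

# NaN last: the NaN never wins a comparison, so it is silently ignored.
min_nan_last, max_nan_last = min(nan_last), max(nan_last)

print(min_nan_first, max_nan_first, min_nan_last, max_nan_last)

# Comparing two NaN results with == is always False; use math.isnan instead.
both_nan = math.isnan(min_nan_first) and math.isnan(max_nan_first)
print(min_nan_first == max_nan_first, both_nan)
```

This is also why min(...) == max(...) printed False in the question even though both sides were nan, and why min([1, 2, 3, float('nan')]) appeared to work.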
Instead, use the pandas max and min methods:
In [4]: mydata['Has NaN'].min()
Out[4]: -176.9844930355774
In [5]: mydata['Has NaN'].max()
Out[5]: 12.684033138603787
Regarding the histogram, this seems to be a known problem with plt.hist (see here and here).
It should be straightforward to handle for now, though:
n, bins, patches = plt.hist(mydata['Has NaN'][~mydata['Has NaN'].isnull()], 10)
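An equivalent way to write the filter is Series.dropna(). The sketch below is only an illustration under stated assumptions: the seed and variable names are made up, and np.histogram stands in for plt.hist so the snippet runs without a plotting backend (plt.hist should accept the same cleaned data):

```python
import numpy as np
import pandas as pd

# Rebuild a series like the one in the question (seed chosen arbitrarily).
rng = np.random.default_rng(0)
base = pd.Series(rng.standard_normal(100))
ratio = base / base.shift(1)  # the first element is NaN

# Boolean mask, as in the line above, versus dropna(): same result.
masked = ratio[~ratio.isnull()]
dropped = ratio.dropna()
same = masked.equals(dropped)

# Bin the cleaned values; with NaN removed the bin range is well defined.
counts, edges = np.histogram(dropped.to_numpy(), bins=10)
print(same, len(dropped), counts.sum(), len(edges))
```

With the NaN removed, every remaining value falls into one of the ten bins, so the counts add up to the length of the cleaned series.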