Numpy isnan() fails on an array of floats (from pandas dataframe apply)

I have an array of floats (some normal numbers, some NaNs) that comes out of an apply on a pandas DataFrame.

For some reason, numpy.isnan fails on this array. As shown below, though, each element is a float, numpy.isnan runs correctly on each element individually, and the variable is definitely a numpy array.

What's going on?!

set([type(x) for x in tester])
Out[59]: {float}

tester
Out[60]: 
array([-0.7000000000000001, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan], dtype=object)

set([type(x) for x in tester])
Out[61]: {float}

np.isnan(tester)
Traceback (most recent call last):

File "<ipython-input-62-e3638605b43c>", line 1, in <module>
np.isnan(tester)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

set([np.isnan(x) for x in tester])
Out[65]: {False, True}

type(tester)
Out[66]: numpy.ndarray

np.isnan can be applied to NumPy arrays of native dtype (such as np.float64):

In [99]: np.isnan(np.array([np.nan, 0], dtype=np.float64))
Out[99]: array([ True, False], dtype=bool)

but raises TypeError when applied to object arrays:

In [96]: np.isnan(np.array([np.nan, 0], dtype=object))
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Since you have Pandas, you could use pd.isnull instead. It accepts NumPy arrays of either object or native dtype:

In [97]: pd.isnull(np.array([np.nan, 0], dtype=float))
Out[97]: array([ True, False], dtype=bool)

In [98]: pd.isnull(np.array([np.nan, 0], dtype=object))
Out[98]: array([ True, False], dtype=bool)

Note that None is also considered a null value in object arrays.
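
As a small sketch of the point above (the arrays here are made up for illustration, not the asker's data), pd.isnull flags both nan and None in an object array, while np.isnan only works once the data is in a native float dtype:

```python
import numpy as np
import pandas as pd

# Illustrative object array mixing a float, nan, and None.
mixed = np.array([-0.7, np.nan, None], dtype=object)

# pd.isnull works elementwise on object arrays and treats None as null.
null_mask = pd.isnull(mixed)

# An object array that holds only floats can be cast to float64 first,
# after which np.isnan works as usual.
floats_only = np.array([-0.7, np.nan], dtype=object)
nan_mask = np.isnan(floats_only.astype(np.float64))

print(null_mask)
print(nan_mask)
```

The cast is shown only for an array known to contain floats; for mixed content pd.isnull is the more forgiving choice.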

An alternative to np.isnan() and pd.isnull() is to compare each element with itself:

for i in range(a.shape[0]):
    if a[i] != a[i]:
        # a[i] is nan; do something here
        pass

since nan is the only float value that is not equal to itself.
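
The self-comparison trick can be sketched as follows, assuming the array holds plain Python floats and possibly None (the array is invented for illustration). Unlike pd.isnull, this check does not flag None, because None is equal to itself:

```python
import numpy as np

# Illustrative object array; not the asker's data.
a = np.array([-0.7, np.nan, None, np.nan], dtype=object)

# Iterating an object array yields the stored Python objects.
# x != x is True only for nan here; None != None is False.
nan_positions = [i for i, x in enumerate(a) if x != x]

print(nan_positions)
```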

Building on @unutbu's answer, you could coerce the pandas object column to a native (float64) type, along these lines:

import pandas as pd
pd.to_numeric(df['tester'], errors='coerce')

Specify errors='coerce' to force strings that can't be parsed as numeric values to become NaN. pd.to_numeric returns a new Series, so assign the result if you want to keep it. The resulting column has dtype float64, and the isnan check should then work.
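
A minimal sketch of that coercion on a made-up frame (the column name tester follows the snippet above; the values, including the unparseable string, are hypothetical):

```python
import numpy as np
import pandas as pd

# Object-dtype column with a float, nan, and a string that is not a number.
df = pd.DataFrame({"tester": pd.Series([-0.7, np.nan, "n/a"], dtype=object)})

# errors="coerce" turns anything unparseable into NaN; the result is float64.
converted = pd.to_numeric(df["tester"], errors="coerce")

# np.isnan now works because the underlying array has a native float dtype.
nan_mask = np.isnan(converted.to_numpy())

print(converted.dtype)
print(nan_mask)
```

The original column is left untouched, which is why the result is bound to a new name here.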

Make sure you import the CSV file using Pandas, then check each value with pd.isnull:

import pandas as pd

condition = pd.isnull(data[i][j])

I'm answering this as a reminder to myself. It took me a whole day to solve. After digging deep into the code, I found this in _encodepy.py:

if values.dtype.kind in 'UO':
    # correct branch
    ...
else:
    # wrong branch: in this branch, whatever data you give it produces the error
    if np.isnan(known_values).any():  # this is the problematic line
        ...

So the solution is very simple: just astype your data to object (np.object in older NumPy versions; the alias was removed in NumPy 1.24).
