
Remove NaNs from numpy array that has string values and numerical values

I have an (M x N) numpy array which contains string values, numerical values and NaNs. I want to drop the rows which contain NaN values. I've tried:

arr[~np.isnan(arr)]

However, I get the error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Solution that I used:

# get column with NaNs, find row index, store in list
nan_idx = []
for v,m in enumerate(arr[:,row]):
    if np.isnan(m):
        nan_idx.append(v)

# separate columns with strings and non strings
numeric_cols = arr[:,:some_idx]
non_numeric_cols = arr[:,other_idx:]

# remove the nans
numeric_cols = numeric_cols[~np.isnan(numeric_cols).any(axis=1)]
non_numeric_cols = np.delete(non_numeric_cols, nan_idx, 0)
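
A runnable sketch of the same idea, under the assumption that the numeric values sit in the leading columns and the strings in the trailing ones. The name `n_numeric` and the sample data are hypothetical, since the original variables are not shown. Casting the numeric block to float is what lets np.isnan accept it, and applying one boolean mask to the whole array keeps the numeric and string columns aligned.

```python
import numpy as np

# Hypothetical sample data: two numeric columns followed by one string column.
arr = np.array([[1.0, 2.0, 'a'],
                [np.nan, 3.0, 'b'],
                [4.0, 5.0, 'c']], dtype=object)
n_numeric = 2  # assumed number of leading numeric columns

# Cast the numeric block to float so np.isnan accepts it.
numeric_cols = arr[:, :n_numeric].astype(float)

# True for rows whose numeric columns contain no NaN.
keep = ~np.isnan(numeric_cols).any(axis=1)

# Apply the single row mask to the whole array.
cleaned = arr[keep]
```

Here `cleaned` keeps the first and third rows, strings included, so no separate np.delete call on the string columns is needed.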

I get your error if I make an object dtype array:

In [112]: arr=np.ones((3,2),object)
In [113]: arr
Out[113]: 
array([[1, 1],
       [1, 1],
       [1, 1]], dtype=object)
In [114]: np.isnan(arr)
...
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

That dtype is the only one that can mix numbers, strings and np.nan (which is a float). You can't do a lot of whole-array operations with it.

I can't readily test your solution because several variables are unknown.

With a more general arr, I don't see how you can remove a row without iterating over both rows and cols, testing whether each value is numeric and, if it is numeric, whether it is NaN. np.isnan is picky and can only operate on a float.

As mentioned in the 'possible duplicate', pandas isnull is more general.
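
A minimal sketch of the pandas route, assuming pandas is available. `pd.isnull` accepts an object-dtype array and returns a boolean array of the same shape, so the row filter becomes a single mask. The sample array is illustrative only. Note that `pd.isnull` also flags `None` as missing, which may or may not be what you want.

```python
import numpy as np
import pandas as pd

arr = np.array([['str', 1],
                [np.nan, 'str'],
                [1, 1]], dtype=object)

# Boolean array, True wherever an element is NaN (or None).
nan_mask = pd.isnull(arr)

# Keep only the rows with no missing value in any column.
rows_without_nan = arr[~nan_mask.any(axis=1)]
```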

So basically two points:

  • what's a good general test that can handle strings as well as numbers?

  • can you get around a full iteration, assuming the array is dtype object?

In the question "np.isnan on arrays of dtype "object"", my solution was to use a list comprehension to loop over a 1d array.

From that I can test each element of arr with:

In [125]: arr
Out[125]: 
array([['str', 1],
       [nan, 'str'],
       [1, 1]], dtype=object)
In [136]: for row in arr:
     ...:     for col in row:
     ...:         print(np.can_cast(col,float) and np.isnan(col))
False
False
True
False
False
False
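
The element-wise test above can be turned into a row filter. The sketch below swaps np.can_cast for a plain isinstance check, partly because newer NumPy releases may not accept Python scalars in np.can_cast. The helper name `is_nan` is hypothetical, and the code still does the full iteration over rows and columns.

```python
import numpy as np

def is_nan(value):
    """Return True only for float values that are NaN (hypothetical helper)."""
    return isinstance(value, (float, np.floating)) and bool(np.isnan(value))

arr = np.array([['str', 1],
                [np.nan, 'str'],
                [1, 1]], dtype=object)

# Test every element; strings and ints are never passed to np.isnan.
nan_mask = np.array([[is_nan(value) for value in row] for row in arr])

# Keep only the rows where no element tested as NaN.
rows_without_nan = arr[~nan_mask.any(axis=1)]
```

Because the type check comes first, np.isnan is only ever called on floats, which avoids the TypeError from the question.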

One solution is to use np.sum() to sum up each row. Because nan + any float = nan, the sums tell you which rows include a nan value.

# sum each row; a row that contains NaN sums to NaN
b = np.sum(arr, axis=1)
rowsWithoutNaN = [not np.isnan(i) for i in b]
result = np.array([val for shouldKeep, val in zip(rowsWithoutNaN, arr) if shouldKeep])
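
A vectorized sketch of the same row-sum idea, assuming the array is purely numeric (float dtype). The sample data is hypothetical. The trick does not carry over to rows that contain strings, because a number and a string cannot be summed.

```python
import numpy as np

# Hypothetical all-numeric sample data.
arr = np.array([[1.0, 2.0],
                [np.nan, 3.0],
                [4.0, 5.0]])

# A row sum is NaN whenever the row contains a NaN.
row_sums = arr.sum(axis=1)

# Boolean indexing replaces the list comprehension and zip.
result = arr[~np.isnan(row_sums)]
```

One caveat: a row holding both inf and -inf also sums to NaN, so it would be dropped as well.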


 