Checking for nan values in a numpy array

Question

I've read a column from an Excel file and stored it in a numpy array, col. For every index i in col I want to check whether the value is nan; if it is, I want to delete index i from col and from another array, x. This is what I did:

workbook = xlrd.open_workbook('well data.xlsx')
sheet = workbook.sheet_by_index(0)
col = sheet.col_values(1, 1)
col = np.array(col)
col = col.astype(np.float)
for i in range(col.shape[0]):
    if np.isnan(col[i]):
        col = np.delete(col, i)
        x = np.delete(x, i)

I'm getting two types of errors. First, when the float conversion col= col.astype(np.float) is present, I get:

    if (np.isnan(col[i])):
IndexError: index out of bounds

Second, if I remove the float conversion, I get this error:

    if (np.isnan(col[i])):
TypeError: Not implemented for this type

I know that to remove the nan values from a single numpy array I can do this:

x = x[numpy.logical_not(numpy.isnan(x))]

But my case is different: I want to delete the nan elements from col, and the corresponding elements in x. For example, if index 3 in col is nan, index 3 should be deleted from both col and x. Also, the float conversion is necessary in my case.

Here is a more detailed example.

These are the initial arrays (both have the same length):

col = [16.5, 14.3, 17.42, nan, 13.22, nan]

x = [1, 2, 3, 4, 5, 6]

After removing the nans, the arrays should be:

col = [16.5, 14.3, 17.42, 13.22]

x = [1, 2, 3, 5]

One more thing: the code above works very well if I read the columns from a .dat file. Does it really matter that I'm reading the columns from Excel?

Can anyone please help me solve this problem?

Thanks.

Answer 1

Your first idea was correct. This:

col= col.astype(np.float)
for i in range (col.shape [0]):
    if (np.isnan(col[i])):
        col=np.delete(col,i)
        x= np.delete(x,i)

is almost correct. shape[0] returns the total length of the array, and the valid indices go from 0 to that length minus 1. range(col.shape[0]) already covers exactly those indices, so the loop header itself is fine; written with an explicit start it is the same loop:

for i in range (0, col.shape [0]):
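For reference, here is a small sketch using the sample values from the question (not the actual spreadsheet). It checks that both loop headers visit the same indices and shows where the IndexError actually appears. The name failed_at is only there for illustration.

```python
import numpy as np

# Sample values from the question, standing in for the Excel column.
col = np.array([16.5, 14.3, 17.42, np.nan, 13.22, np.nan])

# Both spellings of the loop header produce the same indices: 0 .. length-1.
assert list(range(col.shape[0])) == list(range(0, col.shape[0]))

# The range is computed once, from the original length (6),
# but col gets shorter every time an element is deleted.
failed_at = None
try:
    for i in range(col.shape[0]):
        if np.isnan(col[i]):
            col = np.delete(col, i)
except IndexError:
    failed_at = i  # index that no longer exists in the shortened array

print(failed_at, col.shape[0])  # prints: 5 4
```

The loop still asks for index 5 after two deletions have left only four elements, which is the "index out of bounds" from the question.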

But since you are removing elements from the array inside the loop, the array gets shorter while the loop is still running over the original indices. So if you try to access the fifth and last element after having removed an element earlier, col no longer has five elements and the index is out of bounds. I suggest you loop backward over your column, like this:

for i in range(col.shape[0] - 1, -1, -1):
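Put together, the backward loop could look like the sketch below. It again uses the sample arrays from the question in place of the Excel data, and the names col_loop, x_loop, keep, col_mask and x_mask are made up for the example. It also shows an alternative that extends the one-array expression from the question: build a boolean mask from col once and apply it to both arrays. The builtin float is used for the conversion because the np.float alias was removed in NumPy 1.24.

```python
import numpy as np

# Sample data from the question; in the real code col comes from the sheet.
col = np.array([16.5, 14.3, 17.42, np.nan, 13.22, np.nan]).astype(float)
x = np.array([1, 2, 3, 4, 5, 6])

# Option 1: loop backward, so a deletion never shifts an index
# that the loop has yet to visit.
col_loop, x_loop = col.copy(), x.copy()
for i in range(col_loop.shape[0] - 1, -1, -1):
    if np.isnan(col_loop[i]):
        col_loop = np.delete(col_loop, i)
        x_loop = np.delete(x_loop, i)

# Option 2: compute one boolean mask from col and index both arrays with it.
keep = ~np.isnan(col)          # True where col holds a real number
col_mask, x_mask = col[keep], x[keep]

print(col_loop, x_loop)
print(col_mask, x_mask)
```

Both options give the same result on this data. The mask version avoids calling np.delete repeatedly, since each call copies the whole array.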
