
Checking nan values in a numpy array

I read a column from an Excel file and stored it in a numpy array, col. For every index i in col, I want to check whether the value is nan; if it is, I want to delete index i from both col and another array, x. This is what I tried:

import numpy as np
import xlrd

workbook = xlrd.open_workbook('well data.xlsx')
sheet = workbook.sheet_by_index(0)
col = sheet.col_values(1, 1)
col = np.array(col)
col = col.astype(np.float)
for i in range(col.shape[0]):
    if (np.isnan(col[i])):
        col = np.delete(col, i)
        x = np.delete(x, i)

I get two kinds of errors. First, with the float conversion col = col.astype(np.float) in place, I get:

    if (np.isnan(col[i])):
IndexError: index out of bounds

Second, if I remove the float conversion, I get this error:

    if (np.isnan(col[i])):
TypeError: Not implemented for this type

I know that to remove the nans from a single numpy array I can do this:

x = x[numpy.logical_not(numpy.isnan(x))]

But my case is different: I want to delete the nan elements from col, along with the corresponding elements in x. For example, if index 3 in col is nan, index 3 should be deleted from both col and x. Also, the float conversion is necessary in my case.

Here is a more detailed example.

These are the initial arrays (both have the same length):

col = [16.5, 14.3, 17.42, nan, 13.22, nan]

x = [1, 2, 3, 4, 5, 6]

After removing the nans, the arrays should be:

col = [16.5, 14.3, 17.42, 13.22]

x = [1, 2, 3, 5]

One more thing: the code above works fine when I read the columns from a .dat file. Does it really matter that I'm reading the columns from Excel?

Can anyone please help me solve this problem?

Thanks.

Your first idea was correct.

col= col.astype(np.float)
for i in range (col.shape [0]):
    if (np.isnan(col[i])):
        col=np.delete(col,i)
        x= np.delete(x,i)

This is almost correct. col.shape[0] is the total length of the array, and range already runs from 0 to that length minus 1, so the loop bounds are fine. Writing the start explicitly gives exactly the same loop:

for i in range(0, col.shape[0]):
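
To check that the bounds are not what triggers the IndexError, here is a minimal sketch that reproduces the error on the example arrays from the question. The name failed_at is hypothetical and only there to record where the loop breaks.

```python
import numpy as np

col = np.array([16.5, 14.3, 17.42, np.nan, 13.22, np.nan])

# range(n) and range(0, n) yield the same indices.
assert list(range(col.shape[0])) == list(range(0, col.shape[0]))

failed_at = None  # hypothetical helper: index at which the loop fails
try:
    # The range is computed once, from the original length (6).
    for i in range(col.shape[0]):
        if np.isnan(col[i]):
            # col becomes shorter, but the loop still counts up to 5.
            col = np.delete(col, i)
except IndexError:
    failed_at = i

print(failed_at)     # 5
print(col.tolist())  # [16.5, 14.3, 17.42, 13.22]
```

Here both nans happen to be deleted before the loop fails. With two adjacent nans, a forward loop would also skip the second one, because it slides into the position that was just visited.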

But since you are removing elements inside the loop, col gets shorter while the loop still counts up to the original length. For example, if col starts with 5 elements and you delete one of them, index 4 no longer exists by the time the loop reaches it, and that is what raises the IndexError. I suggest you loop backward over your column, so that a deletion only shifts elements you have already visited, like this:

for i in range(col.shape [0]-1, -1, -1):
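
Put together with the loop body from the question, the backward loop might look like the sketch below. It uses the example arrays from the question in place of the Excel data, and the builtin float rather than np.float, since that alias was removed in NumPy 1.24.

```python
import numpy as np

# Example data from the question, standing in for the Excel column.
col = np.array([16.5, 14.3, 17.42, np.nan, 13.22, np.nan])
x = np.array([1, 2, 3, 4, 5, 6])

col = col.astype(float)

# Walk from the last index down to 0, so deleting position i
# never moves the positions that are still to be visited.
for i in range(col.shape[0] - 1, -1, -1):
    if np.isnan(col[i]):
        col = np.delete(col, i)
        x = np.delete(x, i)

print(col.tolist())  # [16.5, 14.3, 17.42, 13.22]
print(x.tolist())    # [1, 2, 3, 5]
```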

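As an alternative to the loop, the single-array mask from the question can be reused for both arrays. This is a sketch under the assumption that col and x have the same length; the name keep is hypothetical.

```python
import numpy as np

# Example data from the question, standing in for the Excel column.
col = np.array([16.5, 14.3, 17.42, np.nan, 13.22, np.nan])
x = np.array([1, 2, 3, 4, 5, 6])

col = col.astype(float)  # np.isnan needs a numeric dtype

# Build the mask once, before either array is modified,
# then apply the same mask to both arrays.
keep = ~np.isnan(col)  # True where col holds a real number
col = col[keep]
x = x[keep]

print(col.tolist())  # [16.5, 14.3, 17.42, 13.22]
print(x.tolist())    # [1, 2, 3, 5]
```

np.isnan is only defined for numeric dtypes. If the array built from the Excel column ends up with a string or object dtype, that would probably explain the TypeError that appears once the float conversion is removed.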