How to search a datacube for the nearest neighbor that has a value other than NaN?

Question

I am working with a datacube such as data[x,y,z]. Each point is a velocity time series, and the [x,y] grid corresponds to spatial coordinates. If I pick a point at coordinates x and y, its time series is likely to be incomplete (it contains some NaNs). I wrote a function that searches for the closest neighbor with a valid value and replaces the NaN at my (x, y) point with it. Is there a more efficient way to do the same thing?

Attached to this message is a picture showing the order in which the function evaluates the neighbors. The number on each point is its rank (5 is the 5th neighbor evaluated).

I tried something like this:

Let's say that I have a datacube of 10x10x100 (100 is the length of the time series):

import numpy as np

Vel = np.random.rand(10, 10, 100)
Vel[4:7, 4:7, 0:10] = np.nan

x = 5
y = 5
max_radius = 10  # give up beyond this distance (in grid cells)

# Copy, so that filling Vpoint does not modify Vel through a view.
Vpoint = Vel[x, y, :].copy()


def ring_offsets(n):
    """Offsets of the square ring at distance n, in evaluation order:
    the four axis-aligned points first, then the rest of the ring."""
    offsets = [(n, 0), (-n, 0), (0, n), (0, -n)]
    for p in range(1, n + 1):
        offsets += [(p, n), (-p, n), (p, -n), (-p, -n)]
        if p < n:  # for p == n the line above already added the corners
            offsets += [(n, p), (n, -p), (-n, p), (-n, -p)]
    return offsets


for i in range(len(Vpoint)):
    if not np.isnan(Vel[x, y, i]):
        continue

    found = False
    for n in range(1, max_radius + 1):
        for dx, dy in ring_offsets(n):
            xx, yy = x + dx, y + dy
            # Skip neighbors that fall outside the grid.
            if not (0 <= xx < Vel.shape[0] and 0 <= yy < Vel.shape[1]):
                continue
            if not np.isnan(Vel[xx, yy, i]):
                Vpoint[i] = Vel[xx, yy, i]
                found = True
                break
        if found:
            break
    else:
        raise Exception("The interpolation is way too far")

    # Ring and position of the neighbor that was used for time step i.
    print(n, xx, yy)

PS: in reality my datacube is close to 330x300x38000, and the closest non-NaN neighbor can be different at every time step.

Answer 1

Here is what I came up with:

import numpy as np
from scipy import interpolate

Vel = np.random.rand(10,10,100) 
Vel[4:7,4:7,0:10] = np.nan
Vel[4:7,4:7,20:30] = np.nan

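def gap_filling(vect, interpolation):
    """Fill the NaNs of a 1-D time series by interpolating along time."""
    time = np.arange(0, np.shape(vect)[0])
    mask = np.isfinite(vect)
    if mask.sum() < 2:
        # Not enough valid samples to build an interpolator.
        return np.copy(vect)

    valid = vect[mask]
    # Outside the valid range, hold the first/last valid value; with
    # bounds_error=False alone, leading or trailing gaps would stay NaN.
    f = interpolate.interp1d(time[mask], valid,
                             kind=interpolation, bounds_error=False,
                             fill_value=(valid[0], valid[-1]))

    nans = np.isnan(vect)
    vect_filled = np.copy(vect)
    vect_filled[nans] = f(time[nans])

    return vect_filled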

Vel_filled_nn = np.apply_along_axis(gap_filling, -1, Vel, 'nearest')

Vel_filled_li = np.apply_along_axis(gap_filling, -1, Vel, 'linear')

I build an interpolation function from the data available along the time axis, then evaluate it at the missing time steps, and repeat this for each time series in the data set. Note that 'nearest' here means nearest in time, not in space.
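To make the idea concrete, here is a minimal sketch on a tiny series. It uses np.interp rather than interpolate.interp1d (a swapped-in simplification), and the helper name fill_linear_1d is hypothetical. It assumes the series has at least one valid sample.

```python
import numpy as np


def fill_linear_1d(series):
    """Linearly interpolate the NaNs of a 1-D series along its index.

    Hypothetical helper for illustration. np.interp holds the first/last
    valid value for gaps at the start or end of the series.
    """
    t = np.arange(series.size)
    valid = np.isfinite(series)
    filled = series.copy()
    filled[~valid] = np.interp(t[~valid], t[valid], series[valid])
    return filled


series = np.array([1.0, np.nan, np.nan, 4.0, np.nan, 6.0])
filled = fill_linear_1d(series)
print(filled)  # [1. 2. 3. 4. 5. 6.]
```

The valid samples are left untouched; only the positions that were NaN receive interpolated values.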

Because I know which application you are developing this code for (data analysis in Earth sciences), I would advise using linear interpolation rather than nearest neighbour (Vel_filled_li here). The results for one of the time series:

import matplotlib.pyplot as plt

plt.figure()
plt.plot(Vel_filled_nn[6, 6, :], 'o-', label='nearest neighbour')
plt.plot(Vel_filled_li[6, 6, :], 'o-', label='linear interpolation')
plt.plot(Vel[6, 6, :], 'o-', label='raw')
plt.legend(loc='upper right')
plt.xlabel('Time', fontsize=15)
plt.ylabel('Variable', fontsize=15)
plt.show()

This is only a starting point; it can and should be vectorised, for example using the axis parameter of interpolate.interp1d.
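If the spatial nearest neighbour from the question is still wanted, one possible sketch (not the method used above) relies on scipy.ndimage.distance_transform_edt, which can return, for every cell, the indices of the closest cell where the input is zero. The helper name fill_nearest_in_space is hypothetical, and "closest" here means Euclidean distance on the grid, so ties and ordering may differ from the ring-by-ring search in the question.

```python
import numpy as np
from scipy import ndimage


def fill_nearest_in_space(plane):
    """Replace each NaN of a 2-D slice with the closest non-NaN value.

    Hypothetical helper for illustration; slices that are entirely
    valid or entirely NaN are returned unchanged.
    """
    invalid = np.isnan(plane)
    if not invalid.any() or invalid.all():
        return plane.copy()
    # For every cell, the indices of the nearest cell where `invalid` is False.
    nearest = ndimage.distance_transform_edt(
        invalid, return_distances=False, return_indices=True
    )
    return plane[tuple(nearest)]


rng = np.random.default_rng(0)
Vel = rng.random((10, 10, 100))
Vel[4:7, 4:7, 0:10] = np.nan

# Fill one time slice at a time, so the neighbour can change at every step.
Vel_filled = np.stack(
    [fill_nearest_in_space(Vel[:, :, t]) for t in range(Vel.shape[2])],
    axis=-1,
)
Vpoint = Vel_filled[5, 5, :]
```

Working slice by slice keeps the search purely spatial. For a large cube, each slice can be processed and stored independently rather than stacking everything in memory at once.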

Answer 1
1 Accepted 2021-03-04 21:31:47
