
How to search for a nearest-neighbor point with a value other than NaN in a datacube?

I am working with a datacube such as data[x,y,z]. Each [x,y] point holds a velocity time series (z is time), and the [x,y] grid corresponds to coordinates. If I pick the point with coordinates x and y, its time series is likely to be incomplete (with some NaNs). I wrote a function that searches for the closest neighbor with a value and replaces the NaN of my xy point with it. However, is there a more efficient way to code something that does the same?

Attached is an image showing the order in which the function evaluates the neighbors. The number on each point is its rank (5 is the 5th neighbor evaluated).

[Image: how the algorithm evaluates the nearest neighbors]

I tried something like this:

Let's say I have a 10x10x100 datacube (100 is the length of the time series):

import numpy as np

Vel = np.random.rand(10, 10, 100)
Vel[4:7, 4:7, 0:10] = np.nan

x = 5
y = 5
max_dist = 10  # beyond this distance, give up

# .copy() so that filling Vpoint does not overwrite Vel itself
Vpoint = Vel[x, y, :].copy()


def ring(x, y, n):
    """Yield the grid points on the square ring at distance n from (x, y), in evaluation order."""
    # The four points on the axes first
    yield x + n, y
    yield x - n, y
    yield x, y + n
    yield x, y - n
    # Then the rest of the ring; p == n gives the four corners
    for p in range(1, n + 1):
        yield x + p, y + n
        yield x - p, y + n
        yield x + p, y - n
        yield x - p, y - n
        if p < n:  # the corners were already yielded above
            yield x + n, y + p
            yield x + n, y - p
            yield x - n, y + p
            yield x - n, y - p


for i in range(len(Vpoint)):
    if not np.isnan(Vpoint[i]):
        continue

    found = False
    for n in range(1, max_dist + 1):
        for xx, yy in ring(x, y, n):
            # Skip points outside the grid (negative indices would silently wrap around)
            if not (0 <= xx < Vel.shape[0] and 0 <= yy < Vel.shape[1]):
                continue
            if not np.isnan(Vel[xx, yy, i]):
                Vpoint[i] = Vel[xx, yy, i]
                found = True
                break
        if found:  # a plain break only leaves the inner loop
            break

    if not found:
        raise Exception("The interpolation is way too far")

    print(n, xx, yy)
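The nested loops above visit the neighbors one time step at a time. A minimal vectorised sketch of the same idea for a single pixel, assuming Euclidean distance is an acceptable notion of "closest": rank all pixels within max_dist of (x, y) by distance once, then, for all time steps at once, pick the closest one that is not NaN. `fill_from_nearest_neighbour` and `max_dist` are hypothetical names, and ties are broken by grid order, which differs slightly from the order in the picture.

```python
import numpy as np


def fill_from_nearest_neighbour(cube, x, y, max_dist=10):
    """For pixel (x, y), replace each NaN at time t by the value of the closest
    non-NaN pixel in the same time slice (Euclidean distance on the grid)."""
    nx, ny, nt = cube.shape
    gx, gy = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    d2 = (gx - x) ** 2 + (gy - y) ** 2

    # Candidate pixels within max_dist, sorted by distance
    # (stable sort: pixels at equal distance keep row-major grid order)
    inside = d2 <= max_dist ** 2
    cand_x, cand_y, cand_d2 = gx[inside], gy[inside], d2[inside]
    order = np.argsort(cand_d2, kind="stable")
    cand_x, cand_y = cand_x[order], cand_y[order]

    # Time series of all candidates, nearest first: shape (n_candidates, nt).
    # The first candidate is (x, y) itself, so valid values are kept as they are.
    series = cube[cand_x, cand_y, :]
    valid = np.isfinite(series)

    # Row index of the first (closest) valid candidate for every time step
    first = np.argmax(valid, axis=0)
    found = valid.any(axis=0)

    result = series[first, np.arange(nt)]  # fancy indexing returns a copy
    result[~found] = np.nan  # no valid pixel within max_dist at that time step
    return result
```

For example, `Vpoint = fill_from_nearest_neighbour(Vel, 5, 5)`. With max_dist=10 there are 317 candidate pixels, so on a 38000-step series the temporary `series` array is roughly 100 MB of float64.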

PS: in reality my datacube is close to 330x300x38000, and the closest non-NaN neighbor can be different at every time step.
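If the whole cube needs filling rather than a single pixel, one option (an assumption, not something from the question) is `scipy.ndimage.distance_transform_edt` with `return_indices=True`: given a boolean mask of NaN pixels in a 2D slice, it returns for each NaN pixel the Euclidean distance to, and the index of, the nearest non-NaN pixel. The sketch below, with the hypothetical name `fill_nearest_spatial`, processes one x-y slice per time step; ties between equally distant pixels are resolved arbitrarily.

```python
import numpy as np
from scipy import ndimage


def fill_nearest_spatial(cube, max_dist=10.0):
    """Replace each NaN in cube[x, y, t] by the nearest non-NaN value of the
    same time slice, if that value lies within max_dist grid cells."""
    filled = cube.copy()
    for t in range(cube.shape[2]):
        nan_mask = np.isnan(cube[:, :, t])
        if not nan_mask.any() or nan_mask.all():
            continue  # nothing to fill, or no valid pixel in this slice

        # For every True (NaN) pixel: distance to, and index of, the nearest
        # False (valid) pixel; valid pixels point to themselves
        dist, (ix, iy) = ndimage.distance_transform_edt(nan_mask, return_indices=True)
        nearest = cube[ix, iy, t]

        use = nan_mask & (dist <= max_dist)
        filled[:, :, t][use] = nearest[use]  # view of filled, so this writes into it
    return filled
```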

Here is what I came up with:

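import numpy as np
from scipy import interpolate

# Test cube with two gaps in time at the same pixels
Vel = np.random.rand(10,10,100) 
Vel[4:7,4:7,0:10] = np.nan
Vel[4:7,4:7,20:30] = np.nan

def gap_filling(vect, interpolation):
    # Fill the NaNs of one time series by interpolating over time
    time = np.arange(0, np.shape(vect)[0])
    mask = np.isfinite(vect)
    # bounds_error=False: instead of raising, time steps before the first or
    # after the last valid sample get fill_value (NaN by default), so a gap
    # at the very start of the series (0:10 here) stays NaN
    f = interpolate.interp1d(time[mask], vect[mask], 
                             kind=interpolation, bounds_error=False)

    vect_filled = np.copy(vect)
    vect_filled[np.isnan(vect)] = f(time[np.isnan(vect)])

    return vect_filled

# Apply gap_filling to every (x, y) time series (last axis = time);
# the extra argument is passed on as `interpolation`
Vel_filled_nn = np.apply_along_axis(gap_filling, -1, Vel, 'nearest')

Vel_filled_li = np.apply_along_axis(gap_filling, -1, Vel, 'linear')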

For each time series of the data set, I build an interpolation function from the available (non-NaN) samples over time, then evaluate it at the time steps of the missing values. Note that 'nearest' here means the nearest sample in time at the same pixel, not the nearest pixel in space.

But since I know which application you are developing this code for (data analysis in Earth sciences), I would advise you to use linear interpolation instead of nearest neighbour (here Vel_filled_li). The results for one of the time series:

import matplotlib.pyplot as plt

# Compare both fillings with the raw data at pixel (6, 6), which lies inside the gaps
plt.figure()
plt.plot(Vel_filled_nn[6, 6, :], 'o-', label='nearest neighbour')
plt.plot(Vel_filled_li[6, 6, :], 'o-', label='linear interpolation')
plt.plot(Vel[6, 6, :], 'o-', label='raw')
plt.legend(loc='upper right')
plt.xlabel('Time', fontsize=15)
plt.ylabel('Variable', fontsize=15)
plt.show()

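[Figure: nearest neighbour, linear interpolation and raw values of Vel[6, 6, :]; x-axis: Time, y-axis: Variable]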

This is only a starting point; it can (and should) be vectorised using the axis parameter of interpolate.interp1d.
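`interp1d` takes a single x array shared by all series along `axis`, while here the valid samples differ from pixel to pixel. The sketch below therefore swaps in a different technique, shown as one possible way to vectorise the linear case: for every time step it finds the previous and next valid sample with a running max/min of their indices, then interpolates between them for all pixels at once. `fill_gaps_linear` is a hypothetical name. Like `bounds_error=False`, it leaves gaps before the first or after the last valid sample as NaN.

```python
import numpy as np


def fill_gaps_linear(cube):
    """Linearly interpolate NaN gaps along the last (time) axis of cube.
    Gaps before the first or after the last valid sample stay NaN."""
    n = cube.shape[-1]
    t = np.arange(n)
    valid = np.isfinite(cube)

    # Index of the last valid sample at or before each time step (-1 if none)
    prev_idx = np.maximum.accumulate(np.where(valid, t, -1), axis=-1)
    # Index of the next valid sample at or after each time step (n if none)
    next_idx = np.minimum.accumulate(np.where(valid, t, n)[..., ::-1], axis=-1)[..., ::-1]
    interior = (prev_idx >= 0) & (next_idx < n)

    # Clip so the lookups stay in bounds; the out-of-range cases are masked below
    p = np.clip(prev_idx, 0, n - 1)
    q = np.clip(next_idx, 0, n - 1)
    v_prev = np.take_along_axis(cube, p, axis=-1)
    v_next = np.take_along_axis(cube, q, axis=-1)

    # Weight of the next sample; 0 where p == q (the sample itself is valid)
    span = q - p
    w = np.where(span > 0, (t - p) / np.where(span > 0, span, 1), 0.0)
    filled = v_prev + w * (v_next - v_prev)

    return np.where(valid, cube, np.where(interior, filled, np.nan))
```

Usage: `Vel_filled_li = fill_gaps_linear(Vel)`. It creates several temporaries the size of the input, so on a 330x300x38000 cube you would apply it to blocks of pixels, e.g. `fill_gaps_linear(Vel[i:i + 10])`.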
