How to search for the nearest-neighbor point with a value other than NaN in a datacube?
I am working with a datacube such as data[x,y,z]. Each point is a velocity through time, and the [x,y] grid corresponds to coordinates. If I pick a point with coordinates x and y, its time series is likely to be incomplete (it contains some NaNs). I wrote a function that searches for the closest neighbor with a valid value and replaces the NaN at my xy point with it. Is there a more efficient way to code the same thing?
Attached to this message is a picture of how the function evaluates the neighbors. The number on each point is its rank (5 is the 5th neighbor evaluated).
I tried something like this. Let's say that I have a datacube of 10x10x100 (100 is the length of the time series):
import math
import numpy as np
# example cube: 10 x 10 grid, 100 time steps
Vel = np.random.rand(10,10,100)
Vel[4:7,4:7,0:10] = np.nan
x = 5
y = 5
# time series at the picked point (a view into Vel)
Vpoint = Vel[5,5,:]
for i in range(0,len(Vpoint)):
    xx = x
    yy = y
    if math.isnan(Vel[xx,yy,i]) == True:
        # widen the search around (x, y) one step at a time
        for n in range(0,50):
            n = n + 1
            if n > 10:
                raise Exception("The interpolation is way too far")
            xx = x + n
            yy = y
            if math.isnan(Vel[xx,yy,i]) == False:
                Vpoint[i] = Vel[xx,yy,i]
                break
            xx = x-n
            if math.isnan(Vel[xx,yy,i]) == False:
                Vpoint[i] = Vel[xx,yy,i]
                break
            xx = x
            yy = y + n
            if math.isnan(Vel[xx,yy,i]) == False:
                Vpoint[i] = Vel[xx,yy,i]
                break
            yy = y-n
            if math.isnan(Vel[xx,yy,i]) == False:
                Vpoint[i] = Vel[xx,yy,i]
                break
            for p in range(1,n):
                xx = x+p
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
                xx = x-p
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
            for p in range(1,n):
                yy = y+n
                xx = x+p
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
                xx = x-p
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
                yy = y-n
                xx = x+p
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
                xx = x-p
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
                xx = x+n
                yy = y+p
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
                yy = y-p
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
                xx = x-n
                yy = y-p
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
                yy = y-p
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
        print(n,xx,yy)
P.S. In reality my datacube is close to 330x300x38000, and the closest non-NaN neighbor should be allowed to change at every time step.
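As a point of comparison for the loop above, here is a minimal vectorised sketch for a single point. It ranks every cell of a square window by Euclidean distance to (x, y) and, for each time step, takes the first non-NaN value in that ranking. The function name fill_point_from_neighbours, the max_radius parameter and the choice of Euclidean distance are assumptions made for illustration; the ranking in the picture may order equidistant neighbors differently.

```python
import numpy as np

def fill_point_from_neighbours(cube, x, y, max_radius=10):
    """Return cube[x, y, :] with NaNs replaced by the closest non-NaN
    neighbour in space, chosen independently at every time step.
    Hypothetical helper, not part of the original post."""
    nx, ny, nt = cube.shape
    # Square search window around (x, y), clipped to the grid
    x0, x1 = max(x - max_radius, 0), min(x + max_radius + 1, nx)
    y0, y1 = max(y - max_radius, 0), min(y + max_radius + 1, ny)
    window = cube[x0:x1, y0:y1, :]
    # Squared distance of every window cell to the picked point
    gx, gy = np.meshgrid(np.arange(x0, x1), np.arange(y0, y1), indexing="ij")
    dist2 = (gx - x) ** 2 + (gy - y) ** 2
    # Rank the cells, nearest first (rank 0 is the point itself)
    order = np.argsort(dist2, axis=None, kind="stable")
    candidates = window.reshape(-1, nt)[order]      # shape (n_cells, nt)
    valid = ~np.isnan(candidates)
    # Index of the first valid candidate at each time step;
    # if none is valid, argmax gives 0 and the value stays NaN
    first = valid.argmax(axis=0)
    return candidates[first, np.arange(nt)]

Vel = np.random.rand(10, 10, 100)
Vel[4:7, 4:7, 0:10] = np.nan
Vpoint = fill_point_from_neighbours(Vel, 5, 5)
print(np.isnan(Vpoint).any())
```

The window is square, so its corner cells lie farther away than max_radius; a time step with no valid value anywhere in the window is left as NaN instead of raising an exception.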
Here is what I came up with:
import numpy as np
from scipy import interpolate
Vel = np.random.rand(10,10,100)
Vel[4:7,4:7,0:10] = np.nan
Vel[4:7,4:7,20:30] = np.nan
def gap_filling(vect, interpolation):
    # Fill the NaNs of one 1-D time series by interpolating along time
    time = np.arange(0, np.shape(vect)[0])
    mask = np.isfinite(vect)
    # bounds_error=False: times outside the valid range are returned as NaN
    f = interpolate.interp1d(time[mask], vect[mask],
                             kind=interpolation, bounds_error=False)
    vect_filled = np.copy(vect)
    vect_filled[np.isnan(vect)] = f(time[np.isnan(vect)])
    return vect_filled
# Apply the gap filling to every time series (last axis) of the cube
Vel_filled_nn = np.apply_along_axis(gap_filling, -1, Vel, 'nearest')
Vel_filled_li = np.apply_along_axis(gap_filling, -1, Vel, 'linear')
I create an interpolation function from the data available through time, then evaluate it at the missing values, and do this for each time series of the data set.
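To make the masking step concrete, here is a small sketch on a single hand-made series. The array values and variable names are made up for illustration; only interpolate.interp1d and its kind and bounds_error arguments come from the code above.

```python
import numpy as np
from scipy import interpolate

# Hypothetical 1-D series with two interior gaps
series = np.array([1.0, np.nan, np.nan, 4.0, 5.0, np.nan, np.nan, 8.0])
time = np.arange(series.size)
mask = np.isfinite(series)          # True where a value is available

filled = {}
for kind in ("nearest", "linear"):
    # Build the interpolant from the valid samples only
    f = interpolate.interp1d(time[mask], series[mask],
                             kind=kind, bounds_error=False)
    out = series.copy()
    out[~mask] = f(time[~mask])     # evaluate only at the missing times
    filled[kind] = out

print(filled["nearest"])  # [1. 1. 4. 4. 5. 5. 8. 8.]
print(filled["linear"])   # [1. 2. 3. 4. 5. 6. 7. 8.]
```

Note that with bounds_error=False and the default fill value, a gap at the very start or end of a series lies outside the valid range and stays NaN.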
However, since I know which application you are developing this code for (data analysis in Earth sciences), I'd advise you to use linear interpolation instead of nearest neighbour (here Vel_filled_li). The results on one of the time series:
import matplotlib.pyplot as plt
plt.figure()
plt.plot(Vel_filled_nn[6, 6, :], 'o-', label='nearest neighbour')
plt.plot(Vel_filled_li[6, 6, :], 'o-', label='linear interpolation')
plt.plot(Vel[6, 6, :], 'o-', label='raw')
plt.legend(loc='upper right')
plt.xlabel('Time', fontsize=15)
plt.ylabel('Variable', fontsize=15)
This is only a starting point and can/should be vectorised, using the axis parameter of interpolate.interp1d.
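One way the axis parameter could be used is sketched below. It relies on a strong assumption: every series passed in one call has its NaNs at exactly the same time steps, which holds for the 3x3 block of the example but not for a real cube in general. The names block, mask and block_filled are introduced here for illustration only.

```python
import numpy as np
from scipy import interpolate

Vel = np.random.rand(10, 10, 100)
Vel[4:7, 4:7, 0:10] = np.nan
Vel[4:7, 4:7, 20:30] = np.nan

# All nine series of this block share the same gaps (assumption of the sketch)
block = Vel[4:7, 4:7, :]
time = np.arange(block.shape[-1])
mask = np.isfinite(block[0, 0, :])
if not np.array_equal(np.isfinite(block), np.broadcast_to(mask, block.shape)):
    raise ValueError("series in the block do not share the same gaps")

# One interpolant for the whole block, interpolating along the time axis
f = interpolate.interp1d(time[mask], block[..., mask],
                         kind="linear", axis=-1, bounds_error=False)
block_filled = f(time)              # shape (3, 3, 100)
```

Series with different gap patterns would have to be grouped by mask first, or handled one by one as with np.apply_along_axis above.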