Finding the nearest points in scattered data

Question

I am struggling to speed up the interpolation of a large dataset, which I am currently interpolating using gridfit. I have already posted a question on Stack Overflow but haven't got a response.

So I am thinking of trying an alternative. My idea starts from a huge dataset, as shown by the Python code snippet below:

import numpy as np

arr_len = 932826
xi = np.random.uniform(low=0, high=4496, size=arr_len)
yi = np.random.uniform(low=-74, high=492, size=arr_len)
zi = np.random.uniform(low=-30, high=97, size=arr_len)

I have to interpolate and get the values at defined points, say (x, y). What would be the quickest way to find the 4 neighbouring points in the scattered data xi, yi and zi, so that a bilinear interpolation could be performed using interp2d (see image below)? I don't know if this would give me faster results than using griddata, but it would be nice to try it out.

Answer 1

I think what you have in mind is essentially nearest neighbors regression. Here's how you could do this with scikit-learn. Note that the number of neighbors considered (4) is an arbitrary choice, so you could also try other values.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

arr_len = 932826
np.random.seed(42)
xi = np.random.uniform(low=0, high=4496, size=arr_len)
yi = np.random.uniform(low=-74, high=492, size=arr_len)
zi = np.random.uniform(low=-30, high=97, size=arr_len)

# points to get z-values for (e.g.):
x_new = [100, 500, 2000]
y_new = [400, 300, 100]

# in machine learning notation:
X_train = np.vstack([xi, yi]).T
y_train = zi
X_predict = np.vstack([x_new, y_new]).T

# fit 4-nearest neighbors regressor to the training data
neigh = KNeighborsRegressor(n_neighbors=4)
neigh.fit(X_train, y_train)

# get "interpolated" z-values
print(neigh.predict(X_predict))

[39.37712018  4.36600728 47.00192216]
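
If you want the 4 nearest points themselves rather than only the averaged prediction, a k-d tree query returns their indices and distances directly. The following is a minimal sketch using scipy.spatial.cKDTree, not a tuned implementation. The reduced sample size, the inverse-distance weighting and the small epsilon are assumptions made for illustration. Note also that the 4 nearest scattered points generally do not form the corners of a rectangle around the query point, so they cannot be passed to a bilinear scheme as they are; the weighting here is a stand-in for that step.

```python
import numpy as np
from scipy.spatial import cKDTree

# Smaller sample than the 932826 points in the question, to keep the sketch quick (assumption).
n_points = 100_000
rng = np.random.default_rng(42)
xi = rng.uniform(low=0, high=4496, size=n_points)
yi = rng.uniform(low=-74, high=492, size=n_points)
zi = rng.uniform(low=-30, high=97, size=n_points)

# Build the tree once; it can be reused for any number of query points.
tree = cKDTree(np.column_stack([xi, yi]))

# Points to get z-values for (same example points as above).
query = np.array([[100, 400], [500, 300], [2000, 100]], dtype=float)

# dist and idx both have shape (n_queries, 4), sorted from nearest to farthest.
dist, idx = tree.query(query, k=4)

# Plain mean of the 4 neighbours, which is what KNeighborsRegressor
# computes with its default uniform weights.
z_mean = zi[idx].mean(axis=1)

# Inverse-distance weighting (illustrative choice); the epsilon avoids a
# division by zero when a query point coincides with a data point.
weights = 1.0 / (dist + 1e-12)
z_idw = (weights * zi[idx]).sum(axis=1) / weights.sum(axis=1)

print(idx)
print(z_mean)
print(z_idw)
```

The tree is built once and each query is then cheap, so the same `tree` object can serve all the points you need to evaluate.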

Solution 1
1 accepted 2021-05-25 12:59:55
