
Search numpy array ((x, y, z)…) for z matching nearest x, y

I have a very large array similar to elevation data of the format:

triplets = ((x0, y0, z0), 
            (x1, y1, z1), 
            ... ,
            (xn, yn, zn))

where x, y, z are all floats in metres. You can create suitable test data matching this format with:

x = np.arange(20, 40, dtype=np.float64)
y = np.arange(30, 50, dtype=np.float64)
z = np.random.random(20) * 25.0
triplets = np.column_stack((x, y, z))  # shape (len(x), 3): one (x, y, z) row per point

I want to be able to efficiently find the corresponding z-value for a given (x, y) pair. My research so far leads to more questions. Here's what I've got:

  1. Iterate through all of the triplets:

     query = (a, b)  # where a, b are the x and y coordinates we're looking for

     for i in triplets:
         if i[0] == query[0] and i[1] == query[1]:
             result = i[2]

    Drawbacks: slow; a, b must exist exactly, which is a problem when comparing floats. (A vectorised, tolerance-based variant is sketched after this list.)

  2. Use scipy.spatial.cKDTree to find nearest:

     points = triplets[:, 0:2]  # drops the z column
     tree = cKDTree(points)
     idx = tree.query((a, b))[1]  # query returns a tuple; we want the index
     query = tree.data[idx]
     result = triplets[idx, 2]

    Drawbacks: returns nearest point rather than interpolating.

  3. Use interp2d, as suggested in a comment:

     from scipy.interpolate import interp2d

     f = interp2d(x, y, z)
     result = f(a, b)

    Drawbacks: doesn't work on a large dataset. I get OverflowError: Too many data points to interpolate when run on real data. (My real data is some 11 million points.)
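
As an aside on option 1: the exact-equality loop can be replaced with a vectorised lookup using np.isclose, which tolerates float rounding (a sketch, assuming triplets, a and b as defined above):

import numpy as np

# boolean mask of rows whose (x, y) match (a, b) within np.isclose's default tolerance
mask = np.isclose(triplets[:, 0], a) & np.isclose(triplets[:, 1], b)
matches = triplets[mask, 2]  # z values of every matching row
result = matches[0] if matches.size else None  # None if nothing matched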

So the question is: is there any straightforward way of doing this that I'm overlooking? Are there ways to reduce the drawbacks of the above?

If you want to interpolate the result, rather than just find the z value for the nearest neighbour, I would consider doing something like the following:

  1. Use a kd tree to partition your data points according to their (x, y) coordinates
  2. For a given (xi, yi) point to interpolate, find its k nearest neighbours
  3. Take the average of their z values, weighted according to their distance from (xi, yi)

The code might look something like this:

import numpy as np
from scipy.spatial import cKDTree

# some fake (x, y, z) data
XY = np.random.rand(10000, 2) - 0.5
Z = np.exp(-((XY ** 2).sum(1) / 0.1) ** 2)

# construct a k-d tree from the (x, y) coordinates
tree = cKDTree(XY)

# a random point to query
xy = np.random.rand(2) - 0.5

# find the k nearest neighbours (say, k=3)
distances, indices = tree.query(xy, k=3)

# the z-values for the k nearest neighbours of xy
z_vals = Z[indices]

# take the average of these z-values, weighted by 1 / distance from xy
dw_avg = np.average(z_vals, weights=(1. / distances))
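
One caveat with these weights: if the query point coincides exactly with a data point, one of the distances is zero and 1 / distances divides by zero. A simple guard (a sketch; the floor of 1e-12 is an arbitrary choice):

# clamp distances away from zero so an exact hit doesn't divide by zero
safe_distances = np.maximum(distances, 1e-12)
dw_avg = np.average(z_vals, weights=(1. / safe_distances))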

It's worth playing around a bit with the value of k, the number of nearest neighbours to take the average of. This is essentially a crude form of kernel density estimation, where the value of k controls the degree of 'smoothness' you're imposing on the underlying distribution of z-values. A larger k results in more smoothness.

Similarly, you might want to play around with how you weight the contributions of points according to their distance from (xi, yi), depending on how you think similarity in z decreases with increasing x, y distance. For example, you might want to weight by (1 / distances ** 2) rather than (1 / distances).
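
For instance, reusing z_vals and distances from the snippet above:

# inverse-square weighting: closer neighbours dominate the average more strongly
dw_avg_sq = np.average(z_vals, weights=(1. / distances ** 2))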

In terms of performance, constructing and searching k-d trees are both very efficient. Bear in mind that you only need to construct the tree once for your dataset, and if necessary you can query multiple points at a time by passing (N, 2) arrays to tree.query().
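
For example, a sketch of a batched query (reusing tree and Z from the snippet above; the 1000 query points here are made up for illustration):

queries = np.random.rand(1000, 2) - 0.5        # 1000 (x, y) query points
distances, indices = tree.query(queries, k=3)  # each result has shape (1000, 3)

# vectorised inverse-distance-weighted average: one z estimate per query point
# (the zero-distance caveat above applies here too)
weights = 1.0 / distances
z_estimates = (Z[indices] * weights).sum(axis=1) / weights.sum(axis=1)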

Tools for approximate nearest neighbour searches, such as FLANN, might potentially be quicker, but these are usually more helpful in situations when the dimensionality of your data is very high.

I don't understand your cKDTree code: you already have idx, so why do the for loop again? You can get the result just with result = triplets[idx, 2].

import numpy as np
from scipy.spatial import cKDTree

x = np.arange(20, 40, dtype=np.float64)
y = np.arange(30, 50, dtype=np.float64)
z = np.random.random(20) * 25.0
triplets = np.column_stack((x, y, z))  # shape (len(x), 3): one (x, y, z) row per point

a = 30.1
b = 40.5

points = triplets[:,0:2] # drops the z column
tree = cKDTree(points)
idx = tree.query((a, b))[1] # this returns a tuple, we want the index
result = triplets[idx, 2]

You can create a sparse matrix and use simple indexing.

In [1]: import numpy as np
In [2]: x = np.arange(20, 40, dtype=np.float64)
In [3]: y = np.arange(30, 50, dtype=np.float64)
In [4]: z = np.random.random(20) * 25.0
In [9]: from scipy.sparse import coo_matrix
In [12]: m = coo_matrix((z, (x, y))).tolil()
In [17]: m[25,35]
Out[17]: 17.410532044604292
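
Note that coo_matrix expects integer row and column indices, so the session above relies on the whole-number floats being cast for you. A more explicit sketch (reusing x, y, z from above; the trick only works when the coordinates are non-negative whole numbers):

from scipy.sparse import coo_matrix

# cast the coordinates to integer indices explicitly; only valid when
# x and y are non-negative whole numbers
m = coo_matrix((z, (x.astype(np.int64), y.astype(np.int64)))).tolil()
result = m[25, 35]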
