Fast performance array processing in Numpy / Python

Question

I am trying to find the optimal (fastest) way to process coordinate and measurement data stored in several numpy arrays.

I need to calculate the distance from each grid point (lat, lon, alt value in green in the attached image) to each measurement location (lat, lon, alt, range from target in gray in the attached image). Since there are hundreds of grid points, and thousands of measurement ranges to calculate for each grid point, I would like to iterate through the arrays in the most efficient way possible.

[Image: grid points (green) and measurement locations (gray)]

I am trying to decide how to store the LLA values for the grid and the measurements, and then what the ideal way is to calculate the Mean Squared Error for each point on the grid, based on the delta between the measured range value and the actual range.

Any ideas on how best to store these values, and then iterate across the grid to determine the range from each measurement, would be very much appreciated. Thanks!!!

Currently, I am using a 2D meshgrid to store the LLA values for the grid:

# Create a 2D Grid that will be used to store the MSE estimations
# First, create two 1-D arrays representing the X and Y coordinates of our grid
x_delta = abs(xmax-xmin)/gridsize_x
y_delta = abs(ymax-ymin)/gridsize_y
X = np.arange(xmin,xmax+x_delta,x_delta)
Y = np.arange(ymin,ymax+y_delta,y_delta)

# Next, pass arrays to meshgrid to return 2-D coordinate matrices from the 1-D coordinate arrays
grid_lon, grid_lat = np.meshgrid(X, Y)
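One thing to watch in the snippet above: np.arange with a floating-point step can return one element more or fewer than expected because of rounding, and with the xmax+x_delta stop value the grid normally has gridsize_x + 1 columns rather than gridsize_x. A minimal sketch, assuming the intent is gridsize cells (hence gridsize + 1 points) per axis, that uses np.linspace to fix the point count explicitly. The bounds and sizes below are made up for illustration.

```python
import numpy as np

# Hypothetical bounds and cell counts, chosen only for illustration.
xmin, xmax, gridsize_x = -105.0, -104.0, 10
ymin, ymax, gridsize_y = 39.0, 40.0, 5

# linspace takes the number of points directly, so the grid shape is
# known up front: gridsize + 1 points bound gridsize cells on each axis.
X = np.linspace(xmin, xmax, gridsize_x + 1)
Y = np.linspace(ymin, ymax, gridsize_y + 1)

grid_lon, grid_lat = np.meshgrid(X, Y)
print(grid_lon.shape)  # (rows, columns) == (len(Y), len(X))
```

Knowing the exact shape matters later, when a flattened index has to be converted back to row and column positions.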

I have the LLA points and range values from the measurements stored in a measurement class:

measurement_lon = [measurement.gps.getlon() for measurement in target_measurements]
measurement_lat = [measurement.gps.getlat() for measurement in target_measurements]
measurement_range = [measurement.getrange() for measurement in target_measurements]

Measurement class:

class RangeMeasurement:

    def __init__(self, lat, lon, alt, range):
        self.gps = GpsLocation(lat, lon, alt)
        self.range = range

Really bad pseudocode for range calculation (iterative and very slow):

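range_error = 0.0
for i in range(grid_lon.size):            # every grid point (grid arrays are 2-D, so index them flat)
  for j in range(len(measurement_lat)):   # every measurement
    range_error += distance(grid_lon.flat[i],grid_lat.flat[i],measurement_lon[j],measurement_lat[j])-measurement_range[j]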

Answer 1

I think the scipy.spatial.distance module will help you out with this problem: http://docs.scipy.org/doc/scipy/reference/spatial.distance.html

You should store your points as 2-d numpy arrays with 2 columns and N rows, where N is the number of points in the array. To convert your grid_lon and grid_lat to this format, use

N1 = grid_lon.size
grid_point_array = np.hstack([grid_lon.reshape((N1,1)), grid_lat.reshape((N1,1))])

This takes all of the values in grid_lon, which are arranged in a rectangular array that is the same shape as the grid, and puts them in an array with one column and N rows. It does the same for grid_lat. The two one-column wide arrays are then combined to create a two-column array.
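As a hedged alternative sketch of the same conversion, np.column_stack combined with ravel() should produce an identical array with a little less reshaping. The tiny 2 x 3 grid here is made up for illustration.

```python
import numpy as np

# Small stand-in grid (2 rows x 3 columns) for illustration.
grid_lon, grid_lat = np.meshgrid(np.array([0.0, 1.0, 2.0]), np.array([10.0, 20.0]))

# The hstack/reshape approach described above.
N1 = grid_lon.size
via_hstack = np.hstack([grid_lon.reshape((N1, 1)), grid_lat.reshape((N1, 1))])

# ravel() flattens in the same row-major order as reshape, and
# column_stack places the two 1-D arrays side by side as columns.
via_column_stack = np.column_stack([grid_lon.ravel(), grid_lat.ravel()])

print(via_column_stack.shape)  # (N1, 2)
print(np.array_equal(via_hstack, via_column_stack))
```

Either way, row k of the result holds the (lon, lat) pair of the k-th grid point in row-major order.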

A similar method can be used to convert your measurement data:

N2 = len(measurement_lon)
measurement_data_array = np.hstack([np.array(measurement_lon).reshape((N2,1)),
    np.array(measurement_lat).reshape((N2,1))])

Once your data is in this format, you can easily find the distances between each pair of points with scipy.spatial.distance:

import scipy.spatial.distance
d = scipy.spatial.distance.cdist(grid_point_array, measurement_data_array, 'euclidean')

d will be an array with N1 rows and N2 columns, and d[i,j] will be the distance between grid point i and measurement point j.

EDIT: Thanks for clarifying range error. Sounds like an interesting project. This should give you the grid point with the smallest accumulated squared error:
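A caveat on units: 'euclidean' applied to longitude/latitude columns gives distances in degrees, which are only comparable to a measured range expressed in the same units. A hedged sketch, assuming the ranges are in metres, a spherical Earth of radius 6371 km, and that altitude can be ignored: the haversine great-circle distance computed for all pairs with broadcasting. The function name haversine_matrix and the sample coordinates are hypothetical, not part of the original post.

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation (assumption)

def haversine_matrix(points_a, points_b):
    """Great-circle distances in metres between every pair of points.

    points_a: (N1, 2) array of (lon, lat) in degrees
    points_b: (N2, 2) array of (lon, lat) in degrees
    Returns an (N1, N2) array, laid out like the cdist result.
    """
    lon_a = np.radians(points_a[:, 0])[:, None]  # column vector, shape (N1, 1)
    lat_a = np.radians(points_a[:, 1])[:, None]
    lon_b = np.radians(points_b[:, 0])[None, :]  # row vector, shape (1, N2)
    lat_b = np.radians(points_b[:, 1])[None, :]
    sin_dlat = np.sin((lat_b - lat_a) / 2.0)
    sin_dlon = np.sin((lon_b - lon_a) / 2.0)
    h = sin_dlat**2 + np.cos(lat_a) * np.cos(lat_b) * sin_dlon**2
    h = np.clip(h, 0.0, 1.0)  # guard against rounding slightly outside [0, 1]
    return 2.0 * EARTH_RADIUS_M * np.arcsin(np.sqrt(h))

# Made-up sample points, (lon, lat) in degrees.
grid_points = np.array([[0.0, 0.0], [0.0, 1.0]])
measurement_points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
d_m = haversine_matrix(grid_points, measurement_points)
print(d_m.shape)
```

The result has the same layout as d, so it can be dropped into the error calculation in place of the Euclidean distances if metre ranges are what the measurements report.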

measurement_range_array = np.array(measurement_range)
flat_grid_idx = pow(measurement_range_array-d,2).sum(1).argmin()

This takes advantage of broadcasting to get the difference between a point's measured range and its distance from every grid point. All of the squared errors for a given grid point are then summed, and the resulting 1-D array should be the accumulated error you're looking for. argmin() is called to find the position of the smallest value. To get the x and y grid coordinates from the flattened index, use

n_cols = grid_lon.shape[1]  # number of grid columns, len(X); not necessarily gridsize_x
grid_x = flat_grid_idx % n_cols
grid_y = flat_grid_idx // n_cols

(The // is integer division.)
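Putting the steps together, here is a minimal end-to-end sketch on synthetic data. It assumes plain Euclidean distance, uses NumPy broadcasting as a stand-in for cdist so the example needs no SciPy, and takes the mean rather than the sum of the squared errors (dividing by the number of measurements does not change which grid point wins). np.unravel_index is used as an alternative to the modulo and integer-division step. All values are made up.

```python
import numpy as np

# Tiny synthetic problem: a 3 x 4 grid and measurements whose ranges are
# the exact distances to one known grid point, so that point should win.
X = np.linspace(0.0, 3.0, 4)
Y = np.linspace(0.0, 2.0, 3)
grid_lon, grid_lat = np.meshgrid(X, Y)
grid_points = np.column_stack([grid_lon.ravel(), grid_lat.ravel()])

measurement_points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 2.0], [3.0, 2.0]])
target = np.array([2.0, 1.0])
measurement_range = np.linalg.norm(measurement_points - target, axis=1)

# Pairwise distances via broadcasting: (N1, 1, 2) - (1, N2, 2) -> (N1, N2).
diff = grid_points[:, None, :] - measurement_points[None, :, :]
d = np.sqrt((diff**2).sum(axis=2))

# Mean squared range error per grid point, then the best grid point.
mse = ((measurement_range - d) ** 2).mean(axis=1)
flat_grid_idx = mse.argmin()

# unravel_index converts the flat index back to (row, column) = (y, x).
grid_y, grid_x = np.unravel_index(flat_grid_idx, grid_lon.shape)
print(grid_lon[grid_y, grid_x], grid_lat[grid_y, grid_x])
```

Because unravel_index takes the grid shape directly, it avoids having to track how many columns the grid actually has.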

Answer 1
3 · accepted · 2011-12-06 22:50:53