Assign numpy array of points to a 2D square grid
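Because of speed problems, I am going beyond my previous question. I have an array of Lat/Lon point coordinates, and I would like to assign each point an index code derived from a 2D square grid of equally sized cells. Here is an example. Let's call points my first array, containing the coordinates (call them [x y] pairs) of six points: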
points = [[ 1.5 1.5]
[ 1.1 1.1]
[ 2.2 2.2]
[ 1.3 1.3]
[ 3.4 1.4]
[ 2. 1.5]]
Then I have another array containing the vertex coordinates of two cells in the form [minx,miny,maxx,maxy]; let's call it bounds:
bounds = [[ 0. 0. 2. 2.]
[ 2. 2. 3. 3.]]
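I would like to find which points fall inside which boundary, and then assign a code derived from the bounds array index (in this case the first cell has code 0, the second has code 1, and so on). Since the cells are squares, the easiest way to check whether a point is in a cell is to evaluate: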
x > minx & x < maxx & y > miny & y < maxy
The resulting array would then look like this:
results = [0 0 1 0 NaN NaN]
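where NaN means that the point is outside every cell. In my real case the sizes are on the order of 10^6 points to be located in 10^4 cells. Is there a way to do this kind of thing quickly using numpy arrays?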
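EDIT: to clarify, the expected results array means that the first point is inside the first cell (index 0 of the bounds array), so is the second, the third point is inside the second cell of the bounds array, and so on.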
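For reference, the example above can be written as runnable NumPy. This is only a minimal sketch of the test described in the question, looping over the cells one at a time, and it is not meant as the fast solution being asked for. The function name assign_cells is hypothetical. Note that in Python each comparison needs its own parentheses, because & binds more tightly than > and <.

```python
import numpy as np

# Example data from the question, written as valid NumPy arrays.
points = np.array([[1.5, 1.5], [1.1, 1.1], [2.2, 2.2],
                   [1.3, 1.3], [3.4, 1.4], [2.0, 1.5]])
bounds = np.array([[0.0, 0.0, 2.0, 2.0],
                   [2.0, 2.0, 3.0, 3.0]])

def assign_cells(points, bounds):
    """Return, for each point, the index of the cell containing it, or NaN."""
    x, y = points[:, 0], points[:, 1]
    results = np.full(len(points), np.nan)
    for code, (minx, miny, maxx, maxy) in enumerate(bounds):
        # Strict inequalities, as in the question: edge points match no cell.
        inside = (x > minx) & (x < maxx) & (y > miny) & (y < maxy)
        results[inside] = code
    return results

results = assign_cells(points, bounds)
print(results)
```

The first four entries come out as 0, 0, 1, 0 and the last two as NaN, matching the expected results array.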
Here is a vectorized approach to your problem. It should speed things up significantly.
import numpy as np

def findCells(points, bounds):
    # make sure points is n by 2 (pool.map might send us 1D arrays)
    points = points.reshape((-1,2))
    # check for each point if all coordinates are in bounds
    # dimension 0 is bound
    # dimension 1 is point
    allInBounds = (points[:,0] > bounds[:,None,0])
    allInBounds &= (points[:,1] > bounds[:,None,1])
    allInBounds &= (points[:,0] < bounds[:,None,2])
    allInBounds &= (points[:,1] < bounds[:,None,3])
    # now find out the positions of all nonzero (i.e. true) values
    # nz[0] contains the indices along dim 0 (bound)
    # nz[1] contains the indices along dim 1 (point)
    nz = np.nonzero(allInBounds)
    # initialize the result with all nan
    r = np.full(points.shape[0], np.nan)
    # now use nz[1] to index point position and nz[0] to tell which cell the
    # point belongs to
    r[nz[1]] = nz[0]
    return r

def findCellsParallel(points, bounds, chunksize=100):
    import multiprocessing as mp
    from functools import partial
    func = partial(findCells, bounds=bounds)
    # using python3 you could also do 'with mp.Pool() as p:'
    p = mp.Pool()
    try:
        return np.hstack(p.map(func, points, chunksize))
    finally:
        p.close()

def main():
    # sizes must be integers to be usable as array shapes
    nPoints = int(1e6)
    nBounds = int(1e4)
    # points = np.array([[ 1.5, 1.5],
    #                    [ 1.1, 1.1],
    #                    [ 2.2, 2.2],
    #                    [ 1.3, 1.3],
    #                    [ 3.4, 1.4],
    #                    [ 2. , 1.5]])
    points = np.random.random([nPoints, 2])
    # bounds = np.array([[0,0,2,2],
    #                    [2,2,3,3]])
    # bounds = np.array([[0,0,1.4,1.4],
    #                    [1.4,1.4,2,2],
    #                    [2,2,3,3]])
    # sorting along axis 1 puts the smaller corner first: [minx, miny, maxx, maxy]
    bounds = np.sort(np.random.random([nBounds, 2, 2]), 1).reshape(nBounds, 4)
    r = findCellsParallel(points, bounds)
    print(points[:10])
    for bIdx in np.unique(r[:10]):
        if np.isnan(bIdx):
            continue
        # r holds floats (because of nan), so cast before indexing
        print("{}: {}".format(bIdx, bounds[int(bIdx)]))
    print(r[:10])

if __name__ == "__main__":
    main()
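EDIT:
Trying this out with large amounts of data gave me a MemoryError. You can avoid that, and even speed things up a bit, if you use multiprocessing.Pool and its map function; see the updated code.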
Result:
>time python test.py
[[ 0.69083585 0.19840985]
[ 0.31732711 0.80462512]
[ 0.30542996 0.08569184]
[ 0.72582609 0.46687164]
[ 0.50534322 0.35530554]
[ 0.93581095 0.36375539]
[ 0.66226118 0.62573407]
[ 0.08941219 0.05944215]
[ 0.43015872 0.95306899]
[ 0.43171644 0.74393729]]
9935.0: [ 0.31584562 0.18404152 0.98215445 0.83625487]
9963.0: [ 0.00526106 0.017255 0.33177741 0.9894455 ]
9989.0: [ 0.17328876 0.08181912 0.33170444 0.23493507]
9992.0: [ 0.34548987 0.15906761 0.92277442 0.9972481 ]
9993.0: [ 0.12448765 0.5404578 0.33981119 0.906822 ]
9996.0: [ 0.41198261 0.50958195 0.62843379 0.82677092]
9999.0: [ 0.437169 0.17833114 0.91096133 0.70713434]
[ 9999. 9993. 9989. 9999. 9999. 9935. 9999. 9963. 9992. 9996.]
real 0m 24.352s
user 3m 4.919s
sys 0m 1.464s
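The MemoryError mentioned above comes from the boolean array of shape (number of cells, number of points): with 10^4 cells and 10^6 points that is 10^10 one-byte entries, roughly 10 GB. As an alternative to multiprocessing, the same broadcasting test can be run on slices of the points so that only a small mask exists at any time. The following is a minimal single-process sketch under that assumption; the name find_cells_chunked and the chunk parameter are hypothetical and not part of the answer's code.

```python
import numpy as np

def find_cells_chunked(points, bounds, chunk=1000):
    """Same test as findCells, applied to `chunk` points at a time."""
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    bounds = np.asarray(bounds, dtype=float)
    result = np.full(len(points), np.nan)
    for start in range(0, len(points), chunk):
        p = points[start:start + chunk]
        # Mask of shape (n_cells, points_in_chunk); axis 0 is the cell.
        inside = ((p[:, 0] > bounds[:, None, 0]) &
                  (p[:, 1] > bounds[:, None, 1]) &
                  (p[:, 0] < bounds[:, None, 2]) &
                  (p[:, 1] < bounds[:, None, 3]))
        cell_idx, point_idx = np.nonzero(inside)
        # If cells overlap, the highest matching cell index is kept.
        result[start + point_idx] = cell_idx
    return result
```

With chunk=1000 and 10^4 cells, each mask holds 10^7 booleans (about 10 MB), so peak memory stays small regardless of the total number of points. Whether this is faster or slower than the Pool version was not measured here.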
You can use nested loops to check the condition and yield the results from a generator:
points = [[1.5, 1.5],
          [1.1, 1.1],
          [2.2, 2.2],
          [1.3, 1.3],
          [3.4, 1.4],
          [2.0, 1.5]]
bounds = [[0., 0., 2., 2.],
          [2., 2., 3., 3.]]
import numpy as np

def pos(p, b):
    for x, y in p:
        # flag records whether the point matched at least one cell
        flag = False
        for index, dis in enumerate(b):
            minx, miny, maxx, maxy = dis
            if x > minx and x < maxx and y > miny and y < maxy:
                flag = True
                yield index
        if not flag:
            yield 'NaN'

print(list(pos(points, bounds)))
Result:
[0, 0, 1, 0, 'NaN', 'NaN']
I would do it like this:
import numpy as np

points = np.random.rand(10,2)
xmin = [0.25,0.5]
ymin = [0.25,0.5]
results = np.zeros(len(points))
for i in range(len(xmin)):
    # compare every (x, y) against the i-th lower bound (xmin, ymin)
    bool_index_array = np.greater(points, [xmin[i],ymin[i]])
    print("boolean index of (x,y) greater (xmin, ymin): ", bool_index_array)
    indicies_of_true_true = np.where(bool_index_array[:,0]*bool_index_array[:,1]==1)[0]
    print("indices of [True,True]: ", indicies_of_true_true)
    # each lower bound exceeded in both coordinates raises the group number by one
    results[indicies_of_true_true] += 1
print("results: ", results)
[out]: [ 1. 1. 1. 2. 0. 0. 1. 1. 1. 1.]
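This uses the lower boundaries to classify your points into groups: each entry of results counts how many of the (xmin, ymin) lower bounds the point exceeds in both coordinates.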
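The idea of classifying by lower boundaries can be taken one step further when the grid really is regular, as the question describes (square cells of the same size). In that case the cell index can be computed directly from the coordinates, without comparing against every cell. The sketch below rests on assumptions that are not in the original posts: the grid is described by a hypothetical origin (x0, y0), a cell size cell, and ncols by nrows cells; codes are numbered row by row; all coordinates are finite; and cells are treated as half-open, so a point exactly on a lower edge belongs to that cell, which differs from the strict inequalities in the question.

```python
import numpy as np

def grid_codes(points, x0, y0, cell, ncols, nrows):
    """Cell code per point for a regular grid, or NaN outside the grid.

    Assumed layout (hypothetical): code = row * ncols + col, with
    column 0 starting at x0 and row 0 starting at y0.
    """
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    col = np.floor((points[:, 0] - x0) / cell).astype(int)
    row = np.floor((points[:, 1] - y0) / cell).astype(int)
    inside = (col >= 0) & (col < ncols) & (row >= 0) & (row < nrows)
    codes = np.full(len(points), np.nan)
    codes[inside] = row[inside] * ncols + col[inside]
    return codes

# Usage with the six example points on a 4 x 3 grid of unit cells at (0, 0).
example = np.array([[1.5, 1.5], [1.1, 1.1], [2.2, 2.2],
                    [1.3, 1.3], [3.4, 1.4], [2.0, 1.5]])
codes = grid_codes(example, x0=0.0, y0=0.0, cell=1.0, ncols=4, nrows=3)
```

The work here grows with the number of points only, not with points times cells, which is why it may suit the 10^6 points and 10^4 cells case. It does not apply to arbitrary or overlapping rectangles like the random bounds used in the first answer.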