
Make 2D Numpy array from coordinates

I have data points that represent the coordinates of a 2D array (matrix). The points are regularly gridded, except that data points are missing from some grid positions.

For example, consider some XYZ data that fits on a regular 0.1 grid with shape (3, 4). There are gaps and missing points, so there are 5 points rather than 12:

import numpy as np
X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])
Y = np.array([1.0, 1.0, 1.1, 1.2, 1.2])
Z = np.array([3.3, 2.5, 3.6, 3.8, 1.8])
# Evaluate the regular grid dimension values
# (the number of samples passed to np.linspace must be an integer)
Xr = np.linspace(X.min(), X.max(), int(np.round((X.max() - X.min()) / np.diff(np.unique(X)).min())) + 1)
Yr = np.linspace(Y.min(), Y.max(), int(np.round((Y.max() - Y.min()) / np.diff(np.unique(Y)).min())) + 1)
print('Xr={0}; Yr={1}'.format(Xr, Yr))
# Xr=[ 0.4  0.5  0.6  0.7]; Yr=[ 1.   1.1  1.2]

What I would like to see is shown in this image (backgrounds: black = 0-based index; grey = coordinate value; colour = matrix value; white = missing).

[Image: matrix]

Here's what I have, which is intuitive with a for loop:

ar = np.ma.array(np.zeros((len(Yr), len(Xr)), dtype=Z.dtype), mask=True)
for x, y, z in zip(X, Y, Z):
    j = (np.abs(Xr -  x)).argmin()
    i = (np.abs(Yr -  y)).argmin()
    ar[i, j] = z
print(ar)
# [[3.3 2.5 -- --]
#  [3.6 -- -- --]
#  [3.8 -- -- 1.8]]    

Is there a more NumPythonic way of vectorising the approach to return a 2D array ar? Or is the for loop necessary?

You can do it on one line with np.histogram2d:

data = np.histogram2d(Y, X, bins=[len(Yr),len(Xr)], weights=Z)
print(data[0])
[[ 3.3  2.5  0.   0. ]
 [ 3.6  0.   0.   0. ]
 [ 3.8  0.   0.   1.8]]
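One caveat with the histogram approach is that empty cells come back as 0, which cannot be told apart from a genuine Z value of 0. Here is a minimal sketch of one way to recover the masked array from the question, assuming each (X, Y) pair occurs at most once: run a second, unweighted histogram to count the points per cell, then mask the cells whose count is zero. The names sums, counts and ar_masked are illustrative, and the bin counts are hard-coded to the (3, 4) example shape.

```python
import numpy as np

X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])
Y = np.array([1.0, 1.0, 1.1, 1.2, 1.2])
Z = np.array([3.3, 2.5, 3.6, 3.8, 1.8])

bins = [3, 4]  # (rows, columns) of the example grid; assumed known here

# Weighted histogram: sum of the Z values falling in each cell
sums, _, _ = np.histogram2d(Y, X, bins=bins, weights=Z)
# Unweighted histogram: number of points falling in each cell
counts, _, _ = np.histogram2d(Y, X, bins=bins)

# Mask every cell that received no data point
ar_masked = np.ma.array(sums, mask=(counts == 0))
print(ar_masked)
```

If the same coordinate pair can appear more than once, the weighted histogram sums the Z values for that cell, so this only matches the for loop when the pairs are unique.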

You can use X and Y to create integer coordinates on a 0.1-spaced grid extending from the min to max of X and from the min to max of Y, and then insert the Z values into those positions. This avoids using linspace to get Xr and Yr and should therefore be quite efficient. Here's the implementation -

def indexing_based(X,Y,Z):
    # Convert X's and Y's to indices on a 0.1 spaced grid
    X_int = np.round((X*10)).astype(int)
    Y_int = np.round((Y*10)).astype(int)
    X_idx = X_int - X_int.min()
    Y_idx = Y_int - Y_int.min()

    # Setup output array and index it with X_idx & Y_idx to set those as Z
    out = np.zeros((Y_idx.max()+1,X_idx.max()+1))
    out[Y_idx,X_idx] = Z

    return out
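The function above hard-codes the 0.1 spacing and fills missing cells with 0. As a rough sketch of how it might be adapted to return a masked array like the one in the question, the variant below takes the spacing as a parameter and builds an explicit boolean mask. The name indexing_based_masked and the step argument are hypothetical and not part of the original answer.

```python
import numpy as np

def indexing_based_masked(X, Y, Z, step=0.1):
    # Hypothetical variant of indexing_based; `step` is the assumed grid spacing
    # Convert coordinates to 0-based integer indices; round before casting
    X_idx = np.round((X - X.min()) / step).astype(int)
    Y_idx = np.round((Y - Y.min()) / step).astype(int)

    shape = (Y_idx.max() + 1, X_idx.max() + 1)
    data = np.zeros(shape, dtype=Z.dtype)
    mask = np.ones(shape, dtype=bool)   # start with every cell masked

    data[Y_idx, X_idx] = Z              # place the values
    mask[Y_idx, X_idx] = False          # unmask the cells that have data
    return np.ma.array(data, mask=mask)

X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])
Y = np.array([1.0, 1.0, 1.1, 1.2, 1.2])
Z = np.array([3.3, 2.5, 3.6, 3.8, 1.8])
out = indexing_based_masked(X, Y, Z)
print(out)
```

With duplicate coordinate pairs, fancy-index assignment keeps only one of the duplicate values per cell rather than summing them.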

Runtime tests -

This section compares the indexing-based approach against the other np.histogram2d-based solution for performance -

In [132]: # Create unique couples X-Y (as needed to work with histogram2d)
     ...: data = np.random.randint(0,1000,(5000,2))
     ...: data1 = data[np.lexsort(data.T),:]
     ...: mask = ~np.all(np.diff(data1,axis=0)==0,axis=1)
     ...: data2 = data1[np.append([True],mask)]
     ...: 
     ...: X = (data2[:,0]).astype(float)/10
     ...: Y = (data2[:,1]).astype(float)/10
     ...: Z = np.random.randint(0,1000,(X.size))
     ...: 

In [133]: def histogram_based(X,Y,Z): # From other np.histogram2d based solution
     ...:   Xr = np.linspace(X.min(), X.max(), int(np.round((X.max() - X.min()) / np.diff(np.unique(X)).min())) + 1)
     ...:   Yr = np.linspace(Y.min(), Y.max(), int(np.round((Y.max() - Y.min()) / np.diff(np.unique(Y)).min())) + 1)
     ...:   data = np.histogram2d(Y, X, bins=[len(Yr),len(Xr)], weights=Z)
     ...:   return data[0]
     ...: 

In [134]: %timeit histogram_based(X,Y,Z)
10 loops, best of 3: 22.8 ms per loop

In [135]: %timeit indexing_based(X,Y,Z)
100 loops, best of 3: 2.11 ms per loop

You could use a scipy coo_matrix. It allows you to construct a sparse matrix from coordinates and data. See the examples at the link below.

http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.sparse.coo_matrix.html

Hope that helps.

The sparse matrix is the first solution that came to mind, but since X and Y are floats, it's a little messy:

In [624]: I=((X-.4)*10).round().astype(int)
In [625]: J=((Y-1)*10).round().astype(int)
In [626]: I,J
Out[626]: (array([0, 1, 0, 0, 3]), array([0, 0, 1, 2, 2]))

In [627]: sparse.coo_matrix((Z,(J,I))).A
Out[627]: 
array([[ 3.3,  2.5,  0. ,  0. ],
       [ 3.6,  0. ,  0. ,  0. ],
       [ 3.8,  0. ,  0. ,  1.8]])

It still needs, in one way or another, to match those coordinates with [0,1,2...] indexes. My quick cheat was to just scale the values linearly. Even so, I had to take care when converting floats to ints.
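To illustrate why care is needed with the float-to-int conversion, here is a small sketch using the X values from the question. Casting with astype(int) truncates toward zero, so a scaled value that lands just below a whole number because of floating-point error ends up one index too low unless it is rounded first. The names scaled, truncated and rounded are illustrative only.

```python
import numpy as np

X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])

scaled = (X - 0.4) * 10           # mathematically 0, 1, 0, 0, 3 but not exactly so in floats

truncated = scaled.astype(int)    # truncates toward zero: can land one index too low
rounded = scaled.round().astype(int)  # round first, then cast

print(truncated)
print(rounded)
```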

sparse.coo_matrix works because a natural way of defining a sparse matrix is with (i, j, data) tuples, which of course can be translated to I, J, Data lists or arrays.
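For reference, a self-contained sketch of the same idea as a script rather than an IPython session might look like the following. It assumes a known spacing of 0.1, derives the offsets from the data minima instead of hard-coding 0.4 and 1, passes an explicit shape, and uses toarray() to get the dense result. The variable names shape and dense are illustrative.

```python
import numpy as np
from scipy import sparse

X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])
Y = np.array([1.0, 1.0, 1.1, 1.2, 1.2])
Z = np.array([3.3, 2.5, 3.6, 3.8, 1.8])

step = 0.1  # assumed grid spacing

# Column (I) and row (J) indices; round before casting to int
I = np.round((X - X.min()) / step).astype(int)
J = np.round((Y - Y.min()) / step).astype(int)

# Build the sparse matrix from (data, (row, col)) and convert to dense
shape = (J.max() + 1, I.max() + 1)
dense = sparse.coo_matrix((Z, (J, I)), shape=shape).toarray()
print(dense)
```

As with the histogram approach, missing cells are filled with 0, and entries that share the same (row, col) pair are summed when the matrix is converted.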

I rather like the histogram solution, even though I haven't had occasion to use it.
