Python numpy: create 2d array of values based on coordinates

I have a file containing 3 columns, where the first two are coordinates (x,y) and the third is a value (z) corresponding to that position. Here's a short example:

x y z
0 1 14
0 2 17
1 0 15
1 1 16
2 1 18
2 2 13

I want to create a 2D array of values from the third column based on their x,y coordinates in the file. I read in each column as an individual array, and I created grids of x values and y values using numpy.meshgrid, like this:

x = [[0 1 2]    and   y = [[0 0 0]
     [0 1 2]               [1 1 1]
     [0 1 2]]              [2 2 2]]

but I'm new to Python and don't know how to produce a third grid of z values that looks like this:

z = [[Nan 15 Nan]
     [14  16  18]
     [17  Nan 13]]

Replacing Nan with 0 would be fine, too; my main problem is creating the 2D array in the first place. Thanks in advance for your help!

Assuming the x and y values in your file directly correspond to indices (as they do in your example), you can do something similar to this:

import numpy as np

x = [0, 0, 1, 1, 2, 2]
y = [1, 2, 0, 1, 1, 2]
z = [14, 17, 15, 16, 18, 13]

# Start with a grid of NaNs, then fill in the known cells (row = y, column = x)
z_array = np.full((3, 3), np.nan)
z_array[y, x] = z

print(z_array)

Which yields:

[[ nan  15.  nan]
 [ 14.  16.  18.]
 [ 17.  nan  13.]]

For large arrays, this will be much faster than an explicit loop over the coordinates.
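If the grid shape is not known ahead of time, it can be derived from the largest index on each axis. The following is a minimal sketch, assuming the coordinates are non-negative integers; the helper name `grid_from_indices` is made up for illustration and is not part of the answer above.

```python
import numpy as np

def grid_from_indices(x, y, z):
    """Place z values on a 2D grid at integer positions (row = y, column = x)."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    # Size the grid from the largest index seen on each axis.
    grid = np.full((y.max() + 1, x.max() + 1), np.nan)
    grid[y, x] = z  # fancy indexing assigns all points in one step
    return grid

z_array = grid_from_indices([0, 0, 1, 1, 2, 2],
                            [1, 2, 0, 1, 1, 2],
                            [14, 17, 15, 16, 18, 13])
print(z_array)
```

Note that if the same (x, y) pair appears more than once, this kind of assignment keeps only one of the values rather than combining them.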


Dealing with non-uniform x & y input

If you have regularly sampled x & y points, then you can convert them to grid indices by subtracting the "corner" of your grid (i.e. x0 and y0), dividing by the cell spacing, and casting as ints. You can then use the method above or any of the methods in the other answers.

As a general example:

i = ((y - y0) / dy).astype(int)
j = ((x - x0) / dx).astype(int)

grid[i,j] = z
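To make the index conversion concrete, here is a small worked sketch. The sample coordinates, the spacings `dx` and `dy`, and the use of `np.round` before the cast are assumptions added for illustration; the rounding guards against floating-point results such as 2.9999999 being truncated to the wrong index.

```python
import numpy as np

# Made-up regularly sampled data: x starts at 10.0 with spacing 0.5,
# y starts at -2.0 with spacing 0.25.
x = np.array([10.0, 10.5, 11.0, 10.0, 11.0])
y = np.array([-2.0, -2.0, -2.0, -1.75, -1.5])
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

x0, y0 = x.min(), y.min()  # the "corner" of the grid
dx, dy = 0.5, 0.25         # cell spacing along each axis

# Round before casting so small floating-point errors do not shift an index.
i = np.round((y - y0) / dy).astype(int)  # row index
j = np.round((x - x0) / dx).astype(int)  # column index

grid = np.full((i.max() + 1, j.max() + 1), np.nan)
grid[i, j] = z
print(grid)
```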

However, there are a couple of tricks you can use if your data is not regularly spaced.

Let's say that we have the following data:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1977)
x, y, z = np.random.random((3, 10))

fig, ax = plt.subplots()
scat = ax.scatter(x, y, c=z, s=200)
fig.colorbar(scat)
ax.margins(0.05)

[Image: scatter plot of the points, colored by z]

We want to put these points onto a regular 10x10 grid:

[Image: the same points with a regular 10x10 grid]

We can actually use/abuse np.histogram2d for this. Instead of counts, we'll have it add the value of each point that falls into a cell. It's easiest to do this by specifying weights=z and leaving normalization off (density=False, which is the default; older NumPy releases spelled this argument normed=False).

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1977)
x, y, z = np.random.random((3, 10))

# Bin the data onto a 10x10 grid
# Have to reverse x & y due to row-first indexing
# density=False keeps the raw sums (older NumPy releases called this argument normed)
zi, yi, xi = np.histogram2d(y, x, bins=(10,10), weights=z, density=False)
zi = np.ma.masked_equal(zi, 0)

fig, ax = plt.subplots()
ax.pcolormesh(xi, yi, zi, edgecolors='black')
scat = ax.scatter(x, y, c=z, s=200)
fig.colorbar(scat)
ax.margins(0.05)

plt.show()

[Image: binned values drawn as a 10x10 grid, with the scatter points on top]

However, if we have a large number of points, some bins will have more than one point. The weights argument to np.histogram2d simply adds the values. That's probably not what you want in this case. Nonetheless, we can get the mean of the points that fall in each cell by dividing by the counts.

So, for example, let's say we have 50 points:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1977)
x, y, z = np.random.random((3, 50))

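# Bin the data onto a 10x10 grid
# Have to reverse x & y due to row-first indexing
# density=False keeps the raw sums (older NumPy releases called this argument normed)
zi, yi, xi = np.histogram2d(y, x, bins=(10,10), weights=z, density=False)
counts, _, _ = np.histogram2d(y, x, bins=(10,10))

# Sum / count is the per-cell mean; empty cells give 0/0 = nan,
# so silence that warning and mask those cells afterwards
with np.errstate(invalid='ignore'):
    zi = zi / counts
zi = np.ma.masked_invalid(zi)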

fig, ax = plt.subplots()
ax.pcolormesh(xi, yi, zi, edgecolors='black')
scat = ax.scatter(x, y, c=z, s=200)
fig.colorbar(scat)
ax.margins(0.05)

plt.show()

[Image: per-cell mean values drawn as a 10x10 grid, with the scatter points on top]

With very large numbers of points, this exact method will become slow (and can be sped up easily), but it's sufficient for anything less than ~1e6 points.
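The answer does not say how it would speed this up. One possible approach, sketched here as an assumption rather than as the author's method, is to compute each point's cell index once and then get both the sums and the counts from np.bincount. Whether this is actually faster for a given data set should be measured.

```python
import numpy as np

rng = np.random.default_rng(1977)
x, y, z = rng.random((3, 50))

nbins = 10
# Bin edges spanning the data range on each axis.
xedges = np.linspace(x.min(), x.max(), nbins + 1)
yedges = np.linspace(y.min(), y.max(), nbins + 1)

# Cell index of every point; clip so points on the last edge fall in the last bin.
j = np.clip(np.searchsorted(xedges, x, side='right') - 1, 0, nbins - 1)
i = np.clip(np.searchsorted(yedges, y, side='right') - 1, 0, nbins - 1)

# Flatten (row, column) into a single index and accumulate in one pass each.
flat = i * nbins + j
sums = np.bincount(flat, weights=z, minlength=nbins * nbins)
counts = np.bincount(flat, minlength=nbins * nbins)

# Per-cell mean; cells without points stay NaN.
mean = np.full(nbins * nbins, np.nan)
np.divide(sums, counts, out=mean, where=counts > 0)
mean = mean.reshape(nbins, nbins)
```

The resulting `mean` array can be masked with np.ma.masked_invalid and plotted with pcolormesh in the same way as `zi` above.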

You could try something like:

import numpy as np

x = [0, 0, 1, 1, 2, 2]
y = [1, 2, 0, 1, 1, 2]
z = [14, 17, 15, 16, 18, 13]

arr = np.zeros((3,3))
yx = zip(y,x)

for i, coord in enumerate(yx):
    arr[coord] = z[i]

print(arr)
# Output:
# [[ 0. 15.  0.]
#  [14. 16. 18.]
#  [17.  0. 13.]]

Kezzos beat me to it, but I had a similar approach:

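import numpy as np

x = np.array([0,0,1,1,2,2])
y = np.array([1,2,0,1,1,2])
z = np.array([14,17,15,16,18,13])
Z = np.zeros((3,3))
# Index as (row, column) = (y, x) so the result matches the layout in the question
for i, j in enumerate(zip(y, x)):
    Z[j] = z[i]

# Mark the cells that were never filled (this assumes 0 is not a valid z value)
Z[Z == 0] = np.nan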

If you have scipy installed, you could take advantage of its sparse matrix module. Get the values from the text file with genfromtxt, and plug those 'columns' directly into a sparse matrix creator.

In [545]: txt=b"""x y z
0 1 14
0 2 17
1 0 15
1 1 16
2 1 18
2 2 13
"""

In [546]: xyz=np.genfromtxt(txt.splitlines(),names=True,dtype=int)

In [547]: sparse.coo_matrix((xyz['z'],(xyz['y'],xyz['x']))).toarray()
Out[547]: 
array([[ 0, 15,  0],
       [14, 16, 18],
       [17,  0, 13]])

But Joe's z_array=np.zeros((3,3),int); z_array[xyz['y'],xyz['x']]=xyz['z'] is considerably faster.
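For reference, the direct-assignment variant can be written out as a self-contained script that needs only NumPy. This is a sketch: reading from an `io.StringIO` buffer stands in for the real file, and sizing the array from the largest indices is an assumption added here in place of the hard-coded (3,3).

```python
import io
import numpy as np

text = """x y z
0 1 14
0 2 17
1 0 15
1 1 16
2 1 18
2 2 13
"""

# names=True reads the header line, giving a structured array with fields x, y, z
xyz = np.genfromtxt(io.StringIO(text), names=True, dtype=int)

# Rows are indexed by y and columns by x, as in the question's layout
z_array = np.zeros((xyz['y'].max() + 1, xyz['x'].max() + 1), dtype=int)
z_array[xyz['y'], xyz['x']] = xyz['z']
print(z_array)
```

One difference worth knowing: when a coordinate pair is repeated, converting a coo_matrix to a dense array sums the duplicate entries, whereas the direct assignment keeps a single value.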

Nice answers by others. Thought this might be a useful snippet for someone else who might need this.

import numpy as np

def make_grid(x, y, z):
    '''
    Takes x, y, z values as lists and returns a 2D numpy array
    '''
    x, y = np.asarray(x), np.asarray(y)  # plain lists do not support the arithmetic below
    xs, ys = np.unique(x), np.unique(y)  # sorted distinct coordinates
    dx = abs(xs[1] - xs[0])  # cell spacing, taken from the first two distinct values
    dy = abs(ys[1] - ys[0])
    i = ((x - x.min()) / dx).astype(int) # Longitudes
    j = ((y - y.max()) / dy).astype(int) # Latitudes (zero or negative)
    grid = np.full((len(set(j)), len(set(i))), np.nan)
    grid[-j, i] = z # if using latitude and longitude (for WGS/West)
    return grid
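When the coordinates repeat but are not evenly spaced, dividing by a single spacing no longer works. A possible alternative, sketched here with a hypothetical helper `grid_from_unique` that is not part of the snippet above, is to let np.unique map each distinct coordinate to a row or column index through its return_inverse output.

```python
import numpy as np

def grid_from_unique(x, y, z):
    """Hypothetical helper: one row per distinct y, one column per distinct x."""
    # return_inverse gives, for every point, the position of its coordinate
    # in the sorted array of distinct values.
    xs, j = np.unique(x, return_inverse=True)
    ys, i = np.unique(y, return_inverse=True)
    grid = np.full((ys.size, xs.size), np.nan)
    grid[i, j] = z
    return xs, ys, grid

xs, ys, grid = grid_from_unique([10.0, 10.0, 12.5, 20.0],
                                [1.0, 3.0, 1.0, 3.0],
                                [5.0, 6.0, 7.0, 8.0])
print(xs)    # distinct x coordinates, one per column
print(ys)    # distinct y coordinates, one per row
print(grid)
```

Unlike make_grid, this sketch puts the smallest y in row 0 and does not flip the latitude axis; reverse the rows with grid[::-1] if north should be at the top.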
