Efficiently selecting elements from an (x,y) field using a 2D mask in Python

Question

I have a large field of 2D-position data, given as two arrays x and y, where len(x) == len(y). I would like to return the array of indices idx_masked at which (x[idx_masked], y[idx_masked]), after binning onto the N x N grid, is masked by an N x N int array called mask. That is, mask[x[idx_masked], y[idx_masked]] == 1. The mask array consists of 0s and 1s only.

I have come up with the following solution, but it (specifically, the line that computes idx_masked) is very slow, given that I have N x N = 5000 x 5000, repeated thousands of times:

import numpy as np
import matplotlib.pyplot as plt

# example mask of one corner of a square
N = 100
mask = np.zeros((N, N))
mask[0:10, 0:10] = 1

# example x and y position arrays in arbitrary units
x = np.random.uniform(0, 1, 1000)
y = np.random.uniform(0, 1, 1000)

x_bins = np.linspace(np.min(x), np.max(x), N)
y_bins = np.linspace(np.min(y), np.max(y), N)

x_bin_idx = np.digitize(x, x_bins)
y_bin_idx = np.digitize(y, y_bins)

idx_masked = np.ravel(np.where(mask[y_bin_idx - 1, x_bin_idx - 1] == 1))

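# overlay the mask on the points: origin='lower' puts row 0 at the bottom,
# and extent maps mask pixels onto the same data coordinates used by scatter
plt.imshow(mask, origin='lower', extent=(x_bins[0], x_bins[-1], y_bins[0], y_bins[-1]))

plt.scatter(x, y, color='red')
plt.scatter(x[idx_masked], y[idx_masked], color='blue')
plt.show()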

Is there a more efficient way of doing this?

Answer 1

Given that mask overlays your field with identically sized bins, you do not need to define the bins explicitly. *_bin_idx can be determined at each location with a simple floor division, since you know that each bin is 1 / N of the total width. I would recommend using 1 - 0 for the total width (the range you passed to np.random.uniform) instead of x.max() - x.min(), provided you know the expected size of the range.

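x0 = 0   # or x.min()
x1 = 1   # or x.max()
x_bin = (x1 - x0) / N
# floor-divide to get each point's bin; clip so that a point at x == x1 lands in the last bin
x_bin_idx = np.minimum(((x - x0) // x_bin).astype(int), N - 1)

# ditto for y
y0, y1 = 0, 1   # or y.min(), y.max()
y_bin = (y1 - y0) / N
y_bin_idx = np.minimum(((y - y0) // y_bin).astype(int), N - 1)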

This will be faster and simpler than digitizing, and avoids the extra bin that np.digitize reserves for values below the first edge (which is why the question's code needs the - 1).
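As a quick sanity check, here is a sketch assuming the field spans [0, 1) as in the question's np.random.uniform(0, 1, ...) call, with a seeded generator used only for reproducibility. The floor-division index matches np.digitize once the bins are given N + 1 edges (not N) and 1 is subtracted from the 1-based digitize result:

```python
import numpy as np

N = 100
rng = np.random.default_rng(0)   # seeded only so the check is reproducible
x = rng.uniform(0, 1, 1000)

# floor-division binning: each bin is (x1 - x0) / N wide
x0, x1 = 0.0, 1.0
x_bin = (x1 - x0) / N
x_bin_idx = np.minimum(((x - x0) // x_bin).astype(int), N - 1)

# the same N bins via np.digitize need N + 1 edges; digitize is 1-based, so subtract 1
edges = np.linspace(x0, x1, N + 1)
digitize_idx = np.digitize(x, edges) - 1

print(np.array_equal(x_bin_idx, digitize_idx))
print(x_bin_idx.min(), x_bin_idx.max())
```

The np.minimum clip only matters when x1 is taken from the data (x.max()): the largest point then sits exactly on the upper edge, and without the clip it could get index N, one past the end of mask.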

For most purposes, you do not need np.where. 90% of the questions asking about it (including this one) should not be using where. If you want a fast way to access the necessary elements of x and y, just use a boolean mask. The mask is simply

selection = mask[x_bin_idx, y_bin_idx].astype(bool)

If mask is already a boolean array (which it should be anyway), the expression mask[x_bin_idx, y_bin_idx] is sufficient. It results in an array of the same size as x_bin_idx and y_bin_idx (which are the same size as x and y) containing the mask value for each of your points. If rows of mask correspond to y, as in the question's code, index it as mask[y_bin_idx, x_bin_idx] instead. You can use the selection as

x[selection]   # Elements of x in mask
y[selection]   # Elements of y in mask
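Putting these pieces together, a minimal end-to-end sketch, assuming positions in [0, 1), the question's 10 x 10 corner mask stored as a boolean array from the start, and the mask[x_bin_idx, y_bin_idx] index order used in this answer (for this symmetric corner mask the order makes no difference):

```python
import numpy as np

N = 100
mask = np.zeros((N, N), dtype=bool)   # boolean from the start, so no .astype(bool) is needed
mask[0:10, 0:10] = True

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 1000)
y = rng.uniform(0, 1, 1000)

# bin indices by floor division (bin width 1 / N on the unit square), clipped to the last bin
x_bin_idx = np.minimum((x // (1 / N)).astype(int), N - 1)
y_bin_idx = np.minimum((y // (1 / N)).astype(int), N - 1)

# fancy indexing gives one mask value per point: a boolean array shaped like x
selection = mask[x_bin_idx, y_bin_idx]

x_in = x[selection]   # elements of x in mask
y_in = y[selection]   # elements of y in mask
print(selection.shape, selection.dtype, x_in.size)
```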

If you absolutely need the integer indices, where is still not your best option:

indices = np.flatnonzero(selection)

OR

indices = selection.nonzero()[0]
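A short check on a small hand-made boolean array (a sketch, not tied to the question's data) that both suggestions return the same integer indices as the question's np.ravel(np.where(...)):

```python
import numpy as np

selection = np.array([False, True, True, False, True])

idx_where = np.ravel(np.where(selection))   # the question's approach
idx_flat = np.flatnonzero(selection)        # 1-D indices of the True entries
idx_nonzero = selection.nonzero()[0]        # first (only) array of the nonzero() tuple

print(idx_flat)   # → [1 2 4]
```

With a single argument, np.where(condition) is equivalent to condition.nonzero(), so the question's version only adds a tuple-to-array ravel on top of the same work.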

If your goal is simply to extract values from x and y, I would recommend stacking them together into a single array:

coords = np.stack((x, y), axis=1)

This way, instead of having to apply indices twice, you can extract the values with just

coords[selection, :]

OR

coords[indices, :]
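A small sketch with made-up points showing the layout of the stacked array and that boolean and integer row selection return the same (x, y) pairs:

```python
import numpy as np

x = np.array([0.05, 0.50, 0.02, 0.90])
y = np.array([0.01, 0.40, 0.08, 0.95])
selection = np.array([True, False, True, False])

coords = np.stack((x, y), axis=1)   # shape (len(x), 2): one (x, y) row per point

by_mask = coords[selection, :]                     # boolean row selection
by_index = coords[np.flatnonzero(selection), :]    # integer row selection

print(by_mask)
```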

Depending on how densely mask is populated and how many of your points fall inside it, either boolean masking or integer indexing may be faster. You will have to time some relevant cases to get a better intuition.
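One way to run such timings is a small timeit harness like the sketch below. The function name compare, the point count, grid size, and 1% mask density are all arbitrary assumptions; substitute values representative of your own 5000 x 5000 data:

```python
import timeit

import numpy as np


def compare(n_points=100_000, N=1000, density=0.01, number=10, repeat=5, seed=0):
    """Hypothetical helper: time coords[selection, :] against coords[indices, :].

    Returns the best-of-`repeat` time in seconds for `number` calls of each.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n_points)
    y = rng.uniform(0, 1, n_points)
    coords = np.stack((x, y), axis=1)

    # random boolean mask in which roughly `density` of the cells are True
    mask = rng.random((N, N)) < density

    # floor-division binning on the unit square, clipped to the last bin
    x_bin_idx = np.minimum((x // (1 / N)).astype(int), N - 1)
    y_bin_idx = np.minimum((y // (1 / N)).astype(int), N - 1)

    selection = mask[x_bin_idx, y_bin_idx]
    indices = np.flatnonzero(selection)   # computed once, outside the timed calls

    t_bool = min(timeit.repeat(lambda: coords[selection, :], number=number, repeat=repeat))
    t_int = min(timeit.repeat(lambda: coords[indices, :], number=number, repeat=repeat))
    return t_bool, t_int


t_bool, t_int = compare()
print(f"boolean mask: {t_bool:.5f} s, integer indices: {t_int:.5f} s")
```

Note that the integer-index timing excludes the cost of np.flatnonzero itself, which only pays off if the same indices are reused several times.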

Answer 1 (score 2, accepted), 2020-03-30 14:58:29