I have a large field of 2D-position data, given as two arrays x and y, where len(x) == len(y). I would like to return the array of indices idx_masked at which (x[idx_masked], y[idx_masked]) is masked by an N x N int array called mask. That is, mask[x[idx_masked], y[idx_masked]] == 1. The mask array consists of 0s and 1s only.
I have come up with the following solution, but it (specifically, the last line below) is very slow, given that I have N x N = 5000 x 5000, repeated 1000s of times:
import numpy as np
import matplotlib.pyplot as plt
# example mask of one corner of a square
N = 100
mask = np.zeros((N, N))
mask[0:10, 0:10] = 1
# example x and y position arrays in arbitrary units
x = np.random.uniform(0, 1, 1000)
y = np.random.uniform(0, 1, 1000)
x_bins = np.linspace(np.min(x), np.max(x), N)
y_bins = np.linspace(np.min(y), np.max(y), N)
x_bin_idx = np.digitize(x, x_bins)
y_bin_idx = np.digitize(y, y_bins)
idx_masked = np.ravel(np.where(mask[y_bin_idx - 1, x_bin_idx - 1] == 1))
plt.imshow(mask[::-1, :])
plt.scatter(x, y, color='red')
plt.scatter(x[idx_masked], y[idx_masked], color='blue')
Is there a more efficient way of doing this?
Given that mask overlays your field with identically-sized bins, you do not need to define the bins explicitly. *_bin_idx can be determined at each location from a simple floor division, since you know that each bin is 1 / N in size. I would recommend using 1 - 0 for the total width (what you passed into np.random.uniform) instead of x.max() - x.min(), provided you know the expected size of the range.
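x0 = 0 # or x.min()
x1 = 1 # or x.max()
x_bin = (x1 - x0) / N
x_bin_idx = ((x - x0) // x_bin).astype(int)
# a point exactly at x1 (always the case with x1 = x.max()) would get index N,
# which is out of bounds for mask, so fold it into the last bin
x_bin_idx = np.minimum(x_bin_idx, N - 1)
# ditto for y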
This will be faster and simpler than digitizing, and avoids the extra bin at the beginning.
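As a minimal sketch of the floor-division binning described above: the helper name bin_indices and the fixed seed are illustrative assumptions, not part of the original code. The clip guards the one edge case where a value sits exactly on the upper bound.

```python
import numpy as np

def bin_indices(values, lo, hi, n_bins):
    """Map each value in [lo, hi] to an integer bin index in [0, n_bins - 1].

    Hypothetical helper for illustration; assumes equally sized bins.
    """
    width = (hi - lo) / n_bins
    idx = ((values - lo) // width).astype(int)
    # A value exactly equal to hi can land on index n_bins; keep it in the last bin.
    return np.clip(idx, 0, n_bins - 1)

N = 100
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
x = rng.uniform(0, 1, 1000)
y = rng.uniform(0, 1, 1000)

x_bin_idx = bin_indices(x, 0.0, 1.0, N)
y_bin_idx = bin_indices(y, 0.0, 1.0, N)
```

Unlike np.digitize, the resulting indices start at 0, so no "- 1" correction is needed before indexing into mask.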
For most purposes, you do not need np.where. 90% of the questions asking about it (including this one) should not be using where. If you want a fast way to access the necessary elements of x and y, just use a boolean mask. The mask is simply
selection = mask[x_bin_idx, y_bin_idx].astype(bool)
If mask is already a boolean (which it should be anyway), the expression mask[x_bin_idx, y_bin_idx] is sufficient. It results in an array of the same size as x_bin_idx and y_bin_idx (which are the same size as x and y) containing the mask value for each of your points. You can use the mask as
x[selection] # Elements of x in mask
y[selection] # Elements of y in mask
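Putting the pieces together, here is a hedged end-to-end sketch using the corner mask from the question. It assumes the data range is known to be [0, 1] and follows the mask[x_bin_idx, y_bin_idx] axis order used in this answer; the question's own code indexes mask[y, x], so check which convention your mask uses.

```python
import numpy as np

N = 100
mask = np.zeros((N, N), dtype=bool)  # boolean from the start, no astype needed
mask[0:10, 0:10] = True              # one corner of the square is masked

rng = np.random.default_rng(0)       # seeded only for repeatability
x = rng.uniform(0, 1, 1000)
y = rng.uniform(0, 1, 1000)

# Floor-division binning on the assumed range [0, 1]; minimum() guards the upper edge.
bin_width = 1.0 / N
x_bin_idx = np.minimum((x // bin_width).astype(int), N - 1)
y_bin_idx = np.minimum((y // bin_width).astype(int), N - 1)

# One boolean per point: True where the point falls in a masked cell.
selection = mask[x_bin_idx, y_bin_idx]

x_masked = x[selection]
y_masked = y[selection]
```

Every selected point should fall in the corner covered by the mask, which gives a quick sanity check on the axis order.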
If you absolutely need the integer indices, where is still not your best option.
indices = np.flatnonzero(selection)
OR
indices = selection.nonzero()[0]
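A small check, on a made-up five-element selection array, that both spellings return the same indices as the np.ravel(np.where(...)) pattern from the question:

```python
import numpy as np

# Toy boolean selection, purely illustrative
selection = np.array([False, True, True, False, True])

via_flatnonzero = np.flatnonzero(selection)
via_nonzero = selection.nonzero()[0]
via_where = np.ravel(np.where(selection))  # the pattern used in the question

print(via_flatnonzero)  # [1 2 4]
```

All three produce identical integer arrays for a 1-D input; flatnonzero simply states the intent most directly.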
If your goal is simply to extract values from x and y, I would recommend stacking them together into a single array:
coords = np.stack((x, y), axis=1)
This way, instead of having to apply indices twice, you can extract the values with just
coords[selection, :]
OR
coords[indices, :]
Depending on the relative densities of mask and x and y, either the boolean masking or linear indexing may be faster. You will have to time some relevant cases to get a better intuition.
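One way to run such a comparison is with timeit. This is only a sketch: the array size, the selected fractions, and the compare helper are arbitrary assumptions, and the numbers will depend on your machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n_points = 200_000                      # arbitrary size for illustration
coords = rng.uniform(0, 1, (n_points, 2))

def compare(fraction_selected, repeats=20):
    """Return mean seconds per extraction for (boolean mask, integer indices)."""
    selection = rng.random(n_points) < fraction_selected
    indices = np.flatnonzero(selection)
    t_bool = timeit.timeit(lambda: coords[selection, :], number=repeats)
    t_int = timeit.timeit(lambda: coords[indices, :], number=repeats)
    return t_bool / repeats, t_int / repeats

# Try a sparse and a dense selection; substitute fractions typical of your data.
for fraction in (0.01, 0.5):
    t_bool, t_int = compare(fraction)
    print(f"selected {fraction:.0%}: boolean {t_bool:.2e} s, integer {t_int:.2e} s")
```

The integer timing here excludes the cost of computing indices with flatnonzero, so include that step in the timed lambda if you would have to redo it on every iteration.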