Select all rows from Numpy array where each column satisfies some condition

I have an array x of the form,

x = [[1,2,3,...,7,8,9],
[1,2,3,...,7,9,8],
...,
[9,8,7,...,3,1,2],
[9,8,7,...,3,2,1]]

I also have an array of non-allowed numbers for each column. I want to select all of the rows that contain only allowed numbers in every column. For instance, I might want only the rows that do not have any of [1,2,3] in the first column; I can do this with,

x[~np.in1d(x[:,0], [1,2,3])]

And for any single column, I can do this. But I'm looking to essentially do this for all columns at once, selecting only the rows for which every element is an allowed number for its column. I can't seem to get x.any or x.all to do this well. How should I go about this?

EDIT: To clarify, the non-allowed numbers are different for each column. In actuality, I will have some array y,

y = [[1,4,...,7,8],
[2,5,...,9,4],
[3,6,...,8,6]]

Where I want rows from x for which column 1 cannot be in [1,2,3], column 2 cannot be in [4,5,6], and so on.

You can broadcast the comparison, then use all to check:

x[(x != y[:,None,:]).all(axis=(0,-1))]

Break down:

# compare each element of `x` to each element of `y`
# mask.shape == (y.shape[0], x.shape[0], x.shape[1])
mask = (x != y[:,None,:])

# `all(0)` checks that each element of `x` doesn't match any element in the same column of `y`
# `all(-1)` checks along the rows of `x`
mask = mask.all(axis=(0,-1))

# slice
x[mask]

For example, consider:

x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [7, 8]])

y = np.array([[1, 4],
              [2, 5],
              [3, 7]])

Then mask = (x != y[:,None,:]).all(axis=(0,-1)) gives

array([False,  True,  True,  True])
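The example above can be run end to end as a small script. This is only a sketch using the same x and y; the intermediate name cmp is introduced here for illustration, and the shapes in the comments follow from NumPy's broadcasting rules.

```python
import numpy as np

x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [7, 8]])

# Column j of y lists the values that are not allowed in column j of x.
y = np.array([[1, 4],
              [2, 5],
              [3, 7]])

# y[:, None, :] has shape (3, 1, 2); broadcasting it against x (4, 2)
# yields a boolean array of shape (3, 4, 2).
cmp = x != y[:, None, :]

# Reduce over the y-row axis (0) and the column axis (-1),
# leaving one boolean per row of x.
mask = cmp.all(axis=(0, -1))

result = x[mask]
print(mask)    # [False  True  True  True]
print(result)  # rows [9, 8], [5, 6], [7, 8]
```

Note that the intermediate comparison array has y.shape[0] * x.shape[0] * x.shape[1] elements, so memory use grows with the number of disallowed values per column.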

It's recommended to use np.isin rather than np.in1d these days. This lets you (a) compare the entire array all at once, and (b) invert the mask more efficiently.

x[np.isin(x, [1, 2, 3], invert=True).all(1)]

np.isin preserves the shape of x, so you can then use .all across the columns. It also has an invert argument which allows you to do the equivalent of ~isin(x, [1, 2, 3]), but more efficiently.

This solution vectorizes a computation similar to the one in the other answer, but much more efficiently (although it's still a linear search), and it avoids creating the temporary arrays as well.
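The one-liner above applies a single list of disallowed values to every column. Since the clarified question has a different set per column, one possible adaptation is to call np.isin once per column. This is a minimal sketch, assuming column j of y holds the disallowed values for column j of x; rows_allowed is a hypothetical helper name, not part of NumPy.

```python
import numpy as np

def rows_allowed(x, y):
    """Keep the rows of x in which no element appears in the same column of y.

    Hypothetical helper for illustration: y[:, j] is assumed to hold the
    disallowed values for x[:, j].
    """
    # One boolean column per column of x: True where the value is disallowed.
    forbidden = np.column_stack(
        [np.isin(x[:, j], y[:, j]) for j in range(x.shape[1])]
    )
    # A row survives only if none of its entries is disallowed.
    return x[~forbidden.any(axis=1)]

x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [7, 8]])
y = np.array([[1, 4],
              [2, 5],
              [3, 7]])

selected = rows_allowed(x, y)
print(selected)  # rows [9, 8], [5, 6], [7, 8]
```

The Python-level loop runs once per column rather than once per row, so it stays cheap when x has many rows and few columns.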
