
Select all rows from Numpy array where each column satisfies some condition

I have an array x of the form:

x = [[1,2,3,...,7,8,9],
[1,2,3,...,7,9,8],
...,
[9,8,7,...,3,1,2],
[9,8,7,...,3,2,1]]

I also have an array of non-allowed numbers for each column. I want to select all of the rows that have only allowed numbers in each column. For instance, I might want only the rows that do not have any of [1,2,3] in the first column; I can do this with:

x[~np.in1d(x[:,0], [1,2,3])]
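A self-contained version of the single-column filter, on a small made-up array (the values are illustrative, not from the question). This sketch uses np.isin with invert=True, which for 1-D input does the same job as ~np.in1d(...):

```python
import numpy as np

# Small illustrative array; the real x in the question is larger.
x = np.array([[1, 5, 9],
              [4, 2, 7],
              [3, 8, 6],
              [7, 1, 2]])

# keep[i] is True when the first column of row i is NOT a forbidden value
forbidden_first = [1, 2, 3]
keep = np.isin(x[:, 0], forbidden_first, invert=True)
result = x[keep]
print(result)
```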

I can do this for any single column. But I'm looking to do it for all columns at once, selecting only the rows for which every element is an allowed number for its column. I can't seem to get x.any or x.all to do this well. How should I go about this?

EDIT: To clarify, the non-allowed numbers are different for each column. In practice, I will have some array y,

y = [[1,4,...,7,8],
[2,5,...,9,4],
[3,6,...,8,6]]

where I want the rows of x for which column 1 cannot be in [1,2,3], column 2 cannot be in [4,5,6], and so on. In other words, column j of y lists the forbidden values for column j of x.
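Before vectorizing, a plain-Python reference can help pin down the semantics of the EDIT: row i is kept only if, for every column j, x[i, j] is not among the values in y[:, j]. The helper name rows_allowed_loop and the sample data are hypothetical, chosen only for illustration; this is a slow sketch useful for checking faster versions, not a recommended implementation.

```python
import numpy as np

def rows_allowed_loop(x, y):
    """Hypothetical reference helper: True for each row of x whose every
    element avoids the forbidden values listed in the same column of y."""
    n_rows, n_cols = x.shape
    keep = np.ones(n_rows, dtype=bool)
    for i in range(n_rows):
        for j in range(n_cols):
            # y[:, j] holds the forbidden values for column j
            if x[i, j] in y[:, j]:
                keep[i] = False
                break
    return keep

x = np.array([[1, 2], [9, 8], [5, 6], [7, 8]])
y = np.array([[1, 4], [2, 5], [3, 7]])
keep = rows_allowed_loop(x, y)
print(x[keep])
```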

You can broadcast the comparison, then use all to check:

x[(x != y[:,None,:]).all(axis=(0,-1))]

Break down:分解:

# compare each element of `x` to each element of `y`
# mask.shape == (y.shape[0], x.shape[0], x.shape[1])
mask = (x != y[:,None,:])

# `all(0)` checks, for each element of `x`, that it matches no element in the same column of `y`
# `all(-1)` then checks along each row of `x`
mask = mask.all(axis=(0,-1))

# slice
x[mask]
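To make the shapes in the breakdown concrete, here is a sketch on random data (the sizes n, m, k and the seed are arbitrary choices for illustration). It checks that reducing over axis 0 and then over the last axis gives the same row mask as the single call with axis=(0, -1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 6, 4, 3                    # rows of x, columns, forbidden values per column
x = rng.integers(0, 10, size=(n, m))
y = rng.integers(0, 10, size=(k, m))

# y[:, None, :] has shape (k, 1, m); broadcasting against x (n, m) gives (k, n, m)
mask3d = x != y[:, None, :]
assert mask3d.shape == (k, n, m)

# Step 1: x[i, j] is allowed if it differs from every value in y[:, j]
per_element = mask3d.all(axis=0)     # shape (n, m)
# Step 2: a row is kept only if all of its elements are allowed
per_row = per_element.all(axis=-1)   # shape (n,)

one_shot = mask3d.all(axis=(0, -1))
assert np.array_equal(per_row, one_shot)
filtered = x[one_shot]
```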

For example, consider:例如,考虑:

x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [7, 8]])

y = np.array([[1, 4],
              [2, 5],
              [3, 7]])

Then mask = (x != y[:,None,:]).all(axis=(0,-1)) gives

array([False,  True,  True,  True])
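To see which element causes a row to be dropped, it can help to stop after the reduction over axis 0 and inspect the per-element mask, then apply the row mask as the final slice. A sketch on the example's x and y (column 0 forbids 1, 2, 3; column 1 forbids 4, 5, 7):

```python
import numpy as np

x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [7, 8]])
y = np.array([[1, 4],
              [2, 5],
              [3, 7]])

# per_element[i, j] is True when x[i, j] avoids every forbidden value of column j
per_element = (x != y[:, None, :]).all(axis=0)
mask = per_element.all(axis=-1)
result = x[mask]
print(per_element)
print(result)
```

Only per_element[0, 0] is False (the 1 in the first column), so only the first row is removed.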

These days it's recommended to use np.isin rather than np.in1d (np.in1d is deprecated as of NumPy 2.0). This lets you (a) compare the entire array at once, and (b) invert the mask more efficiently.

x[np.isin(x, [1, 2, 3], invert=True).all(1)]
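A self-contained run of the one-liner above on small made-up data (the values are illustrative only), showing that the isin mask has the same shape as x before .all(1) collapses it to one value per row:

```python
import numpy as np

x = np.array([[4, 5, 6],
              [1, 5, 6],
              [7, 8, 9],
              [9, 8, 3]])

# allowed[i, j] is True when x[i, j] is not in [1, 2, 3]; same shape as x
allowed = np.isin(x, [1, 2, 3], invert=True)
result = x[allowed.all(axis=1)]
print(result)
```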

np.isin preserves the shape of x, so you can then use .all across the columns. It also has an invert argument, which lets you do the equivalent of ~np.isin(x, [1, 2, 3]), but more efficiently.
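As written, the isin call tests every column against one shared list, while the EDIT asks for a different forbidden list per column. One way to adapt it (an adaptation, not something either answer proposes) is to call np.isin once per column and stack the results. The helper name rows_allowed_isin is hypothetical:

```python
import numpy as np

def rows_allowed_isin(x, y):
    """Hypothetical helper: boolean row mask, True where every x[i, j]
    avoids the forbidden values in column j of y."""
    # One isin call per column; each result has shape (n_rows,)
    per_column = [np.isin(x[:, j], y[:, j], invert=True) for j in range(x.shape[1])]
    # Stack to shape (n_rows, n_cols), then require every column to pass
    return np.column_stack(per_column).all(axis=1)

x = np.array([[1, 2], [9, 8], [5, 6], [7, 8]])
y = np.array([[1, 4], [2, 5], [3, 7]])
result = x[rows_allowed_isin(x, y)]
print(result)
```

The Python-level loop runs over columns only, so this sketch is most natural when there are many rows and few columns; it never builds the (k, n, m) comparison array of the broadcasting answer.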

This vectorizes a computation similar to the one the other answer suggests, but much more efficiently (although it's still a linear search), and it also avoids creating the large temporary array.
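To put a rough number on the temporary-array point: the broadcast comparison x != y[:, None, :] materializes a boolean array of shape (k, n, m), at one byte per element. A small sketch that checks this formula against an actual broadcast and then extrapolates to hypothetical sizes (the dimensions are assumptions, not from the question; np.isin may still allocate its own internal temporaries, which this does not measure):

```python
import numpy as np

def broadcast_temp_bytes(n_rows, n_cols, k_forbidden):
    """Bytes in the (k, n, m) boolean array that x != y[:, None, :] allocates."""
    return k_forbidden * n_rows * n_cols * np.dtype(bool).itemsize

# Check the formula against an actual (small) broadcast comparison
x = np.zeros((1000, 9), dtype=np.int64)
y = np.ones((3, 9), dtype=np.int64)
temp = x != y[:, None, :]
print(temp.shape, temp.nbytes)

# Extrapolate to hypothetical sizes: 1,000,000 rows, 9 columns, 3 forbidden values
big = broadcast_temp_bytes(1_000_000, 9, 3)
print(f"{big / 1e6:.0f} MB")
```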
