Selecting rows in numpy array

I have a numpy array (mat) of shape (n, 4). The array has four columns and a large number (n) of rows. The first three columns represent the x, y, z columns in my calculation. I wish to select those rows of the numpy array where the x column has values below a given number (min_x) or above a given number (max_x), and where the y column has values below a given number (min_y) or above a given number (max_y), and where the z column has values below a given number (min_z) or above a given number (max_z).

This is how I am currently trying to implement this:

import numpy as np

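# Line continuation is implicit inside parentheses, so no backslashes are needed
mark = np.where( ( (mat[:,0] <= min_x) | (mat[:,0] > max_x) ) &
                 ( (mat[:,1] <= min_y) | (mat[:,1] > max_y) ) &
                 ( (mat[:,2] <= min_z) | (mat[:,2] > max_z) ) )

# mark[0] holds row indices, so it has to index the first axis (rows), not the columns
mat_new = mat[mark[0]]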

Is the technique that I am using correct, and is it the best way to achieve the desired functionality? I will greatly appreciate any help. Thanks.

Looks good to me. You can make it a bit more compact by comparing each column's distance from the midrange value to half the range:

# The outer parentheses let the expression span several lines
mark = ((np.abs(mat[:,0] - (max_x + min_x) / 2) > (max_x - min_x) / 2) &
        (np.abs(mat[:,1] - (max_y + min_y) / 2) > (max_y - min_y) / 2) &
        (np.abs(mat[:,2] - (max_z + min_z) / 2) > (max_z - min_z) / 2))

Unfortunately, you can no longer control the precise boundary conditions (< vs <=): the comparison is strict at both ends. Also, this is probably the slowest solution, even slower than the original one.
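To make the boundary caveat concrete, here is a minimal sketch on a single column. The sample values and thresholds are made up for illustration; only the three expressions come from the question and the answer above.

```python
import numpy as np

# Hypothetical sample data and thresholds, chosen only for illustration
x = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
min_x, max_x = 1.0, 4.0

# Midrange form: distance from the centre of the interval exceeds half its width
mid_form = np.abs(x - (max_x + min_x) / 2) > (max_x - min_x) / 2

# Explicit form with strict inequalities at both ends
strict_form = (x < min_x) | (x > max_x)

# The question's form, which also includes the lower boundary (<=)
question_form = (x <= min_x) | (x > max_x)

print(mid_form)       # [ True False False False  True]
print(strict_form)    # [ True False False False  True]
print(question_form)  # [ True  True False False  True]
```

The midrange form agrees with the strict version, and differs from the question's version exactly at x == min_x. With floating-point data, rounding in the subtraction may also shift results for values very close to a boundary.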

What you have now looks fine. But since you are asking about other ways to achieve the desired functionality: you can create a 1-dimensional boolean mask that is either True or False for each row index. Here is an example.

>>> import numpy as np
>>> np.random.seed(444)

>>> shape = 15, 4
>>> mat = np.random.randint(low=0, high=10, size=shape)
>>> mat
array([[3, 0, 7, 8],
       [3, 4, 7, 6],
       [8, 9, 2, 2],
       [2, 0, 3, 8],
       [0, 6, 6, 0],
       [3, 0, 6, 7],
       [9, 3, 8, 7],
       [3, 2, 6, 9],
       [2, 9, 8, 9],
       [3, 2, 2, 8],
       [1, 5, 6, 7],
       [6, 0, 0, 0],
       [0, 4, 8, 1],
       [9, 8, 5, 8],
       [9, 4, 6, 6]])

# The thresholds for x, y, z, respectively
>>> lower = np.array([5, 5, 4])
>>> upper = np.array([6, 6, 7])
>>> idx = len(lower)
# Parentheses are required here.  NumPy boolean ops use | and &
# which have different operator precedence than `or` and `and`
>>> mask = np.all((mat[:, :idx] < lower) | (mat[:, :idx] > upper), axis=1)

>>> mask
array([False, False,  True,  True, False, False,  True, False,  True,
        True, False, False,  True, False, False])

Now indexing mat by mask will constrain it to the row indices where mask is True:

>>> mat[mask]
array([[8, 9, 2, 2],
       [2, 0, 3, 8],
       [9, 3, 8, 7],
       [2, 9, 8, 9],
       [3, 2, 2, 8],
       [0, 4, 8, 1]])

What is a bit different about this approach is that it scales: instead of specifying each coordinate condition individually, you specify them in two arrays, one for the upper thresholds and one for the lower, and then take advantage of NumPy's vectorization and broadcasting to build the mask.
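As a rough sketch of that scalability, the mask construction can be wrapped in a small helper that works for any number of coordinate columns. The function name rows_outside and the sample array mat2 are hypothetical, introduced here only for illustration.

```python
import numpy as np

def rows_outside(mat, lower, upper):
    """Return the rows whose first len(lower) columns all lie outside [lower, upper]."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    k = lower.shape[0]                 # number of coordinate columns to test
    coords = mat[:, :k]                # shape (n, k); thresholds broadcast over rows
    mask = np.all((coords < lower) | (coords > upper), axis=1)
    return mat[mask]

# Hypothetical data: two coordinate columns plus one payload column
mat2 = np.array([[0, 9, 10],
                 [3, 3, 20],
                 [9, 0, 30]])

print(rows_outside(mat2, lower=[2, 2], upper=[5, 5]))
# [[ 0  9 10]
#  [ 9  0 30]]
```

Adding a fourth or fifth coordinate then only means lengthening the two threshold lists, not writing more comparison terms.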

np.all(..., axis=1) says: test that all values are True, row-wise. It captures the "and" conditions from your question, while the | operator captures the "or".
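It may help to look at the intermediate array before np.all reduces it. The following sketch uses a tiny made-up array (a) and thresholds, not the data from the session above.

```python
import numpy as np

# Hypothetical 3x2 array and per-column thresholds, for illustration only
a = np.array([[1, 9],
              [5, 9],
              [0, 5]])
lower = np.array([2, 3])   # shape (2,), compared against every row of `a`
upper = np.array([6, 7])

# Element-wise "or": one boolean per (row, column) pair
outside = (a < lower) | (a > upper)
print(outside)
# [[ True  True]
#  [False  True]
#  [ True False]]

# Row-wise "and": a row passes only if every column is outside its interval
rows = np.all(outside, axis=1)
print(rows)  # [ True False False]
```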

I'd just drop the np.where and use the boolean mask instead:

x,y,z,_ = mat.T
mask = ( ( (x <= min_x) | (x > max_x) ) &
         ( (y <= min_y) | (y > max_y) ) &
         ( (z <= min_z) | (z > max_z) ) ) 
mat_new = mat[mask]
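As a quick check that nothing is lost by dropping np.where, this sketch compares both ways of indexing on a small made-up array. The data and threshold values are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data: columns are x, y, z and one extra value
mat = np.array([[0, 0, 0, 1],
                [5, 5, 5, 2],
                [9, 9, 9, 3],
                [0, 5, 9, 4]])
min_x = min_y = min_z = 2
max_x = max_y = max_z = 7

# Unpacking mat.T requires exactly four columns
x, y, z, _ = mat.T
mask = (((x <= min_x) | (x > max_x)) &
        ((y <= min_y) | (y > max_y)) &
        ((z <= min_z) | (z > max_z)))

by_mask = mat[mask]                  # boolean indexing on the row axis
by_index = mat[np.where(mask)[0]]    # integer row indices from np.where

print(by_mask)
# [[0 0 0 1]
#  [9 9 9 3]]
print(np.array_equal(by_mask, by_index))  # True
```

Both forms select the same rows; the boolean mask simply skips the intermediate index array.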
