
Numpy array: How to extract whole rows based on values in a column

I am looking for the equivalent of an SQL 'where' query over a table. I have done a lot of searching and I'm either using the wrong search terms or not understanding the answers. Probably both.

So a table is a 2-dimensional numpy array.

import numpy as np

my_array = np.array([[32, 55,  2],
                     [15,  2, 60],
                     [76, 90,  2],
                     [ 6, 65,  2]])

I wish to 'end up' with a numpy array of the same shape containing only the rows where, e.g., the second column values are >= 55 AND <= 65.

So my desired numpy array would be...

desired_array = np.array([[32, 55,  2],
                          [ 6, 65,  2]])

Also, does 'desired_array' order match 'my_array' order?

Just make a mask and use it.

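# Boolean mask: True for each row whose second column lies in [55, 65].
mask = np.logical_and(my_array[:, 1] >= 55, my_array[:, 1] <= 65)
# Indexing with the mask keeps only the rows where it is True.
desired_array = my_array[mask]
desired_array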
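On the ordering question: indexing with a boolean mask keeps the selected rows in their original order. The sketch below repeats the mask with the `&` operator and uses `np.flatnonzero` to show which row indices survive; the name `kept_rows` is only illustrative.

```python
import numpy as np

my_array = np.array([[32, 55,  2],
                     [15,  2, 60],
                     [76, 90,  2],
                     [ 6, 65,  2]])

# & combines boolean arrays element-wise. The parentheses are required
# because & binds more tightly than the comparison operators.
mask = (my_array[:, 1] >= 55) & (my_array[:, 1] <= 65)

# Indices of the rows where the mask is True, in ascending order.
kept_rows = np.flatnonzero(mask)

desired_array = my_array[mask]
print(kept_rows)       # [0 3]
print(desired_array)   # rows 0 and 3, in that order
```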

The general NumPy approach to filtering an array is to create a boolean "mask" that is True for the part of the array you want, and then index the array with it.

>>> my_array[((55 <= my_array) & (my_array <= 65))[:, 1]]
array([[32, 55,  2],
       [ 6, 65,  2]])

Breaking it down:

# Comparing an array to a scalar compares every element individually and
# returns a boolean array of the same shape (the scalar is "broadcast"
# across the array). We build two such boolean arrays, one per threshold,
# and combine them element-wise with &.
mask = (55 <= my_array) & (my_array <= 65)

# We only care about index 1 along the second dimension (the second column),
# so we take that 1-dimensional slice of the mask.
desired_rows = mask[:, 1]

# Finally we use those values to select the desired rows.
desired_array = my_array[desired_rows]

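(The first two operations could instead be swapped: slice out the column first, then compare. I imagine that is more efficient, since the comparisons then run on one column rather than the whole array, but it wouldn't matter for something this small. The order shown is simply the one that occurred to me first.)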
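A minimal sketch of that swapped order, assuming the same `my_array` as in the question. The intermediate name `column` is only illustrative.

```python
import numpy as np

my_array = np.array([[32, 55,  2],
                     [15,  2, 60],
                     [76, 90,  2],
                     [ 6, 65,  2]])

# Slice out the second column first (a 1-dimensional view of 4 values)...
column = my_array[:, 1]

# ...then compare only those values against the two thresholds.
desired_rows = (55 <= column) & (column <= 65)

# The 1-dimensional boolean array selects whole rows.
desired_array = my_array[desired_rows]
print(desired_array)
```

The result is the same as before; the only difference is that the comparisons touch 4 values instead of all 12.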

You don't mean the same shape. You probably meant the same number of columns. The shape of my_array is (4, 3) and the shape of your desired array is (2, 3). I would recommend masking, too.
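The shapes can be checked directly. In this short sketch the number of rows in the result equals the number of True entries in the mask, while the number of columns is unchanged.

```python
import numpy as np

my_array = np.array([[32, 55,  2],
                     [15,  2, 60],
                     [76, 90,  2],
                     [ 6, 65,  2]])

mask = (my_array[:, 1] >= 55) & (my_array[:, 1] <= 65)
desired_array = my_array[mask]

print(my_array.shape)       # (4, 3)
print(desired_array.shape)  # (2, 3)
print(int(mask.sum()))      # 2, the number of rows kept
```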

You can use the built-in filter function with a lambda that checks each row for the desired condition:

import numpy as np

my_array = np.array([[32, 55,  2],
                     [15,  2, 60],
                     [76, 90,  2],
                     [ 6, 65,  2]])

# Iterating over a 2-D array yields its rows; keep those whose second value is in [55, 65].
desired_array = np.array(list(filter(lambda row: 55 <= row[1] <= 65, my_array)))

Upon running this, we get:

>>> desired_array
array([[32, 55,  2],
       [ 6, 65,  2]])
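One difference from masking is worth knowing about. When no row matches, the filter approach builds the result from an empty list and so loses the column dimension, while a boolean mask keeps it. The sketch below uses hypothetical bounds `low` and `high`, chosen only so that nothing matches.

```python
import numpy as np

my_array = np.array([[32, 55,  2],
                     [15,  2, 60],
                     [76, 90,  2],
                     [ 6, 65,  2]])

# Hypothetical bounds for illustration: no second-column value lies in [100, 200].
low, high = 100, 200

# Mask approach: the result keeps its 3 columns even with 0 rows.
mask = (my_array[:, 1] >= low) & (my_array[:, 1] <= high)
masked = my_array[mask]

# filter approach: np.array([]) has no column information to work from.
filtered = np.array(list(filter(lambda row: low <= row[1] <= high, my_array)))

print(masked.shape)    # (0, 3)
print(filtered.shape)  # (0,)
```

The filter version also loops over the rows in Python rather than inside NumPy, so the mask is usually the better fit for large arrays.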


 