
Filter rows in numpy array based on second array

I have two 2D NumPy arrays, A and B. I want to remove all the rows in A that also appear in B.

I tried something like this:

A[~np.isin(A, B)]

but isin compares element by element and returns a mask with the same shape as A. I need one boolean value per row to filter it.

EDIT: For example, given:

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

the result should be:

A = np.array([[3, 0, 4],
              [0, 5, 9]])

Probably not the most performant solution, but it does exactly what you want. You can view A and B with a structured dtype in which each element is one whole row. The arrays must be contiguous for this to work, so pass them through np.ascontiguousarray first:

Av = np.ascontiguousarray(A).view(np.dtype([('', A.dtype, A.shape[1])])).ravel()
Bv = np.ascontiguousarray(B).view(Av.dtype).ravel()

Now you can apply np.isin directly:

>>> np.isin(Av, Bv)
array([False,  True, False])

According to the docs, passing invert=True is faster than negating the output of isin, so you can write:

A[np.isin(Av, Bv, invert=True)]
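The steps above can be collected into one function. This is only a sketch: the name remove_rows_structured and the field name "row" are made up for illustration, and it assumes A and B are 2D with the same number of columns and that B can be cast to A's dtype so both views share one byte layout.

```python
import numpy as np

def remove_rows_structured(A, B):
    """Return the rows of A that do not occur in B (hypothetical helper)."""
    A = np.ascontiguousarray(A)
    # Assumption: B can be safely cast to A's dtype, so both share one byte layout.
    B = np.ascontiguousarray(B, dtype=A.dtype)
    # One structured element per row: a single field holding A.shape[1] values.
    row_dtype = np.dtype([("row", A.dtype, (A.shape[1],))])
    Av = A.view(row_dtype).ravel()
    Bv = B.view(row_dtype).ravel()
    # invert=True marks the rows of A that are absent from B.
    return A[np.isin(Av, Bv, invert=True)]

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])
result = remove_rows_structured(A, B)
print(result)
```

Casting B up front matters when the two arrays have different integer widths, since the view compares raw row contents rather than numeric values.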

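Try the following. It uses a matrix product to reduce each row to a single number and then compares those numbers. Note that this is only reliable when distinct rows map to distinct numbers: it works for the example below, but in general two different rows can produce the same weighted sum.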

import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

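# Per-column weights: one more than the largest value seen in that column
arr_max = np.maximum(A.max(0) + 1, B.max(0) + 1)
# Collapse each row to a single number, then keep rows of A whose number is not in B
print(A[~np.isin(A.dot(arr_max), B.dot(arr_max))])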

Output:

[[3 0 4]
 [0 5 9]]
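A weighted sum of this kind can map two different rows to the same number. With weights [4, 6, 10], for instance, both [3, 0, 0] and [0, 2, 0] give 12. One way to keep the "one number per row" idea without collisions is a mixed-radix encoding, which np.ravel_multi_index provides. The sketch below is an illustration rather than part of the answer above: the function name is hypothetical, and it assumes non-negative integer entries and that the product of the per-column sizes fits in NumPy's index type.

```python
import numpy as np

def remove_rows_by_index_encoding(A, B):
    """Return the rows of A that do not occur in B (hypothetical helper).

    Assumes non-negative integers and the same number of columns in A and B.
    """
    # Size of each "digit": one more than the largest value seen in that column.
    dims = tuple(int(d) for d in np.maximum(A.max(axis=0), B.max(axis=0)) + 1)
    # Each row becomes its flat index in a grid of shape dims, which is unique per row.
    a_codes = np.ravel_multi_index(tuple(A.T), dims)
    b_codes = np.ravel_multi_index(tuple(B.T), dims)
    return A[np.isin(a_codes, b_codes, invert=True)]

# Rows chosen so that the plain weighted sum would confuse the first two.
A = np.array([[3, 0, 0],
              [0, 2, 0],
              [0, 5, 9]])
B = np.array([[0, 2, 0]])
result = remove_rows_by_index_encoding(A, B)
print(result)
```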

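This is certainly not the most performant solution, but it is relatively easy to read. Note that for NumPy arrays, row in B is true as soon as any single element matches, so the rows are compared as Python lists here:

B_rows = B.tolist()
A = np.array([row for row in A.tolist() if row not in B_rows])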
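The same row-by-row comparison can also be written with broadcasting instead of a Python loop. This is a sketch under stated assumptions: the function name is hypothetical, A and B must have the same number of columns, and the intermediate boolean array has len(A) * len(B) * columns entries, so it only suits moderately sized inputs.

```python
import numpy as np

def remove_rows_broadcast(A, B):
    """Return the rows of A that do not occur in B (hypothetical helper)."""
    # Shapes (len(A), 1, cols) and (1, len(B), cols) broadcast to (len(A), len(B), cols).
    equal = A[:, None, :] == B[None, :, :]
    # A row of A matches a row of B only if every column is equal.
    row_matches = equal.all(axis=2)
    # Keep the rows of A that match no row of B.
    return A[~row_matches.any(axis=1)]

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])
result = remove_rows_broadcast(A, B)
print(result)
```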


 