
Remove column from numpy array according to columns in another array?

I have a 2d numpy array, call it C:

import numpy as np

A = np.array([1,10,2])
B = np.array([4,-2,5])
C = np.vstack([A,B])

and another 2d numpy array, call it G:

E = np.array([4,2,6])
F = np.array([0,5,30])
G = np.vstack([E,F])

I would like to get a 1D boolean array that is True wherever a column of G matches some column of C, so in this case:

output = [False,True,False]

The second element is True because (2, 5) is the second column of G and also matches the third column of C.

In reality, C and G are arrays with ~3 million elements, but solving this small example should be good enough!
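To pin down the matching rule, here is a naive reference version (the helper name `columns_in` is hypothetical). It compares whole columns with a Python loop, so it is only meant for checking faster approaches on small inputs:

```python
import numpy as np

def columns_in(G, C):
    """Naive reference: result[j] is True when column j of G equals
    some column of C in every row. Loops in Python, O(n_G * n_C)."""
    c_cols = [tuple(C[:, i]) for i in range(C.shape[1])]
    return np.array([tuple(G[:, j]) in c_cols for j in range(G.shape[1])], dtype=bool)

A = np.array([1, 10, 2])
B = np.array([4, -2, 5])
C = np.vstack([A, B])

E = np.array([4, 2, 6])
F = np.array([0, 5, 30])
G = np.vstack([E, F])

print(columns_in(G, C))  # [False  True False]
```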

I believe this fits your needs for the given example. I'm not good enough with NumPy to know whether it will scale well to millions of records, though.

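import numpy as np

A = np.array([1,10,2])
B = np.array([4,-2,5])
C = np.vstack([A,B]).T  # each row of C is now one of the original columns

E = np.array([4,2,6])
F = np.array([0,5,30])
G = np.vstack([E,F]).T

# A row g of G matches only if it equals some row of C in every position
matches = [bool((C == g).all(axis=1).any()) for g in G]
print(matches)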
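Comparing a row with `==` and reducing with a bare `.any()` reports a match as soon as a single element agrees, which is why a full-row check with `.all(axis=1)` is the safer reduction. A small counter-example; the vector (1, 5) is made up purely for illustration:

```python
import numpy as np

# Rows of C_t are the columns of the question's C (i.e. C_t = C.T).
C_t = np.array([[1, 4], [10, -2], [2, 5]])

# (1, 5) is NOT a column of C, but it shares 1 with (1, 4) and 5 with (2, 5).
g = np.array([1, 5])

loose = bool((C_t == g).any())               # True: some single element matched
strict = bool((C_t == g).all(axis=1).any())  # False: no row matched completely
print(loose, strict)  # True False
```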

You can view each column as a single structured record (using a contiguous copy of the transpose) and then use np.in1d:

def make_view(a):
    # Each column of `a` becomes one record with a.shape[0] fields
    return np.ascontiguousarray(a.T).view([('', a.dtype)] * a.shape[0]).T.ravel()

Cv, Gv = make_view(C), make_view(G)

>>> np.in1d(Gv, Cv)
array([False,  True, False])
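A self-contained sketch of the same structured-view idea, written as a named function (`column_records` is a hypothetical name) and using `np.isin`, which NumPy 2.0 recommends in place of the deprecated `np.in1d`. It assumes both arrays have the same number of rows and the same dtype, since records are compared field by field; the number of columns may differ.

```python
import numpy as np

def column_records(a):
    """View each column of a 2-D array as one structured record (hypothetical helper).

    Transposing makes each original column a contiguous row; viewing that row
    with a structured dtype of a.shape[0] fields collapses it into one element.
    """
    a_t = np.ascontiguousarray(a.T)
    fields = [(f"f{i}", a.dtype) for i in range(a.shape[0])]
    return a_t.view(fields).ravel()

A = np.array([1, 10, 2])
B = np.array([4, -2, 5])
C = np.vstack([A, B])

E = np.array([4, 2, 6])
F = np.array([0, 5, 30])
G = np.vstack([E, F])

# Both record arrays share the same structured dtype, so isin can compare them.
mask = np.isin(column_records(G), column_records(C))
print(mask)  # [False  True False]
```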

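You didn't mention how many columns you have, so I assumed the number is small: this approach builds intermediate arrays whose size grows with the square of the number of columns.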

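C_r = np.repeat(C[:,:,np.newaxis],G.shape[1],axis=2)
G_r = np.repeat(G[:,:,np.newaxis],C.shape[1],axis=2)
G_r = np.transpose(G_r,(0,2,1))

# a[i, j] is True when column i of C equals column j of G in every row
# (summing the differences instead would let e.g. +1 and -1 cancel out)
a = (G_r == C_r).all(axis=0)
np.any(a,axis=0)
Out[95]: array([False,  True, False])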
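The same pairwise comparison can be written with broadcasting instead of np.repeat. It still materializes a boolean array of shape (rows, n_C, n_G), so, like the repeat version, it is a sketch that only suits small column counts:

```python
import numpy as np

A = np.array([1, 10, 2])
B = np.array([4, -2, 5])
C = np.vstack([A, B])

E = np.array([4, 2, 6])
F = np.array([0, 5, 30])
G = np.vstack([E, F])

# C[:, :, None] has shape (rows, n_C, 1); G[:, None, :] has shape (rows, 1, n_G).
# Broadcasting the comparison yields shape (rows, n_C, n_G).
eq = C[:, :, None] == G[:, None, :]

# match[i, j] is True when column i of C equals column j of G in every row.
match = eq.all(axis=0)

# For each column j of G: does any column of C match it?
result = match.any(axis=0)
print(result)  # [False  True False]
```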

>>> g = G.transpose()
>>> c = set(map(tuple, C.transpose()))
>>> np.array([tuple(item) in c for item in g])
array([False,  True, False])
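The set-based idea can be wrapped in a small function (`columns_in_set` is a hypothetical name). Building the set costs one pass over C and each lookup is an average O(1) hash probe, so the check is roughly linear in the number of columns; converting with `.tolist()` first yields plain Python ints, which avoids creating a NumPy scalar object per element.

```python
import numpy as np

def columns_in_set(G, C):
    """Hypothetical helper: for each column of G, test membership in the set of C's columns."""
    c_cols = set(map(tuple, C.T.tolist()))
    return np.array([col in c_cols for col in map(tuple, G.T.tolist())], dtype=bool)

A = np.array([1, 10, 2])
B = np.array([4, -2, 5])
C = np.vstack([A, B])

E = np.array([4, 2, 6])
F = np.array([0, 5, 30])
G = np.vstack([E, F])

print(columns_in_set(G, C))  # [False  True False]
```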

Just to throw my pandas idea in here, too:

import pandas as pd

# apply(tuple) collapses each column into one tuple, giving a Series of tuples
dfc = pd.DataFrame(C).apply(tuple)
dfg = pd.DataFrame(G).apply(tuple)

print(dfg.isin(dfc))

# 0    False
# 1     True
# 2    False
# dtype: bool

Converting millions of columns into tuples might be slow, though... :)
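If building tuples is the bottleneck, one tuple-free pandas alternative (a swapped-in technique, not part of the answer above) is a left merge with `indicator=True` on frames that hold one original column per row. This is a sketch assuming C and G have the same number of rows and matching dtypes:

```python
import numpy as np
import pandas as pd

A = np.array([1, 10, 2])
B = np.array([4, -2, 5])
C = np.vstack([A, B])

E = np.array([4, 2, 6])
F = np.array([0, 5, 30])
G = np.vstack([E, F])

# One row per original column; both frames get the same column labels (0, 1).
left = pd.DataFrame(G.T)
right = pd.DataFrame(C.T).drop_duplicates()  # duplicates in `right` would duplicate rows of `left`

# how="left" keeps G's column order; indicator=True adds a "_merge" column
# that equals "both" when the row was also found in `right`.
merged = left.merge(right, how="left", indicator=True)
result = (merged["_merge"] == "both").to_numpy()
print(result)  # [False  True False]
```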

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM