I have a 2d numpy array, call it C:
A = np.array([1,10,2])
B = np.array([4,-2,5])
C = np.vstack([A,B])
and another 2d numpy array, call it G:
E = np.array([4,2,6])
F = np.array([0,5,30])
G = np.vstack([E,F])
I would like to return a 1D boolean array whose i-th entry is True if the i-th column of G matches any column of C, so in this case
output = [False,True,False]
The second element here is True because (2,5) is the second column of G and also matches the third column of C.
In reality, C and G are arrays with ~3 million elements, but figuring this out for the small case should be good enough!
I believe this fits your needs for the given example. I'm not good enough with numpy to know whether it will scale well to millions of records, though.
import numpy as np
A = np.array([1,10,2])
B = np.array([4,-2,5])
C = np.vstack([A,B]).T
E = np.array([4,2,6])
F = np.array([0,5,30])
G = np.vstack([E,F]).T
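# Each row of C is now a column of the original C. Compare it with g as a whole
# row (.all(axis=1)), not entry by entry, then ask whether any row matched.
matches = [bool((C == g).all(axis=1).any()) for g in G]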
print(matches)
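A small, hypothetical probe column shows why the comparison has to be made row by row. `(C == g).any()` is True as soon as any single entry lines up with `g`, whereas `(C == g).all(axis=1).any()` requires some row of the transposed `C` (that is, some column of the original C) to equal `g` entirely. The probe (1, 5) below is made up for illustration: it is not a column of C, yet each of its entries appears in the matching position somewhere.

```python
import numpy as np

# Rows of C_T are the columns of C from the question (C_T = C.T).
C_T = np.array([[1, 4], [10, -2], [2, 5]])

# Hypothetical probe: (1, 5) is NOT a column of C, but 1 matches C_T[0, 0]
# and 5 matches C_T[2, 1], so an element-wise .any() is fooled.
g = np.array([1, 5])

elementwise = bool((C_T == g).any())          # True: some single entry matches
rowwise = bool((C_T == g).all(axis=1).any())  # False: no full row equals g
print(elementwise, rowwise)  # True False
```

On the question's own data both forms happen to give `[False, True, False]`, which is why the difference is easy to miss.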
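You may define a contiguous structured view, in which each column becomes a single record, and then use np.in1d (deprecated since NumPy 2.0 in favor of np.isin, which can be called the same way here):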
make_view = lambda a : np.ascontiguousarray(a.T).view([('', a.dtype)] * a.shape[0]).T.ravel()
Cv, Gv = make_view(C), make_view(G)
>>> np.in1d(Gv, Cv)
array([False, True, False])
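The same structured-view idea can be wrapped in a function and pointed at np.isin instead of the deprecated np.in1d. This is a sketch, assuming G and C are 2D, have the same number of rows, and share one dtype (the record views must have identical dtypes to be compared); the names `columns_in` and `as_records` are hypothetical.

```python
import numpy as np

def columns_in(G, C):
    """True where a column of G also appears as a column of C (sketch, see assumptions)."""
    def as_records(a):
        # Transpose so each column becomes a contiguous row, then reinterpret
        # that row as one record with a.shape[0] fields of dtype a.dtype.
        rows_first = np.ascontiguousarray(a.T)
        return rows_first.view([('', a.dtype)] * a.shape[0]).ravel()

    # np.isin compares whole records, i.e. whole columns of the inputs.
    return np.isin(as_records(G), as_records(C))

C = np.vstack([np.array([1, 10, 2]), np.array([4, -2, 5])])
G = np.vstack([np.array([4, 2, 6]), np.array([0, 5, 30])])
print(columns_in(G, C).tolist())  # [False, True, False]
```

Because the membership test runs in compiled code (sorting the records rather than looping in Python), this is the kind of approach worth timing on the ~3 million element case.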
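You didn't mention the number of columns you have, so I assumed it's small. This approach builds arrays of shape (rows, n, n), so memory grows quadratically with the number of columns n, and it assumes C and G have the same number of columns.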
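# C_r[:, j, k] = C[:, j] for every k
C_r = np.repeat(C[:, :, np.newaxis], C.shape[1], axis=2)
# after the transpose, G_r[:, j, i] = G[:, i] for every j
G_r = np.repeat(G[:, :, np.newaxis], G.shape[1], axis=2)
G_r = np.transpose(G_r, (0, 2, 1))
# a[j, i] is True when column j of C equals column i of G in every row
# (a zero sum of differences is not enough: differences (1, -1) also sum to 0)
a = np.all(G_r == C_r, axis=0)
np.any(a, axis=0)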
Out[95]: array([False, True, False])
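Broadcasting can build the same column-by-column comparison without the explicit np.repeat copies, and it also lets C and G have different numbers of columns. This is a sketch under the same "few columns" assumption: it still materializes rows × nC × nG booleans. The function name `columns_in_broadcast` and the extra fourth column of `C` in the usage line are illustrative, not from the thread.

```python
import numpy as np

def columns_in_broadcast(G, C):
    """True where a column of G equals some column of C (both 2D, same number of rows)."""
    # G[:, None, :] has shape (rows, 1, nG) and C[:, :, None] has shape (rows, nC, 1);
    # comparing them broadcasts to (rows, nC, nG) with eq[r, j, i] = (C[r, j] == G[r, i]).
    eq = G[:, None, :] == C[:, :, None]
    pair_match = eq.all(axis=0)    # (nC, nG): column j of C equals column i of G in every row
    return pair_match.any(axis=0)  # (nG,): column i of G equals at least one column of C

# Hypothetical variant of the question's data: C gets a 4th column, G keeps 3.
C = np.array([[1, 10, 2, 7], [4, -2, 5, 7]])
G = np.array([[4, 2, 6], [0, 5, 30]])
print(columns_in_broadcast(G, C).tolist())  # [False, True, False]
```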
>>> g=G.transpose()
>>> c=set(tuple(map(tuple, C.transpose())))
>>> np.array([tuple(item) in c for item in g])
array([False, True, False])
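The set-based answer works by hashing each column as a Python tuple. Under an extra assumption that the question does not state, namely that C and G hold integers and have exactly two rows, the same lookup can be vectorized: pack each (x, y) column into one int64 key and let np.isin do the membership test. This is a swapped-in technique, not the answer's method, and `pack_columns` / `columns_in_packed` are hypothetical names.

```python
import numpy as np

def pack_columns(a, lo, span):
    """Hypothetical helper: map each column (x, y) of a 2-row integer array to one key.

    Assumes every entry lies in [lo, lo + span); then (x - lo) * span + (y - lo)
    is unique per column, because both shifted values are in [0, span).
    """
    shifted = a.astype(np.int64) - lo
    return shifted[0] * span + shifted[1]

def columns_in_packed(G, C):
    """True where a column of G equals some column of C (2-row integer arrays assumed)."""
    lo = int(min(G.min(), C.min()))
    span = int(max(G.max(), C.max())) - lo + 1
    # The largest key is span**2 - 1, which must fit in int64 to avoid overflow.
    if span ** 2 - 1 > np.iinfo(np.int64).max:
        raise ValueError("value range too wide to pack into int64 keys")
    return np.isin(pack_columns(G, lo, span), pack_columns(C, lo, span))

C = np.vstack([np.array([1, 10, 2]), np.array([4, -2, 5])])
G = np.vstack([np.array([4, 2, 6]), np.array([0, 5, 30])])
print(columns_in_packed(G, C).tolist())  # [False, True, False]
```

The packing step replaces tuple hashing with plain integer arithmetic, so no per-column Python objects are created; the trade-off is that it only applies when the value range is small enough for the keys to fit in int64.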
Just to throw my pandas idea in here, too:
import pandas as pd
dfc = pd.DataFrame(C).apply(tuple)
dfg = pd.DataFrame(G).apply(tuple)
print(dfg.isin(dfc))
# 0 False
# 1 True
# 2 False
# dtype: bool
However, converting millions of columns to tuples might be no fun... :)