
Python merge two arrays on specific columns

I have two sets/arrays/lists of the form:

a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)]
b = [(12, 14, 44, 12), (5, 4, 66, 12), (19, 22, 96, 45)]

I would like to find c, the list of items from b that are also found in a, where only the first two elements of each tuple need to match (e.g. 12, 14). So in this case the answer c would be:

c = [(12, 14, 44, 12), (19, 22, 96, 45)]

I used nested loops, but that is way too slow. Thanks!

You can do this with a list comprehension (note that it is still a nested loop, so it performs up to len(a) * len(b) comparisons):

>>> a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)]
>>> b = [(12, 14, 44, 12), (5, 4, 66, 12), (19, 22, 96, 45)]
>>> [item for item in b for checker in a if item[:2] == checker[:2]]
[(12, 14, 44, 12), (19, 22, 96, 45)]
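One caveat, assuming a may contain several rows with the same first two elements: the nested comprehension above emits an item of b once for every matching row of a, so it can produce duplicates. A minimal sketch using any() yields each item of b at most once and stops scanning a at the first match. The extra (12, 14, ...) row in a is hypothetical, added only to show the difference; the worst case is still len(a) * len(b) comparisons.

```python
a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3),
     (12, 14, 0.9, 0.9, 0.9)]  # hypothetical extra row repeating the (12, 14) key
b = [(12, 14, 44, 12), (5, 4, 66, 12), (19, 22, 96, 45)]

# Nested comprehension: one output per (item, checker) match, so (12, 14, ...) appears twice
dupes = [item for item in b for checker in a if item[:2] == checker[:2]]

# any() short-circuits on the first matching row of a, so each item of b appears at most once
c = [item for item in b if any(item[:2] == checker[:2] for checker in a)]

print(dupes)  # [(12, 14, 44, 12), (12, 14, 44, 12), (19, 22, 96, 45)]
print(c)      # [(12, 14, 44, 12), (19, 22, 96, 45)]
```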

You can do this in O(N) time (on average) if you first store all the unique two-item key tuples from a in a set, since each membership test on a set is O(1) on average:

>>> keys = {x[:2] for x in a}
>>> [x for x in b if x[:2] in keys]
[(12, 14, 44, 12), (19, 22, 96, 45)]
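The same set-based idea can be wrapped in a small reusable function. This is a sketch, not code from the answer: the name filter_by_prefix and the ncols parameter (how many leading elements form the key) are hypothetical. Converting each slice with tuple() keeps the keys hashable even if the rows are lists rather than tuples. Note that floats are compared exactly, as in the original.

```python
from typing import Iterable, List, Sequence


def filter_by_prefix(a: Iterable[Sequence], b: Iterable[Sequence], ncols: int = 2) -> List[Sequence]:
    """Return the items of b whose first `ncols` elements equal those of some row of a."""
    # Build the lookup set once: O(len(a)); tuple() makes list slices hashable too
    keys = {tuple(row[:ncols]) for row in a}
    # One average-O(1) membership test per row of b: O(len(b))
    return [row for row in b if tuple(row[:ncols]) in keys]


a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)]
b = [(12, 14, 44, 12), (5, 4, 66, 12), (19, 22, 96, 45)]
result = filter_by_prefix(a, b)
print(result)  # [(12, 14, 44, 12), (19, 22, 96, 45)]
```

The total cost is roughly O(len(a) + len(b)), compared with O(len(a) * len(b)) for the nested loops.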

Note that if you only need to match items at the same index, you can simply use zip with a list comprehension:

>>> [y for x, y in zip(a, b) if x[:2] == y[:2]]
[(12, 14, 44, 12), (19, 22, 96, 45)]
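To make the "same index" restriction concrete, here is a small sketch with a hypothetical reordering of b (the variable name b_shuffled is illustrative). zip pairs rows by position, so it finds nothing, while the set-based lookup still finds both matches. zip also stops silently at the end of the shorter list.

```python
a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)]
# Same rows as b in the question, but in a different order
b_shuffled = [(19, 22, 96, 45), (12, 14, 44, 12), (5, 4, 66, 12)]

# zip compares a[0] with b_shuffled[0], a[1] with b_shuffled[1], ... so nothing lines up
by_position = [y for x, y in zip(a, b_shuffled) if x[:2] == y[:2]]

# The set-based lookup ignores positions entirely
keys = {x[:2] for x in a}
by_key = [y for y in b_shuffled if y[:2] in keys]

print(by_position)  # []
print(by_key)       # [(19, 22, 96, 45), (12, 14, 44, 12)]
```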

# Equivalent NumPy version (also matches by index, so a and b must have the same length):
>>> import numpy as np
>>> arr_a = np.array(a)
>>> arr_b = np.array(b)
>>> arr_b[(arr_b[:,:2] == arr_a[:,:2]).all(axis=1)]
array([[12, 14, 44, 12],
       [19, 22, 96, 45]])
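If the rows are not aligned (different lengths or orders), one NumPy option is broadcasting: compare every row of b against every row of a and then reduce. This is a sketch with a hypothetical reordered and extended b. It builds a boolean array of shape (len(b), len(a), 2), so memory grows as len(a) * len(b); for large inputs the set-based approach is likely the better choice.

```python
import numpy as np

a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)]
# Hypothetical b with a different length and order than a
b = [(19, 22, 96, 45), (5, 4, 66, 12), (12, 14, 44, 12), (1, 2, 3, 4)]

arr_a = np.array(a)  # float dtype
arr_b = np.array(b)  # integer dtype

# (len(b), 1, 2) == (1, len(a), 2) broadcasts to (len(b), len(a), 2)
pairwise = arr_b[:, None, :2] == arr_a[None, :, :2]
# all(axis=2): both key columns equal for a given (b row, a row) pair
# any(axis=1): the b row matches at least one row of a
mask = pairwise.all(axis=2).any(axis=1)
c = arr_b[mask]
print(c)  # the (19, 22, ...) and (12, 14, ...) rows, in b's order
```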

If you're already using NumPy, you can do it with NumPy:

In [48]: import numpy as np

In [49]: a = np.array([(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)])

In [50]: b = np.array([(12, 14, 44.0, 12.0), (5, 4, 66.0, 12.0), (19, 22, 96.0, 45.0)])

In [51]: print(b[np.all(a[:,:2] == b[:,:2], 1)])
[[12. 14. 44. 12.]
 [19. 22. 96. 45.]]

How does it work?

In [52]: print(a[:,:2] == b[:,:2])
[[ True  True]
 [False False]
 [ True  True]]

np.all takes an array of booleans and reduces it with a logical AND along the axis given by the optional second argument; if no axis is given, it reduces over all the elements and returns a single boolean.

In [53]: print(np.all(a[:,:2] == b[:,:2]))
False

In [69]: print(np.all(a[:,:2] == b[:,:2], 1))
[ True False  True]

In [70]: print(np.all(a[:,:2] == b[:,:2], 0))
[False False]

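In our case, of course, the right axis to use is 1: for each row, we want to know whether all of its first two columns match. Note that, like the zip-based answer, this compares a and b row by row at the same index, so it only works when both arrays have the same number of rows and matching rows sit at the same positions.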

(P.S. I must confess I was a bit sloppy with the types of your array values; for example, b is written here with float literals, so its values come out as floats.)
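A sketch of one way to avoid that sloppiness, assuming the key columns hold whole numbers: build b from its original integer tuples and let NumPy promote types during the comparison, so 12 == 12.0 is True while the selected rows keep b's integer dtype. Exact float equality is only reliable here because the key values are integers stored as floats.

```python
import numpy as np

# a contains fractional values, so it gets a float dtype; b stays an integer array
a = np.array([(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)])
b = np.array([(12, 14, 44, 12), (5, 4, 66, 12), (19, 22, 96, 45)])

# Mixed int/float comparison happens after type promotion, so 12 == 12.0 is True
mask = np.all(a[:, :2] == b[:, :2], axis=1)
c = b[mask]

print(c.dtype.kind)  # i  (b's integer dtype is preserved)
print(c)
```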

List comprehension:

>>> a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)]
>>> b = [(12, 14, 44, 12), (5, 4, 66, 12), (19, 22, 96, 45)]
>>> c = [j for i in a for j in b if i[:2] == j[:2]]

Output (since a is the outer loop, the results follow the order of a):

[(12, 14, 44, 12), (19, 22, 96, 45)]
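Since the question is about speed, here is a rough timing sketch comparing a nested-loop version against the set-based lookup on hypothetical synthetic data (500 rows each, keys drawn from a small range so that some rows match). The nested version uses the any() form so that both functions return the same list in b's order. Absolute timings depend on the machine, so none are quoted here; the gap should widen as the inputs grow.

```python
import random
import timeit

# Hypothetical synthetic data: keys drawn from 0..49 so some rows of b match rows of a
random.seed(0)
a = [(random.randrange(50), random.randrange(50), 0.1, 0.2, 0.3) for _ in range(500)]
b = [(random.randrange(50), random.randrange(50), 1, 2) for _ in range(500)]


def nested(a, b):
    # Nested loops (any() form): up to len(a) * len(b) key comparisons
    return [j for j in b if any(i[:2] == j[:2] for i in a)]


def with_set(a, b):
    # Set lookup: O(len(a) + len(b)) on average
    keys = {i[:2] for i in a}
    return [j for j in b if j[:2] in keys]


# Both return the matching items of b, in b's order
assert nested(a, b) == with_set(a, b)

print("nested:", timeit.timeit(lambda: nested(a, b), number=3))
print("set:   ", timeit.timeit(lambda: with_set(a, b), number=3))
```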
