Python merge two arrays on specific columns
I have two sets/arrays/lists of the form
a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)]
b = [(12, 14, 44, 12), (5, 4, 66, 12), (19, 22, 96, 45)]
And I would like to find c, a list of the items from b that are found in a, where only the first two elements of each tuple need to match (e.g. 12, 14). So in this case the answer c would be
c = [(12, 14, 44, 12), (19, 22, 96, 45)]
I used nested loops, but that is far too slow. Thanks
You can do this with a list comprehension:
>>> a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)]
>>> b = [(12, 14, 44, 12), (5, 4, 66, 12), (19, 22, 96, 45)]
>>> [item for item in b for checker in a if item[:2] == checker[:2]]
[(12, 14, 44, 12), (19, 22, 96, 45)]
You can do this in O(N) time if you first store all the unique two-item tuples from a in a set:
>>> keys = {x[:2] for x in a}
>>> [x for x in b if x[:2] in keys]
[(12, 14, 44, 12), (19, 22, 96, 45)]
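The set-based lookup above can be wrapped in a small helper. This is only a sketch: the name `match_on_prefix` and its `n` parameter are made up for illustration, and it assumes the rows are tuples (slices of lists are not hashable and would need `tuple(row[:n])`). It also shows one difference from the nested comprehension: when a contains a repeated key, the nested version emits the matching row of b once per repeat, while the set version emits it once.

```python
def match_on_prefix(a, b, n=2):
    """Return the rows of b whose first n elements equal the first n of some row in a."""
    keys = {row[:n] for row in a}                  # built once, O(len(a))
    return [row for row in b if row[:n] in keys]   # one hash lookup per row of b


a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)]
b = [(12, 14, 44, 12), (5, 4, 66, 12), (19, 22, 96, 45)]

c = match_on_prefix(a, b)
print(c)  # [(12, 14, 44, 12), (19, 22, 96, 45)]

# Same data, but the key (12, 14) now appears twice in a.
a_dup = a + [(12, 14, 0.9, 0.9, 0.9)]
nested = [item for item in b for checker in a_dup if item[:2] == checker[:2]]
print(len(nested))                      # 3: (12, 14, 44, 12) is listed twice
print(len(match_on_prefix(a_dup, b)))   # 2
```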
Note that if you're only trying to match items at the same index, then simply use zip with a list comprehension:
>>> [y for x, y in zip(a, b) if x[:2] == y[:2]]
[(12, 14, 44, 12), (19, 22, 96, 45)]
# Equivalent NumPy version:
>>> import numpy as np
>>> arr_a = np.array(a)
>>> arr_b = np.array(b)
>>> arr_b[(arr_b[:,:2] == arr_a[:,:2]).all(axis=1)]
array([[12, 14, 44, 12],
       [19, 22, 96, 45]])
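The positional versions give the same answer as the set lookup here only because the matching rows happen to sit at the same index in a and b. A quick sketch of the difference, using a reordered copy of b (the name `b_shuffled` is just for illustration):

```python
a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)]
b = [(12, 14, 44, 12), (5, 4, 66, 12), (19, 22, 96, 45)]
b_shuffled = [b[2], b[1], b[0]]   # same rows, reversed order

# Positional match: row i of a is compared only with row i of b_shuffled.
positional = [y for x, y in zip(a, b_shuffled) if x[:2] == y[:2]]

# Membership match: each row of b_shuffled is checked against every key of a.
keys = {x[:2] for x in a}
by_key = [y for y in b_shuffled if y[:2] in keys]

print(positional)  # []
print(by_key)      # [(19, 22, 96, 45), (12, 14, 44, 12)]
```

Note also that zip silently stops at the shorter input, so rows beyond the common length are never compared.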
You can do this with numpy, if you're using numpy:
In [49]: a = np.array([(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)])
In [50]: b = np.array([(12, 14, 44.0, 12.0), (5, 4, 66.0, 12.0), (19, 22, 96.0, 45.0)])
In [51]: print(b[np.all(a[:,:2]==b[:,:2],1)])
[[12. 14. 44. 12.]
 [19. 22. 96. 45.]]
How does it work?
In [52]: print(a[:,:2]==b[:,:2])
[[ True  True]
 [False False]
 [ True  True]]
np.all takes an array of booleans and reduces it with a logical AND along the axis given by the optional second argument (or over all the elements if no axis is given).
In [53]: print(np.all(a[:,:2]==b[:,:2]))
False
In [69]: print(np.all(a[:,:2]==b[:,:2],1))
[ True False  True]
In [70]: print(np.all(a[:,:2]==b[:,:2],0))
[False False]
In our case, of course, the right axis to use is 1.
(PS: I must confess to a bit of sloppiness in treating the types of your array values: everything in these arrays is stored as a float.)
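The comparison above is row by row, so it needs a and b to have the same number of rows and the matching rows to sit at the same index. If that does not hold, one possible NumPy-only variant is to compare every row of b with every row of a through broadcasting and then reduce twice. This is a sketch rather than the answer's method; it builds a len(b) x len(a) x 2 boolean array, so it suits moderately sized inputs only, and the extra and reordered rows in b below are made-up test data.

```python
import numpy as np

a = np.array([(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)])
# Reordered, and one row longer than a (illustrative data).
b = np.array([(19, 22, 96.0, 45.0), (5, 4, 66.0, 12.0), (12, 14, 44.0, 12.0), (7, 7, 1.0, 2.0)])

# b[:, None, :2] has shape (4, 1, 2) and a[None, :, :2] has shape (1, 3, 2),
# so the comparison broadcasts to shape (4, 3, 2): every b row against every a row.
pairwise = b[:, None, :2] == a[None, :, :2]

# all(axis=2): both key columns must agree; any(axis=1): for at least one row of a.
mask = pairwise.all(axis=2).any(axis=1)

c = b[mask]
print(mask)
print(c)
```

Here mask is True for the first and third rows of b, so c keeps the (19, 22, ...) and (12, 14, ...) rows in the order they appear in b.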
List Comprehension
>>> a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)]
>>> b = [(12, 14, 44, 12), (5, 4, 66, 12), (19, 22, 96, 45)]
>>> c=[j for i in a for j in b if i[:2]==j[:2]]
Output:
[(12, 14, 44, 12), (19, 22, 96, 45)]
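The question asks only for the matching rows of b, but since the title speaks of a merge, here is a hedged sketch of combining the columns of both lists on the shared key. It assumes each key occurs at most once in a (a later duplicate would overwrite an earlier one in the dict); the names `a_by_key` and `merged` are illustrative.

```python
a = [(12, 14, 0.3, 0.6, 0.8), (16, 18, 0.4, 0.5, 0.3), (19, 22, 0.4, 0.5, 0.3)]
b = [(12, 14, 44, 12), (5, 4, 66, 12), (19, 22, 96, 45)]

# Index a by its first two elements so each lookup is a single dict access.
a_by_key = {row[:2]: row for row in a}

# For every row of b with a matching key, append the non-key columns of a.
merged = [row + a_by_key[row[:2]][2:] for row in b if row[:2] in a_by_key]

print(merged)
# [(12, 14, 44, 12, 0.3, 0.6, 0.8), (19, 22, 96, 45, 0.4, 0.5, 0.3)]
```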