Find different pairs in a list with Python

I have a list, and I want to find the pairs in it whose two elements differ. I implemented a function, different():

import numpy as np


def different(array):
    res = []
    for (x1, y1), (x2, y2) in array:
        if (x1, y1) != (x2, y2):
            res.append([(x1, y1), (x2, y2)])
    return res


a = np.array([[[1, 2], [3, 4]],
              [[1, 2], [1, 2]],
              [[7, 9], [6, 3]],
              [[3, 3], [3, 3]]])

out = different(a)  # get [[(1, 2), (3, 4)],
                    #      [(7, 9), (6, 3)]]

Is there a better way to do this? I want to improve my function different. The list may contain more than 100,000 pairs.

The numpy way to do it is:

import numpy as np

a = np.array([[[1, 2], [3, 4]],
              [[1, 2], [1, 2]],
              [[7, 9], [6, 3]],
              [[3, 3], [3, 3]]])

b = np.logical_or(a[:,0,0] != a[:,1,0],  a[:,0,1] != a[:,1,1])

print(a[b])

Vectorized Comparison

a[~(a[:, 0] == a[:, 1]).all(1)]

array([[[1, 2],
        [3, 4]],

       [[7, 9],
        [6, 3]]])

This works by taking the first pair of each subarray and comparing it element-wise with the second pair. Only the subarrays whose two pairs are not identical are selected. Consider:

a[:, 0] == a[:, 1]

array([[False, False],
       [ True,  True],
       [False, False],
       [ True,  True]])

From this, we want the rows that are not True in every column. So, apply all(1) along each row of this result and then negate it with ~.

~(a[:, 0] == a[:, 1]).all(1)
array([ True, False,  True, False])

This gives you a boolean mask that you can then use to select subarrays from a.
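The question's different() returned a list of tuple pairs rather than an array. Below is a minimal sketch of a drop-in variant built on this mask; the name different_vectorized is hypothetical, and the sketch assumes the input is an array of shape (n, 2, 2) or something np.asarray can turn into one.

```python
import numpy as np


def different_vectorized(array):
    """Hypothetical drop-in for different(): same list-of-tuple-pairs output."""
    a = np.asarray(array)
    # True where the two points of a pair differ in at least one coordinate
    mask = ~(a[:, 0] == a[:, 1]).all(1)
    # Convert the selected rows back to plain Python tuples
    return [[tuple(p) for p in pair] for pair in a[mask].tolist()]


a = np.array([[[1, 2], [3, 4]],
              [[1, 2], [1, 2]],
              [[7, 9], [6, 3]],
              [[3, 3], [3, 3]]])

print(different_vectorized(a))  # [[(1, 2), (3, 4)], [(7, 9), (6, 3)]]
```

The final conversion to Python tuples is a per-row Python loop, so it gives back part of the vectorized speedup; if the calling code can work with arrays, returning a[mask] directly avoids that cost.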


np.logical_or.reduce

Similar to the first option above, but approaches the problem from the other end (see De Morgan's laws): instead of negating "all coordinates are equal", it checks whether any coordinate differs.

a[np.logical_or.reduce(a[:, 0] != a[:, 1], axis=1)]
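By De Morgan's law, "not (x equal and y equal)" is the same as "x differs or y differs", so both masks should agree. A small sketch that checks this on random data (the array size, value range and seed are arbitrary choices); it also shows that ndarray.any(axis=1) computes the same OR-reduction as np.logical_or.reduce(..., axis=1).

```python
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary test data: 1,000 pairs of 2-D points with small coordinates
a = rng.integers(0, 3, size=(1000, 2, 2))

# First form: NOT (all coordinates equal)
mask_all = ~(a[:, 0] == a[:, 1]).all(1)
# Second form: ANY coordinate differs, via logical_or.reduce
mask_or = np.logical_or.reduce(a[:, 0] != a[:, 1], axis=1)
# ndarray.any performs the same OR-reduction along axis 1
mask_any = (a[:, 0] != a[:, 1]).any(axis=1)

print(np.array_equal(mask_all, mask_or), np.array_equal(mask_or, mask_any))  # True True
```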

Timing comparison of the solutions

With this many different approaches to one problem, timing them helps sort out the better answers.

Setup

We use an array of shape (200000, 2, 2), since the OP, Vincentlai, pointed out that the list may hold more than 100,000 pairs, so this is in the range of the expected size.

a = np.array(np.random.randint(10, size=(200000, 2, 2)))


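Using Joe's answer: numpy.logical_and (note that Joe's answer uses numpy.logical_or; the logical_and version timed here keeps only the pairs whose x and y coordinates both differ, so it does not select the same rows)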

%timeit b = a[np.logical_and(a[:,0,0] != a[:,1,0],  a[:,0,1] != a[:,1,1])]
>>> 5.12 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
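The benchmarked logical_and expression is not equivalent to the logical_or in Joe's answer: it keeps a pair only when both coordinates differ. A minimal counterexample, using a single made-up pair whose points share the same x:

```python
import numpy as np

# One pair whose points share x but differ in y: (1, 2) vs (1, 3)
a = np.array([[[1, 2], [1, 3]]])

dx = a[:, 0, 0] != a[:, 1, 0]  # do the x coordinates differ?
dy = a[:, 0, 1] != a[:, 1, 1]  # do the y coordinates differ?

print(len(a[np.logical_or(dx, dy)]))   # 1: the pair is kept
print(len(a[np.logical_and(dx, dy)]))  # 0: the pair is wrongly dropped
```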

Using Coldspeed's first answer: vectorized comparison

%timeit b = a[~(a[:, 0] == a[:, 1]).all(1)]
>>> 13.7 ms ± 559 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using Coldspeed's second answer: numpy.logical_or.reduce

%timeit b = a[np.logical_or.reduce(a[:, 0] != a[:, 1], axis=1)]
>>> 13.2 ms ± 498 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using U9 Forward's answer: filter

%timeit b = list(filter(lambda x: x[0]!=x[1],a.tolist()))
>>> 102 ms ± 4.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using aydow's answer: list comprehension

%timeit b = [[(x1, y1), (x2, y2)] for (x1, y1), (x2, y2) in a if (x1, y1) != (x2, y2)]
>>> 752 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
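%timeit is an IPython magic. Outside IPython, a comparable harness can be sketched with the standard timeit module. This sketch assumes the same random (200000, 2, 2) input, first checks that the numpy variants select identical rows (using logical_or, the form from Joe's answer), and then prints per-call timings. It reports the best of five runs rather than the mean ± std. dev. that %timeit prints, and the numbers depend on the machine, so none are claimed here.

```python
import timeit

import numpy as np

a = np.random.randint(10, size=(200000, 2, 2))

candidates = {
    "logical_or (Joe)": lambda: a[np.logical_or(a[:, 0, 0] != a[:, 1, 0],
                                                 a[:, 0, 1] != a[:, 1, 1])],
    "~all (Coldspeed 1)": lambda: a[~(a[:, 0] == a[:, 1]).all(1)],
    "logical_or.reduce (Coldspeed 2)": lambda: a[
        np.logical_or.reduce(a[:, 0] != a[:, 1], axis=1)],
}

# Sanity check: every numpy variant must select exactly the same rows
reference = candidates["logical_or (Joe)"]()
for name, fn in candidates.items():
    assert np.array_equal(fn(), reference), name

for name, fn in candidates.items():
    # Best of 5 repeats of 10 calls each, converted to milliseconds per call
    best = min(timeit.repeat(fn, number=10, repeat=5)) / 10
    print(f"{name}: {best * 1e3:.2f} ms")
```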


Conclusions

Joe's approach of comparing each coordinate separately is by far the fastest of those timed, although the timed version uses numpy.logical_and instead of the numpy.logical_or from his answer. Predictably, every pure-Python approach is far slower than any of the numpy ones.

Try using filter:

import numpy as np

def different(array):
    # Convert to nested lists so x[0] != x[1] compares whole pairs
    return list(filter(lambda x: x[0] != x[1], array.tolist()))

a = np.array([[[1, 2], [3, 4]],
              [[1, 2], [1, 2]],
              [[7, 9], [6, 3]],
              [[3, 3], [3, 3]]])

out = different(a)
print(out)

Using a list comprehension, we can do it in one line:

items_list = [[[1, 2], [3, 4]],
              [[1, 2], [1, 2]],
              [[7, 9], [6, 3]],
              [[3, 3], [3, 3]]
             ]

# Keep only the pairs whose two points differ
[itm for itm in items_list if itm[0] != itm[1]]
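This one-liner relies on items_list holding plain Python lists. Applied directly to the numpy array from the question, itm[0] != itm[1] yields an element-wise boolean array, and using that in an if raises ValueError; converting with tolist() first, as the filter answer does, avoids this. A small sketch:

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]],
              [[1, 2], [1, 2]],
              [[7, 9], [6, 3]],
              [[3, 3], [3, 3]]])

try:
    [itm for itm in a if itm[0] != itm[1]]
except ValueError as exc:
    # itm[0] != itm[1] is an array such as [True, True]; its truth value is ambiguous
    print("ValueError:", exc)

# Converting to nested Python lists first makes != a whole-list comparison
result = [itm for itm in a.tolist() if itm[0] != itm[1]]
print(result)  # [[[1, 2], [3, 4]], [[7, 9], [6, 3]]]
```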

Use a list comprehension:

def different(array):
    return [[(x1, y1), (x2, y2)] for (x1, y1), (x2, y2) in array if (x1, y1) != (x2, y2)]
