
Efficient way to find all the pairs in a list without using nested loop

Suppose I have a list that stores many 2D points. Some positions in this list store the same point; treat the indices of two positions that store the same point as an index pair. I want to find all such pairs in the list and return them as two-element index pairs. A point may be repeated more than twice, but only the first match needs to be treated as a pair.

For example, the list below has 9 points in total, and 5 positions contain repeated points. Indices 0, 3, and 7 store the same point ( [1, 1] ), and indices 1 and 6 store the same point ( [2, 3] ).

[[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]]

So, for this list, I want to return the index pairs (index 0, index 3) and (index 1, index 6). The only solution I can come up with uses nested loops, which I coded as follows:

import numpy as np
from numpy import linalg

A = np.array([[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]], dtype=int)

# I don't want to modify the original list, so I loop through a list of indices instead.
Index = np.arange(0, A.shape[0], 1, dtype=int)
Pair = []  # stores the index pairs
while Index.size != 0:

    # Take the first remaining index and remove it from the candidates
    current_index = Index[0]
    pi = A[current_index]
    Index = np.delete(Index, 0, 0)

    # Scan the remaining indices for the first identical point
    for j in range(Index.shape[0]):
        pj = A[Index[j]]
        distance = linalg.norm(pi - pj, ord=2, keepdims=True)

        if distance == 0:
            Pair.append([current_index, Index[j]])
            Index = np.delete(Index, j, 0)
            break

This code works for me, but its time complexity is O(n^2) , where n == len(A) . Is there a more efficient way to do this job with a lower time complexity? Thanks for any ideas and help.

You can use a dictionary to keep track of the indices for each point.

Then, you can iterate over the items in the dictionary, printing out the indices corresponding to points that appear more than once. The runtime of this procedure is linear, rather than quadratic, in the number of points in A :

points = {}

for index, point in enumerate(A):
    point_tuple = tuple(point)
    if point_tuple not in points:
        points[point_tuple] = []
    points[point_tuple].append(index)

for point, indices in points.items():
    if len(indices) > 1:
        print(indices)

This prints out:

[0, 3, 7]
[1, 6]

If you only want the first two indices where a point appears, you can use print(indices[:2]) rather than print(indices) .

This is similar to the other answer, but since you only want the first two indices when a point occurs more than twice, you can do it in a single iteration. Add each index under the appropriate key in a dict and yield the indices if (and only if) that key now holds exactly two of them:

from collections import defaultdict

l = [[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]]

def get_pairs(l):
    ind = defaultdict(list)

    for i, pair in enumerate(l):
        t = tuple(pair)
        ind[t].append(i)
        if len(ind[t]) == 2:
            yield list(ind[t])

list(get_pairs(l))
# [[0, 3], [1, 6]]
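
One difference from the question's loop may be worth noting: the nested loop removes both indices once they are paired, so a point that appears four times produces two pairs, while the generator above reports only the first pair. If that behaviour matters, a minimal sketch along the same lines could keep at most one unmatched index per point. The helper name `pair_consecutive_duplicates` is made up for illustration and is not part of either answer.

```python
def pair_consecutive_duplicates(points):
    """Pair each point with its next unmatched occurrence (hypothetical helper)."""
    pending = {}  # point -> index of an occurrence that has no partner yet
    pairs = []
    for i, point in enumerate(points):
        key = tuple(point)  # lists and array rows are not hashable, tuples are
        if key in pending:
            # A partner is waiting: emit the pair and forget the point again
            pairs.append((pending.pop(key), i))
        else:
            pending[key] = i
    return pairs


example = [[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]]
print(pair_consecutive_duplicates(example))     # [(0, 3), (1, 6)]
print(pair_consecutive_duplicates([[0, 0]] * 4))  # [(0, 1), (2, 3)]
```

This is still a single pass with dictionary lookups, so the expected running time stays linear in the number of points.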

One pure-NumPy solution without loops (the only one so far) is to use np.unique twice, with a trick that consists of removing the first items found between the two searches. This solution assumes a sentinel value can be chosen (e.g. -1, the minimum value of the integer type, or NaN), which is generally not a problem (you can use a bigger type if needed).

import numpy as np

A = np.array([[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]], dtype=int)

# Copy the array so the original is not mutated
tmp = A.copy()

# Find the first location of each unique row;
# inverse maps every row of A to its position in pair1
pair1, index1, inverse = np.unique(tmp, return_index=True, return_inverse=True, axis=0)
inverse = inverse.reshape(-1)  # keep it 1-D regardless of the NumPy version

# Discard the rows just found, assuming INT_MIN is never stored in A
INT_MIN = np.iinfo(A.dtype).min
tmp[index1] = INT_MIN

# Find the first location of each remaining (duplicated) row
pair2, index2 = np.unique(tmp, return_index=True, axis=0)

# Drop the sentinel row; what is left are the second occurrences
right = index2[(pair2 != INT_MIN).any(axis=1)]

# Row-wise lookup of the first occurrence that matches each second occurrence
left = index1[inverse[right]]

# Combine each left index with its right index
result = np.hstack((left[:,None], right[:,None]))

# result = array([[0, 3],
#                 [1, 6]])

This solution should run in O(n log n) time, since np.unique sorts the data internally (more specifically, with quicksort).
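
If choosing a sentinel is inconvenient, the same sort-then-compare idea can be sketched with `np.lexsort`, which is a stable sort, so equal rows end up adjacent and in their original index order. This is an alternative sketch, not the method described above; `first_pairs_lexsort` is a hypothetical helper name, and the input is assumed to be a 2-D array whose rows can be compared exactly (integers, for instance).

```python
import numpy as np


def first_pairs_lexsort(points):
    """Return an (m, 2) array holding the first two indices of every repeated row."""
    points = np.asarray(points)
    if len(points) < 2:
        return np.empty((0, 2), dtype=np.intp)
    # lexsort uses the LAST key as the primary one, so reverse the columns
    order = np.lexsort(points.T[::-1])
    ordered = points[order]
    # same_as_prev[k] is True when sorted row k + 1 equals sorted row k
    same_as_prev = (ordered[1:] == ordered[:-1]).all(axis=1)
    # group_start[k] is True when sorted row k opens a new run of equal rows
    group_start = np.concatenate(([True], ~same_as_prev))
    # A pair is a run start that is immediately followed by an equal row
    first = np.flatnonzero(same_as_prev & group_start[:-1])
    return np.column_stack((order[first], order[first + 1]))


A = np.array([[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]], dtype=int)
print(first_pairs_lexsort(A).tolist())  # [[0, 3], [1, 6]]
```

As with the np.unique version, the pairs come out ordered by point value rather than by index, and the cost is dominated by the sort.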

