Fastest way for finding a common element between two lists in a list of lists

I'm trying to identify bubble structures in a directed graph (oriented left to right). I start walking the graph from a possible start node, for example the green node. Then I add one node to all paths, adding copies of paths when paths diverge again, like this:

First iteration: [[3],[7]]

Second iteration: [[3,4],[3,5],[7,8],[7,9]]

After every iteration I want to check whether any paths intersect and save them as confirmed bubbles. Currently I'm using a nested for-loop to compare every path to every other path, but the number of paths can get very large, so the script can get very slow. The order of the path matters.

(Image: example bubble)

Any suggestions on how to improve the speed of comparing a path to another path in a list of paths?

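def ExtendPaths(paths, outEdges):
    newPaths = []
    for path in paths:
        # Successors of the last node on this path.
        nextNodes = GetNextSegment(path[-1], outEdges)
        if len(nextNodes) == 0:
            # Dead end: the path is not carried forward.
            continue
        for node in nextNodes:
            # Copy the path so each branch is extended independently.
            newPath = list(path)
            newPath.append(node)
            newPaths.append(newPath)
    return newPaths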

def ComparePaths(paths, putativeBubbleStartSegment):
    length = len(paths)
    for path1 in paths:
        for path2 in paths:
            if path2 != path1:
                if len(set(path1).intersection(path2)) > 0:
                    # Bubble confirmed
                    pass

Here GetNextSegment simply returns a list of the nodes connected to the node passed to it (in this case, the last node of the path). outEdges is a dictionary of the form node:[out,going,edges]. In ComparePaths() a bubble is confirmed when the .intersection() of 2 paths has a length greater than 0.

A bubble is a graph structure where 2 paths diverge (from the green node, for example) and finally come together again. In this example, the bubble would go from 2 to 11 with all nodes between them.

I'm not asking for a complete bubble-finding algorithm, just for ideas on how to compare all paths to all other paths quickly.

Instead of using a list of lists, consider using a set of tuples (if order matters) or a set of frozensets (if order does not matter). Initialize newPaths with newPaths = set(), then add each path as a tuple or frozenset (both are hashable) rather than a list:

for node in nextNodes:                                      
    newPath = tuple(path) + (node,)
    # or: newPath = frozenset(path).union({node})
    newPaths.add(newPath)

This should make it a bit faster to check membership and intersections.
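As a rough illustration of the set-of-tuples idea, here is a minimal sketch of the extension step. The names extend_paths and out_edges are hypothetical, and the sketch assumes that out_edges maps each node to a list of its successors and that a node missing from the dictionary is a dead end.

```python
def extend_paths(paths, out_edges):
    """Extend every path by one node, branching where the graph diverges."""
    new_paths = set()
    for path in paths:
        # Assumption: a node with no entry in out_edges has no successors,
        # so its path is simply not carried forward.
        for node in out_edges.get(path[-1], []):
            # Tuples are hashable, so identical paths collapse into one entry.
            new_paths.add(path + (node,))
    return new_paths


# Toy graph matching the two iterations shown in the question.
out_edges = {3: [4, 5], 7: [8, 9]}
first = {(3,), (7,)}
second = extend_paths(first, out_edges)
print(sorted(second))  # [(3, 4), (3, 5), (7, 8), (7, 9)]
```

Because the container is a set, a path that is produced twice is stored only once, which keeps the number of later comparisons down.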

Also, it looks like you are checking the same pairs of paths multiple times by looping through paths twice. For example, if you have path1 equal to (3, 4) and path2 equal to (3, 5), you don't need to check (3, 4) versus (3, 5) and also (3, 5) versus (3, 4), since your check appears symmetrical. You could simplify ComparePaths by using an itertools helper:

from itertools import combinations

def ComparePaths(paths, putativeBubbleStartSegment):
    # This gets all combinations of paths without repeating any pairing.
    for path1, path2 in combinations(paths, 2):
        # Don't need to check the length of the intersection because an
        # empty set is always equivalent to "False" in an if statement.
        if set(path1).intersection(path2):
            # Bubble confirmed
            pass
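A runnable variant of the same pairwise check might look like the sketch below. The function name find_intersecting_pairs and the sample paths are made up for illustration. Two small changes are assumptions of this sketch rather than part of the answer: each path is converted to a frozenset once instead of once per comparison, and isdisjoint is used so that no intersection set has to be built.

```python
from itertools import combinations


def find_intersecting_pairs(paths):
    """Return every unordered pair of paths that shares at least one node."""
    # Build each node set once, instead of once per comparison.
    node_sets = [(path, frozenset(path)) for path in paths]
    hits = []
    for (path1, set1), (path2, set2) in combinations(node_sets, 2):
        # isdisjoint can stop at the first shared node.
        if not set1.isdisjoint(set2):
            hits.append((path1, path2))
    return hits


paths = [(3, 4, 6), (3, 5), (7, 8, 6), (7, 9)]
pairs = find_intersecting_pairs(paths)
print(pairs)
# [((3, 4, 6), (3, 5)), ((3, 4, 6), (7, 8, 6)), ((7, 8, 6), (7, 9))]
```

Note that, as with the check in the question, two paths that merely share a prefix, such as (3, 4, 6) and (3, 5), are reported as intersecting.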

It seems like your sample code is leaving out some details (since there are unused function arguments and variables), but what I see here doesn't seem like it should work for what you're trying to do. As a result, it's hard to suggest any other speed-ups, even though there might be other ways to improve your algorithm.
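One direction that does not depend on those missing details, offered here as an assumption and not as part of the original answer, is to avoid pairwise comparison altogether by indexing paths by the nodes they visit. The sketch below uses a hypothetical helper, group_paths_by_node, that touches each node of each path once, so any node that ends up with more than one path marks a group of intersecting paths.

```python
from collections import defaultdict


def group_paths_by_node(paths):
    """Map each node to the paths that visit it, keeping only shared nodes."""
    index = defaultdict(list)
    for path in paths:
        # set() guards against counting a node twice if a path repeats it.
        for node in set(path):
            index[node].append(path)
    # Only nodes visited by two or more paths indicate an intersection.
    return {node: hits for node, hits in index.items() if len(hits) > 1}


paths = [(3, 4, 6), (3, 5), (7, 8, 6), (7, 9)]
shared = group_paths_by_node(paths)
print(shared[6])  # [(3, 4, 6), (7, 8, 6)]
```

Whether this is actually faster depends on how long the paths are and how many of them overlap, so it would need to be measured on real data.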
