简体   繁体   中英

Fastest way for finding a common element between two lists in a list of lists

I'm trying to identify bubble structures in a directed graph (orientation is left to right). I start walking the graph from a possible start node, for example the green node. Then I add one node to all paths, adding copies of paths when paths diverge again, like this:

first iteration: [[3],[7]]

Second iteration: [[3,4],[3,5],[7,8],[7,9]]

After every iteration I want to check whether any paths intersect and save them as confirmed bubbles. Currently I'm using a nested for-loop to compare every path to one another but the amount of paths can get very large and thus the script can get very slow. The order of the path matters.

示例气泡

Any suggestions on how to improve the speed of comparing a path to another path in a list of paths?

def ExtendPaths(paths, outEdges):
newPaths = []
for path in paths:                                              
    nextNodes = GetNextSegment(path[len(path) - 1], outEdges)   
    if len(nextNodes) == 0:
        j=5
    else:
        for node in nextNodes:                                      
            newPath = list(path)                                    
            newPath.append(node)                                    
            newPaths.append(newPath)                    
return newPaths

def ComparePaths(paths, putativeBubbleStartSegment):
length = len(paths)
for path1 in paths:
    for path2 in paths:
        if path2 != path1:
            if len(set(path1).intersection(path2)) > 0:
                #Bubble confirmed

Where GetNextSegment simply returns a list of nodes connected to the node given to the function (in this case, the last node of the path). outEdges is a dictionary with: node:[out,going,edges]. In ComparePaths() a bubble is confirmed when the length of the .intersection() of 2 paths is greater than 0.

A bubble is a graph structure where 2 paths diverge (from the green node, for example) and finally come together again. In this example, the bubble would go from 2 to 11 with all nodes between them.

I'm not asking for a complete bubble-finding algorithm, just for ideas on how to compare all paths to all other paths quickly.

Instead of using a list of lists, consider using a set of tuples (if order matters) or a set of frozensets (if order does not matter). Initialize newPaths with newPaths = set() , then add each path as a tuple or frozenset (which are hashable) rather than a list:

for node in nextNodes:                                      
    newPath = tuple(path) + (node,)
    # or: newPath = frozenset(path).union({node})
    newPaths.add(newPath)

This should make it a bit faster to check membership and intersections.

Also it looks like you are checking the same paths multiple times by looping through paths twice. For example, if you have path1 equal to (3, 4) and path2 equal to (3, 5) , you don't need to check (3, 4) versus (3, 5) and also (3, 5) versus (3, 4) , since your check appears symmetrical. You could simplify ComparePaths by using an itertools helper:

from itertools import combinations

def ComparePaths(paths, putativeBubbleStartSegment):
    # This gets all combinations of paths without repeating any pairing.
    for path1, path2 in combinations(paths, 2)
        # Don't need to check the length of the intersection because an
        # empty set is always equivalent to "False" in an if statement.
        if set(path1).intersection(path2):
            # Bubble confirmed

It seems like your sample code is leaving out some details (since there are unused function arguments and variables), but what I see here doesn't seem like it should work for what you're trying to do. As a result, it's hard to suggest any other speed-ups, even though there might be other ways to improve your algorithm.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM