
Chaining Lists Together In Python With Condition Matching

Premise:

I have lists of two-word phrases that I need to try to daisy chain together.

I have twelve lists with thousands of paired entries in each list, and I need to find all possible chains of the entries through the lists.

The order of the lists relative to each other is fixed -- list3 follows list2 follows list1, etc. I have a solution, but it is decidedly old-school and not very Pythonic -- and it takes forever to run. My last run took 3 hours and 40 min, which is completely unworkable.

So, I'm looking for any solutions that would be more efficient and (hopefully) speed up the process to something that is manageable.

Input Format:

The input data is formatted as 2D lists, like this:

l1 = [ ['SHORT', 'FILM'], ['LEASE', 'BACK'], ['SHELF', 'LIFE'], ['HOLDS', 'FAST'], ... ]
l2 = [ ['BOAT', 'DECK'], ['FAST', 'FOOD'], ['FILM', 'PROP'], ['CHOW', 'LINE'], ... ]
l3 = [ ['FOOD', 'DRIVE'], ['PROP', 'PLANE'], ['GOAL', 'LINES'], ['WRAP', 'PARTY'], ... ]
.
.
.
l12 = [ [ ...

Output:

I need to find all possible chains of words in which the second word of a pair in one list matches the first word of a pair in the next list, and so on, daisy chaining all the way through the lists.

The code I have (shortened to only three lists for brevity) looks like:

l1 = [['SHORT', 'FILM'], ['LEASE', 'BACK'], ['SHELF', 'LIFE'], ['HOLDS', 'FAST']]
l2 = [['BOAT', 'DECK'], ['FAST', 'FOOD'], ['FILM', 'PROP'], ['CHOW', 'LINE']]
l3 = [['FOOD', 'DRIVE'], ['PROP', 'PLANE'], ['GOAL', 'LINES'], ['WRAP', 'PARTY']]

ans = []

for i in range(len(l1)):
        for j in range(len(l2)):
                if  l1[i][1] == l2[j][0]:
                        for k in range(len(l3)):
                                if l2[j][1] == l3[k][0]:
                                        item = [l1[i][0], l1[i][1], l2[j][1], l3[k][1]]
                                        ans.append(item)

print(ans)

Which gives the output:

[['SHORT', 'FILM', 'PROP', 'PLANE'], ['HOLDS', 'FAST', 'FOOD', 'DRIVE']]

Any suggestions on a more efficient and faster(!) way to do this?

ADDITIONAL INFORMATION AND CONSTRAINTS

I have found out more details that add constraints and will change the code. First off, there are not really 12 lists. There are three lists that repeat (in order) 4 times: list1, list2, list3, list1, list2, list3, etc.

Also, the chain needs to "circle" back at the end, so that the second word of the pair from list12 (the same as list3) matches the first word of the pair from list1 (l12[j][1] == l1[k][0]).

For example, in order for the chain:

HORSE BACK FLIP PHONE HOME RUNS SHORT FILM PROP PLANE RIDE HIGH

to be a valid solution, HIGH HORSE must be on list12/list3.

Also, because the three lists repeat four times and circle around, the four loops

HORSE BACK FLIP PHONE HOME RUNS SHORT FILM PROP PLANE RIDE HIGH
PHONE HOME RUNS SHORT FILM PROP PLANE RIDE HIGH HORSE BACK FLIP
SHORT FILM PROP PLANE RIDE HIGH HORSE BACK FLIP PHONE HOME RUNS
PLANE RIDE HIGH HORSE BACK FLIP PHONE HOME RUNS SHORT FILM PROP

are considered to be the same word loop and the dupes should be removed.

The code snippet I am using is:

for i in range(len(list1)):
  for j in range(len(list2)):
    if  list1[i][1] == list2[j][0]:
      for k in range(len(list3)):
        if list2[j][1] == list3[k][0]:
          for l in range(len(list1)):
            if list3[k][1] == list1[l][0]:
              for m in range(len(list2)):
                if list1[l][1] == list2[m][0]:
                  for n in range(len(list3)):
                    if list2[m][1] == list3[n][0]:
                      for o in range(len(list1)):
                        if list3[n][1] == list1[o][0]:
                          for p in range(len(list2)):
                            if list1[o][1] == list2[p][0]:
                              for q in range(len(list3)):
                                if list2[p][1] == list3[q][0]:
                                  for r in range(len(list1)):
                                    if list3[q][1] == list1[r][0]:
                                      for s in range(len(list2)):
                                        if list1[r][1] == list2[s][0]:
                                          for t in range(len(list3)):
                                            if list2[s][1] == list3[t][0] and list3[t][1] == list1[i][0]:
                                              item = [list1[i][0], list1[i][1], list2[j][1], list3[k][1], list1[l][1], list2[m][1], list3[n][1], list1[o][1], list2[p][1], list3[q][1], list1[r][1], list2[s][1]]
                                              ans.append(item)

Which gives me all the loops, but doesn't remove the duplicates. And... takes hours to complete.

Any suggestions?

Build a dictionary from each of your lists so that each connection is a fast lookup, rather than iterating over the entire next list looking for a match. This should save you tons of time. I haven't done much with Big O notation, but I think this turns each O(N) scan of the next list into an O(1) average-case lookup, so joining two lists drops from roughly O(N^2) to O(N).

Here's how to do this for the cut-down example you gave. It should work just as well (much better, actually) for your 12 long lists:

l1 = [['SHORT', 'FILM'], ['LEASE', 'BACK'], ['SHELF', 'LIFE'], ['HOLDS', 'FAST']]
l2 = [['BOAT', 'DECK'], ['FAST', 'FOOD'], ['FILM', 'PROP'], ['CHOW', 'LINE']]
l3 = [['FOOD', 'DRIVE'], ['PROP', 'PLANE'], ['GOAL', 'LINES'], ['WRAP', 'PARTY']]

# Define a list of our lists so we can iterate over them
lists = [l1, l2, l3]

# For each list except the first one, create an equivalent dictionary where the first value
# in each list entry is a key, and the second value is the corresponding value for that key.
dicts = [{e[0]:e[1] for e in l} for l in lists[1:]]

# Attempt to find a full chain for one of the pairs in list #1
def do_one_chain(pair):
    chain = []
    chain.extend(pair)
    for d in dicts:
        if pair[1] not in d:
            return None
        pair = (pair[1], d[pair[1]])
        chain.append(pair[1])
    return chain

# Iterate over our first list...
ans = []
for pair in l1:
    # Apply our search function to a pair from list #1
    r = do_one_chain(pair)
    # If a chain was found, add it to the answer list
    if r:
        ans.append(r)

# Print the found chains
print(ans)

Result:

[['SHORT', 'FILM', 'PROP', 'PLANE'], ['HOLDS', 'FAST', 'FOOD', 'DRIVE']]

What's nice about this is that the solution is not only much more efficient, it's also much cleaner and more compact. None of the code changes as you add more levels. You just have to add the new levels to the lists list. I expect this is what you were looking for, since you were probably dreading having to stretch those nested if statements out to 12 levels, right?
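One caveat with the dictionary comprehension above: a plain dict keeps only one value per key, so if two pairs in the same list share a first word, the later pair silently overwrites the earlier one and some chains are missed. Since the question asks for all possible chains, here is a minimal sketch that keeps every follower by mapping each first word to a list. The helper names build_index and all_chains are made up for illustration, and the extra ['FILM', 'STAR'] and ['STAR', 'SHIP'] pairs are invented sample data to show the branching.

```python
from collections import defaultdict

# Sample data; the FILM/STAR and STAR/SHIP pairs are invented to show branching.
l1 = [['SHORT', 'FILM'], ['HOLDS', 'FAST']]
l2 = [['FAST', 'FOOD'], ['FILM', 'PROP'], ['FILM', 'STAR']]
l3 = [['FOOD', 'DRIVE'], ['PROP', 'PLANE'], ['STAR', 'SHIP']]

def build_index(pairs):
    """Map each first word to the list of all second words that follow it."""
    index = defaultdict(list)
    for first, second in pairs:
        index[first].append(second)
    return index

def all_chains(first_list, later_lists):
    """Return every chain that starts in first_list and links through later_lists."""
    indexes = [build_index(pairs) for pairs in later_lists]
    chains = [list(pair) for pair in first_list]
    for index in indexes:
        # Extend every partial chain by every word that can follow its last word.
        # Chains with no follower at this level simply drop out.
        chains = [chain + [nxt]
                  for chain in chains
                  for nxt in index.get(chain[-1], ())]
    return chains

chains = all_chains(l1, [l2, l3])
print(chains)
```

This version holds all partial chains for one level in memory at a time, which is fine for modest branching but could grow large if many words have many followers.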

Please report back how long this algorithm takes to run on your full dataset.
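For the additional constraints in the question (three lists repeating four times, the last word circling back to the first, and rotations counted as one loop), the same lookup idea can be extended. The following is only a sketch under those stated constraints, not a tested solution for the full dataset. The names build_index and find_loops are hypothetical, and the sample pairs are derived from the example loop in the question. Duplicates are removed by storing each loop in a canonical form: the alphabetically smallest of its rotations by multiples of three words, which are the rotations that keep the words aligned with their lists.

```python
from collections import defaultdict

def build_index(pairs):
    """Map each first word to the list of all second words that follow it."""
    index = defaultdict(list)
    for first, second in pairs:
        index[first].append(second)
    return index

def find_loops(base_lists, repeats):
    """Find closed word loops through base_lists repeated `repeats` times."""
    indexes = [build_index(pairs) for pairs in base_lists]
    period = len(base_lists)
    length = period * repeats  # number of words in one loop
    loops = set()

    def extend(chain):
        step = len(chain) - 1  # which link leaves the last word of the chain
        followers = indexes[step % period].get(chain[-1], ())
        if len(chain) == length:
            # The final link must lead back to the starting word.
            if chain[0] in followers:
                rotations = [tuple(chain[s:] + chain[:s])
                             for s in range(0, length, period)]
                loops.add(min(rotations))  # canonical rotation removes dupes
            return
        for nxt in followers:
            extend(chain + [nxt])

    for first in indexes[0]:
        extend([first])
    return sorted(loops)

# Sample pairs taken from the example loop in the question.
list1 = [['HORSE', 'BACK'], ['PHONE', 'HOME'], ['SHORT', 'FILM'], ['PLANE', 'RIDE']]
list2 = [['BACK', 'FLIP'], ['HOME', 'RUNS'], ['FILM', 'PROP'], ['RIDE', 'HIGH']]
list3 = [['FLIP', 'PHONE'], ['RUNS', 'SHORT'], ['PROP', 'PLANE'], ['HIGH', 'HORSE']]

loops = find_loops([list1, list2, list3], repeats=4)
print(loops)
```

With this sample data the search reaches the same loop from four different starting words, and the canonical form collapses them into a single entry. The running time still depends on how many followers each word has, so heavy branching can remain expensive even with dictionary lookups.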

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 