简体   繁体   中英

finding gappy sublists within a larger list

Let's say I have a list like this:

 [['she', 'is', 'a', 'student'],
 ['she', 'is', 'a', 'lawer'],
 ['she', 'is', 'a', 'great', 'student'],
 ['i', 'am', 'a', 'teacher'],
 ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]

Now I have a list like this:

['she', 'is', 'student']

I want to query the larger list with this one, and return all the lists that contain the words within the query list in the same order. There might be gaps, but the order should be the same. How can I do that? I tried using the in operator but I don't get the desired output.

If all that you care about is that the words appear in order somehwere in the array, you can use a collections.deque and popleft to iterate through the list, and if the deque is emptied, you have found a valid match:

from collections import deque

def find_gappy(arr, m):
  dq = deque(m)
  for word in arr:
    if word == dq[0]:
      dq.popleft()
      if not dq:
        return True
  return False

By comparing each word in arr with the first element of dq , we know that when we find a match, it has been found in the correct order, and then we popleft , so we now are comparing with the next element in the deque .

To filter your initial list, you can use a simple list comprehension that filters based on the result of find_gappy :

matches = ['she', 'is', 'student']
x = [i for i in x if find_gappy(i, matches)]

# [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student'], ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]

You can compare two lists, with a function like this one. The way it works is it loops through your shorter list, and every time it finds the next word in the long list, cuts off the first part of the longer list at that point. If it can't find the word it returns false.

def is_sub_sequence(long_list, short_list):
    for word in short_list:
        if word in long_list:
            i = long_list.index(word)
            long_list = long_list[i+1:]
        else:
            return False
    return True

Now you have a function to tell you if the list is the desired type, you can filter out all the lists you need from the 'list of lists' using a list comprehension like the following:

a = [['she', 'is', 'a', 'student'],
 ['she', 'is', 'a', 'lawer'],
 ['she', 'is', 'a', 'great', 'student'],
 ['i', 'am', 'a', 'teacher'],
 ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]


b = ['she', 'is', 'student']

filtered = [x for x in a if is_sub_sequence(x,b)]

The list filtered will include only the lists of the desired type.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM