Finding gappy sublists within a larger list

Let's say I have a list like this:

 [['she', 'is', 'a', 'student'],
 ['she', 'is', 'a', 'lawer'],
 ['she', 'is', 'a', 'great', 'student'],
 ['i', 'am', 'a', 'teacher'],
 ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]

Now I have a list like this:

['she', 'is', 'student']

I want to query the larger list with this one and return all the lists that contain the words of the query list in the same order. There may be gaps between the words, but their order must be preserved. How can I do that? I tried using the in operator, but I don't get the desired output.
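For reference, here is a minimal sketch of why a plain in test does not help here. The names sentences and query are illustrative and not from the original post. On a list, in checks whether a single element equals the operand, so it looks for an inner list identical to the query; testing each word separately finds the words but ignores their order.

```python
# Illustrative names; not from the original post.
sentences = [['she', 'is', 'a', 'student'],
             ['i', 'am', 'a', 'teacher']]
query = ['she', 'is', 'student']

# `in` compares the query against each inner list as a whole.
whole_list_match = query in sentences

# Testing each word separately ignores word order.
unordered = [s for s in sentences if all(w in s for w in query)]
reversed_query_matches = all(w in sentences[0] for w in reversed(query))

print(whole_list_match)        # False
print(unordered)               # [['she', 'is', 'a', 'student']]
print(reversed_query_matches)  # True, even though the order is wrong
```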

If all you care about is that the words appear in order somewhere in the list, you can load the query into a collections.deque and iterate through the list, calling popleft each time a word matches the front of the deque. If the deque is emptied, you have found a valid match:

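from collections import deque

def find_gappy(arr, m):
  dq = deque(m)  # query words still to be matched, in order
  if not dq:
    return True  # treat an empty query as a match (avoids IndexError on dq[0])
  for word in arr:
    if word == dq[0]:
      dq.popleft()  # matched the next expected word
      if not dq:
        return True  # every query word was found in order
  return False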

By comparing each word in arr with the first element of dq, we know that any match is found in the correct order. After a match we popleft, so the next comparison is against the next element in the deque.

To filter your initial list, you can use a simple list comprehension that filters based on the result of find_gappy:

matches = ['she', 'is', 'student']
# x is the list of lists from the question
x = [i for i in x if find_gappy(i, matches)]

# [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student'], ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]
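As a possible alternative to the deque version, the same in-order check can be sketched with a single iterator. This is a different technique from the one above, and the names is_subsequence, sentences and query are illustrative rather than taken from the original post. It relies on the fact that word in it consumes the iterator up to and including the first match, so each later search resumes where the previous one stopped.

```python
def is_subsequence(words, query):
    """Return True if every item of `query` occurs in `words` in order (gaps allowed)."""
    it = iter(words)
    # `word in it` advances the iterator until it finds `word`,
    # so each later search resumes after the previous match.
    return all(word in it for word in query)

# Illustrative data, copied from the question.
sentences = [['she', 'is', 'a', 'student'],
             ['she', 'is', 'a', 'lawer'],
             ['she', 'is', 'a', 'great', 'student'],
             ['i', 'am', 'a', 'teacher'],
             ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]
query = ['she', 'is', 'student']

result = [s for s in sentences if is_subsequence(s, query)]
print(len(result))  # 3
```

Under this sketch an empty query matches every list, because all() of an empty sequence is True.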

You can compare two lists with a function like this one. It loops through your shorter list, and every time it finds the next word in the longer list, it cuts off the part of the longer list up to and including that word. If it can't find a word, it returns False.

def is_sub_sequence(long_list, short_list):
    for word in short_list:
        if word in long_list:
            i = long_list.index(word)
            long_list = long_list[i+1:]
        else:
            return False
    return True

Now that you have a function that tells you whether a list is of the desired type, you can filter out all the lists you need from the list of lists using a list comprehension like the following:

a = [['she', 'is', 'a', 'student'],
 ['she', 'is', 'a', 'lawer'],
 ['she', 'is', 'a', 'great', 'student'],
 ['i', 'am', 'a', 'teacher'],
 ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]


b = ['she', 'is', 'student']

filtered = [x for x in a if is_sub_sequence(x,b)]

The list filtered will include only the lists of the desired type.
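If copying the remainder of the list on every match is a concern, one possible variant keeps a start position instead of slicing. This is only a sketch: the name is_sub_sequence_by_index is hypothetical, and it uses the optional start argument of list.index, which raises ValueError when the value is not found.

```python
def is_sub_sequence_by_index(long_list, short_list):
    """Return True if short_list occurs in long_list in order (gaps allowed)."""
    start = 0  # position in long_list where the next search begins
    for word in short_list:
        try:
            # Search only from `start` onward, then move just past the match.
            start = long_list.index(word, start) + 1
        except ValueError:
            return False  # word does not occur after the previous match
    return True

# Illustrative checks using data from the question.
long_example = ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']
print(is_sub_sequence_by_index(long_example, ['she', 'is', 'student']))  # True
print(is_sub_sequence_by_index(long_example, ['student', 'she']))        # False
```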
