优雅地在列表中查找子列表

Question

给定一个包含噪声包围的已知模式的列表，是否有一种优雅的方法来获取与该模式相等的所有项目。 请参阅下面的我的粗代码。

list_with_noise = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]
known_pattern = [1,2,3,4]
res = []


for i in list_with_noise:
    for j in known_pattern:
        if i == j:
            res.append(i)
            continue

print res

我们会得到2, 1, 2, 3, 4, 2, 1, 2, 3, 4, 1, 2, 3, 4, 4, 3

奖励：如果不存在完整模式，则避免附加 i（即，允许 1,2,3,4 但不允许 1,2,3）

例子：

find_sublists_in_list([7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5],[1,2,3,4])

[1,2,3,4],[1,2,3,4],[1,2,3,4]


find_sublists_in_list([7,2,1,2,3,2,1,2,3,6,9,9,1,2,3,4,7,4,3,1,2,6],[1,2,3,4])

[1,2,3],[1,2,3],[1,2,3]

列表包含命名元组。

Answer 1

我知道这个问题已经 5 个月大并且已经“被接受”了，但是在谷歌上搜索一个非常相似的问题让我想到了这个问题，所有的答案似乎都有几个相当重要的问题，而且我很无聊，想试试我的手在一个 SO 答案中，所以我只是要说出我发现的东西。

据我所知，问题的第一部分非常简单：只需返回原始列表，其中过滤掉所有不在“模式”中的元素。 按照这种想法，我想到的第一个代码使用了 filter() 函数：

def subfinder(mylist, pattern):
    return list(filter(lambda x: x in pattern, mylist))

我会说这个解决方案肯定比原始解决方案更简洁，但它并没有更快，或者至少不是明显，如果没有很好的理由使用它们，我会尽量避免使用 lambda 表达式。 事实上，我能想出的最佳解决方案涉及一个简单的列表理解：

def subfinder(mylist, pattern):
    pattern = set(pattern)
    return [x for x in mylist if x in pattern]

这个解决方案比原始解决方案更优雅，速度也明显更快：理解速度比原始解决方案快约 120%，同时将模式转换为一组第一个颠簸，在我的测试中速度高达 320%。

现在奖励：我会直接跳进去，我的解决方案如下：

def subfinder(mylist, pattern):
    matches = []
    for i in range(len(mylist)):
        if mylist[i] == pattern[0] and mylist[i:i+len(pattern)] == pattern:
            matches.append(pattern)
    return matches

这是 Steven Rumbalski 的“低效单行”的一种变体，通过添加“mylist[i] == pattern[0]”检查并感谢 python 的短路评估，比原始语句要快得多和 itertools 版本（据我所知，以及其他所有提供的解决方案），它甚至支持重叠模式。 所以你去。

Answer 2

这将获得您问题的“奖金”部分：

pattern = [1, 2, 3, 4]
search_list = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]
cursor = 0
found = []
for i in search_list:
    if i == pattern[cursor]:
        cursor += 1
        if cursor == len(pattern):
            found.append(pattern)
            cursor = 0
    else:
        cursor = 0

对于非奖金：

pattern = [1, 2, 3, 4]
search_list = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]
cursor = 0
found = []
for i in search_list:
    if i != pattern[cursor]:
        if cursor > 0:
            found.append(pattern[:cursor])
        cursor = 0
    else:
        cursor += 1

最后，这个处理重叠：

def find_matches(pattern_list, search_list):
    cursor_list = []
    found = []
    for element in search_list:
        cursors_to_kill = []
        for cursor_index in range(len(cursor_list)):
            if element == pattern_list[cursor_list[cursor_index]]:
                cursor_list[cursor_index] += 1
                if cursor_list[cursor_index] == len(pattern_list):
                    found.append(pattern_list)
                    cursors_to_kill.append(cursor_index)
            else:
                cursors_to_kill.append(cursor_index)
        cursors_to_kill.reverse()
        for cursor_index in cursors_to_kill:
            cursor_list.pop(cursor_index)
        if element == pattern_list[0]:
            cursor_list.append(1)
    return found

Answer 3

基于迭代器的方法仍然基于朴素算法，但尝试使用.index()进行尽可能多的隐式循环：

def find_pivot(seq, subseq):
    n = len(seq)
    m = len(subseq)
    stop = n - m + 1
    if n > 0:
        item = subseq[0]
        i = 0
        try:
            while i < stop:
                i = seq.index(item, i)
                if seq[i:i + m] == subseq:
                    yield i
                i += 1
        except ValueError:
            return

与具有不同程度显式循环的其他几种方法相比：

def find_loop(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if all(seq[i + j] == subseq[j] for j in (range(m))):
            yield i

def find_slice(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if seq[i:i + m] == subseq:
            yield i

def find_mix(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if seq[i] == subseq[0] and seq[i:i + m] == subseq:
            yield i

一个人会得到：

a = list(range(10))
print(a)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

b = list(range(5, 10))
print(b)
# [5, 6, 7, 8, 9]

funcs = find_pivot, find_loop, find_slice, find_mix, 
for func in funcs:
    print()
    print(func.__name__)
    print(list(func(a * 10, b)))
    aa = a * 100
    %timeit list(func(aa, b))
    random.shuffle(aa)
    %timeit list(func(aa, b))

# find_pivot
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
# 10000 loops, best of 3: 49.6 µs per loop
# 10000 loops, best of 3: 50.1 µs per loop

# find_loop
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
# 1000 loops, best of 3: 712 µs per loop
# 1000 loops, best of 3: 680 µs per loop

# find_slice
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
# 10000 loops, best of 3: 162 µs per loop
# 10000 loops, best of 3: 162 µs per loop

# find_mix
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
# 10000 loops, best of 3: 82.2 µs per loop
# 10000 loops, best of 3: 83.9 µs per loop

请注意，这比使用测试输入的当前接受的答案快约 30%。

Answer 4

list_with_noise = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]
string_withNoise = "".join(str(i) for i in list_with_noise)
known_pattern = [1,2,3,4]
string_pattern = "".join(str(i) for i in known_pattern)
string_withNoise.count(string_pattern)

Answer 5

鉴于：

a_list = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]
pat = [1,2,3,4]

这是一个低效的单线：

res = [pat for i in range(len(a_list)) if a_list[i:i+len(pat)] == pat]

这是一个更高效的 itertools 版本：

from itertools import izip_longest, islice

res = []
i = 0  

while True:
    try:
        i = a_list.index(pat[0], i)
    except ValueError:
        break
    if all(a==b for (a,b) in izip_longest(pat, islice(a_list, i, i+len(pat)))):
        res.append(pat)
        i += len(pat)
    i += 1

Answer 6

一个惯用的、可组合的解决方案。

首先，我们需要借用一个itertools配方， consume （其消耗和丢弃给定数量从一个迭代元素然后，我们取。 itertools用于配方pairwise ，并将其扩展到一个nwise使用功能consume ：

import itertools

def nwise(iterable, size=2):
    its = itertools.tee(iterable, size)
    for i, it in enumerate(its):
        consume(it, i)  # Discards i elements from it
    return zip(*its)

现在我们有了这个，解决奖金问题真的很容易：

def find_sublists_in_list(biglist, searchlist):
    searchtup = tuple(searchlist)
    return [list(subtup) for subtup in nwise(biglist, len(searchlist)) if subtup == searchtup]

    # Or for more obscure but faster one-liner:
    return map(list, filter(tuple(searchlist).__eq__, nwise(biglist, len(searchlist))))

同样，针对主要问题的更简洁、更快速（如果不那么漂亮）的解决方案替换为：

def subfinder(mylist, pattern):
    pattern = set(pattern)
    return [x for x in mylist if x in pattern]

和：

def subfinder(mylist, pattern):
    # Wrap filter call in list() if on Python 3 and you need a list, not a generator
    return filter(set(pattern).__contains__, mylist)

其行为方式相同，但无需将临时set存储为名称，并将所有过滤工作推送到 C。

Answer 7

def sublist_in_list(sub, lis):
    return str(sub).strip('[]') in str(lis).strip('[]')

优雅地在列表中查找子列表

问题描述

7 个解决方案

解决方案1
50 已采纳 2012-09-25 05:37:35

解决方案2
5 2012-04-11 13:36:17

解决方案3
2 2020-03-23 18:26:17

解决方案4
1 2012-04-11 13:37:16

解决方案5
0 2012-04-11 14:25:03

解决方案6
0 2018-02-03 00:26:56

解决方案7
0 2019-12-02 08:52:17

优雅地在列表中查找子列表

问题描述

7 个解决方案

解决方案1 50 已采纳 2012-09-25 05:37:35

解决方案2 5 2012-04-11 13:36:17

解决方案3 2 2020-03-23 18:26:17

解决方案4 1 2012-04-11 13:37:16

解决方案5 0 2012-04-11 14:25:03

解决方案6 0 2018-02-03 00:26:56

解决方案7 0 2019-12-02 08:52:17

解决方案1
50 已采纳 2012-09-25 05:37:35

解决方案2
5 2012-04-11 13:36:17

解决方案3
2 2020-03-23 18:26:17

解决方案4
1 2012-04-11 13:37:16

解决方案5
0 2012-04-11 14:25:03

解决方案6
0 2018-02-03 00:26:56

解决方案7
0 2019-12-02 08:52:17