elegant find sub-list in list
Given a list containing a known pattern surrounded by noise, is there an elegant way to get all items that equal the pattern? See below for my crude code.
list_with_noise = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]
known_pattern = [1,2,3,4]
res = []
for i in list_with_noise:
    for j in known_pattern:
        if i == j:
            res.append(i)
            continue
print(res)
We would get 2, 1, 2, 3, 4, 2, 1, 2, 3, 4, 1, 2, 3, 4, 4, 3, 1, 2, 3
Bonus: avoid appending i if the full pattern is not present (i.e., 1,2,3,4 is allowed but 1,2,3 is not).
Examples:
find_sublists_in_list([7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5],[1,2,3,4])
[1,2,3,4],[1,2,3,4],[1,2,3,4]
find_sublists_in_list([7,2,1,2,3,2,1,2,3,6,9,9,1,2,3,4,7,4,3,1,2,6],[1,2,3,4])
[1,2,3],[1,2,3],[1,2,3]
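The lists contain named tuples.
I know this question is 5 months old and already "accepted", but googling a very similar problem brought me here, and all the answers seem to have a couple of rather significant problems. I'm also bored and want to try my hand at an SO answer, so I'm just going to rattle off what I've found.
The first part of the question, as I understand it, is pretty trivial: just return the original list with every element that is not in the "pattern" filtered out. Following that thinking, the first code I thought of used the filter() function: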
def subfinder(mylist, pattern):
    return list(filter(lambda x: x in pattern, mylist))
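I would say this solution is definitely more succinct than the original, but it is not any faster, or at least not appreciably, and I try to avoid lambda expressions unless there is a very good reason to use them. In fact, the best solution I could come up with involves a simple list comprehension: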
def subfinder(mylist, pattern):
    pattern = set(pattern)
    return [x for x in mylist if x in pattern]
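This solution is both more elegant and significantly faster than the original: in my tests the comprehension is about 120% faster than the original, and converting the pattern to a set first bumps that up to a whopping 320% faster.
Now for the bonus. I'll jump right in; my solution is as follows: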
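The percentages quoted above will vary with the machine and the Python version. A minimal sketch for repeating the comparison with the standard timeit module, assuming the sample data from the question; the function names nested_loops, with_list and with_set are labels made up for this sketch:

```python
import timeit

list_with_noise = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]
known_pattern = [1,2,3,4]

def nested_loops(mylist, pattern):
    # The question's original approach: compare every item with every pattern item
    res = []
    for i in mylist:
        for j in pattern:
            if i == j:
                res.append(i)
    return res

def with_list(mylist, pattern):
    # Comprehension; membership is a linear scan of the pattern list
    return [x for x in mylist if x in pattern]

def with_set(mylist, pattern):
    # Comprehension; membership is a hash lookup in a set
    pattern = set(pattern)
    return [x for x in mylist if x in pattern]

for func in (nested_loops, with_list, with_set):
    seconds = timeit.timeit(lambda: func(list_with_noise, known_pattern), number=10000)
    print(func.__name__, seconds)
```

The set version needs hashable items. Named tuples qualify as long as all of their fields are hashable.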
def subfinder(mylist, pattern):
    matches = []
    for i in range(len(mylist)):
        if mylist[i] == pattern[0] and mylist[i:i+len(pattern)] == pattern:
            matches.append(pattern)
    return matches
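This is a variation of Steven Rumbalski's "inefficient one-liner". Adding the "mylist[i] == pattern[0]" check, together with Python's short-circuit evaluation, makes it significantly faster than both the original statement and the itertools version (and, as far as I can tell, every other solution offered), and it even supports overlapping patterns. So there you go.
This will get the "bonus" part of your question: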
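To see the overlap support concretely, here is a minimal sketch of the same check written as a generator that yields start indices instead of copies of the pattern. The name find_starts, the empty-pattern guard and the sample data are assumptions made up for illustration:

```python
def find_starts(mylist, pattern):
    # Guard added for this sketch: an empty pattern has no first element to test
    if not pattern:
        return
    m = len(pattern)
    for i in range(len(mylist) - m + 1):
        # Cheap first-element test; the slice is only built when it passes
        if mylist[i] == pattern[0] and mylist[i:i + m] == pattern:
            yield i

print(list(find_starts([1, 2, 1, 2, 1, 5], [1, 2, 1])))
# [0, 2]
```

The two matches share the element at index 2, which a search that jumps past each match would not report.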
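pattern = [1, 2, 3, 4]
search_list = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]
cursor = 0
found = []
for i in search_list:
    if i == pattern[cursor]:
        cursor += 1
        if cursor == len(pattern):
            found.append(pattern)
            cursor = 0
    else:
        # Caveat: the mismatched element is not re-tested against pattern[0],
        # so a match that starts right here (e.g. 1,1,2,3,4) is missed
        cursor = 0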
For the non-bonus part:
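pattern = [1, 2, 3, 4]
search_list = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]
cursor = 0
found = []
for i in search_list:
    # Guard the index: after a full match, cursor == len(pattern)
    if cursor < len(pattern) and i == pattern[cursor]:
        cursor += 1
    else:
        if cursor > 0:
            found.append(pattern[:cursor])
        # Restart on the current element if it begins a new match
        cursor = 1 if i == pattern[0] else 0
# Flush a partial match that runs to the end of the list
if cursor > 0:
    found.append(pattern[:cursor])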
Finally, this one handles overlaps:
def find_matches(pattern_list, search_list):
    cursor_list = []
    found = []
    for element in search_list:
        cursors_to_kill = []
        for cursor_index in range(len(cursor_list)):
            if element == pattern_list[cursor_list[cursor_index]]:
                cursor_list[cursor_index] += 1
                if cursor_list[cursor_index] == len(pattern_list):
                    found.append(pattern_list)
                    cursors_to_kill.append(cursor_index)
            else:
                cursors_to_kill.append(cursor_index)
        cursors_to_kill.reverse()
        for cursor_index in cursors_to_kill:
            cursor_list.pop(cursor_index)
        if element == pattern_list[0]:
            cursor_list.append(1)
    return found
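The function above keeps one cursor per partial match that is still alive. A different technique, not used in this answer, reaches the same overlapping matches with a single cursor: the Knuth-Morris-Pratt algorithm, which precomputes where the cursor should fall back to after a mismatch. A minimal sketch, assuming the elements only need to support ==; the name kmp_find is made up for illustration:

```python
def kmp_find(seq, pattern):
    """Yield the start index of every match of pattern in seq, overlaps included."""
    m = len(pattern)
    if m == 0:
        return
    # failure[j]: length of the longest proper prefix of pattern[:j + 1]
    # that is also a suffix of it
    failure = [0] * m
    k = 0
    for j in range(1, m):
        while k > 0 and pattern[j] != pattern[k]:
            k = failure[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        failure[j] = k
    k = 0  # number of pattern items currently matched
    for i, item in enumerate(seq):
        while k > 0 and item != pattern[k]:
            k = failure[k - 1]
        if item == pattern[k]:
            k += 1
        if k == m:
            yield i - m + 1
            k = failure[k - 1]  # keep the longest border so overlaps are found

print(list(kmp_find([1, 2, 1, 2, 1], [1, 2, 1])))
# [0, 2]
print(list(kmp_find([1, 1, 2, 3, 4], [1, 2, 3, 4])))
# [1]
```

Each element of seq is read once, so the running time stays linear even when the pattern repeats itself heavily.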
An iterator-based approach that is still based on the naive algorithm, but tries to do as much implicit looping as possible by using .index():
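def find_pivot(seq, subseq):
    n = len(seq)
    m = len(subseq)
    stop = n - m + 1
    # Check the pattern length: subseq[0] below needs a non-empty subseq
    if m > 0:
        item = subseq[0]
        i = 0
        try:
            while i < stop:
                # list.index does the scanning in C and raises ValueError when done
                i = seq.index(item, i)
                if seq[i:i + m] == subseq:
                    yield i
                i += 1
        except ValueError:
            return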
Compared with a few other approaches that use varying degrees of explicit looping:
def find_loop(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if all(seq[i + j] == subseq[j] for j in (range(m))):
            yield i
def find_slice(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if seq[i:i + m] == subseq:
            yield i
def find_mix(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if seq[i] == subseq[0] and seq[i:i + m] == subseq:
            yield i
one would get:
import random
a = list(range(10))
print(a)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = list(range(5, 10))
print(b)
# [5, 6, 7, 8, 9]
funcs = find_pivot, find_loop, find_slice, find_mix,
for func in funcs:
    print()
    print(func.__name__)
    print(list(func(a * 10, b)))
    aa = a * 100
    # %timeit is an IPython magic; run this block in IPython or Jupyter
    %timeit list(func(aa, b))
    random.shuffle(aa)
    %timeit list(func(aa, b))
# find_pivot
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
# 10000 loops, best of 3: 49.6 µs per loop
# 10000 loops, best of 3: 50.1 µs per loop
# find_loop
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
# 1000 loops, best of 3: 712 µs per loop
# 1000 loops, best of 3: 680 µs per loop
# find_slice
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
# 10000 loops, best of 3: 162 µs per loop
# 10000 loops, best of 3: 162 µs per loop
# find_mix
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
# 10000 loops, best of 3: 82.2 µs per loop
# 10000 loops, best of 3: 83.9 µs per loop
Note that this is about 30% faster than the currently accepted answer on the tested input.
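The %timeit lines above only run inside IPython. A minimal sketch of the same measurement with the standard timeit module, for a plain Python interpreter; the helper name best_time and its default arguments are assumptions made up for this sketch, and find_slice is copied from the comparison so that the block runs on its own:

```python
import random
import timeit

def find_slice(seq, subseq):
    # Copied from the comparison above
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if seq[i:i + m] == subseq:
            yield i

def best_time(func, seq, subseq, number=200, repeat=3):
    # Best of `repeat` runs, reported as microseconds per call
    timer = timeit.Timer(lambda: list(func(seq, subseq)))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

a = list(range(10))
b = list(range(5, 10))
aa = a * 100

print(find_slice.__name__, best_time(find_slice, aa, b))
random.shuffle(aa)
print(find_slice.__name__, "shuffled", best_time(find_slice, aa, b))
```

Passing any of the other find_* functions to best_time gives the remaining rows of the comparison.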
list_with_noise = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]
string_withNoise = "".join(str(i) for i in list_with_noise)
known_pattern = [1,2,3,4]
string_pattern = "".join(str(i) for i in known_pattern)
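# str.count counts non-overlapping occurrences; joining without a separator
# is only unambiguous while every item is a single character
print(string_withNoise.count(string_pattern))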
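Joining the items without a separator stops working once an item is longer than one character: [1, 23] and [12, 3] both become "123". A minimal sketch that keeps the item boundaries by wrapping every item in commas; the helper names encode and count_pattern are made up for this sketch, and it assumes that no item's str() form contains a comma:

```python
def encode(items):
    # Every item carries its own pair of delimiters: [1, 23] -> ",1,,23,"
    return "".join(",{},".format(item) for item in items)

def count_pattern(items, pattern):
    # str.count counts non-overlapping occurrences
    return encode(items).count(encode(pattern))

list_with_noise = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]
print(count_pattern(list_with_noise, [1, 2, 3, 4]))
# 3
print(count_pattern([12, 3], [2, 3]))
# 0
```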
Given:
a_list = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5]
pat = [1,2,3,4]
Here is an inefficient one-liner:
res = [pat for i in range(len(a_list)) if a_list[i:i+len(pat)] == pat]
Here is a more efficient itertools version:
from itertools import zip_longest, islice  # izip_longest on Python 2
res = []
i = 0
while True:
    try:
        i = a_list.index(pat[0], i)
    except ValueError:
        break
    if all(a==b for (a,b) in zip_longest(pat, islice(a_list, i, i+len(pat)))):
        res.append(pat)
        i += len(pat)  # resume right after the match
    else:
        i += 1  # otherwise move on by one element
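An idiomatic, composable solution.
First, we need to borrow an itertools recipe, consume (which consumes and discards a given number of elements from an iterator). Then we take the itertools recipe for pairwise and extend it to an nwise function using consume: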
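import itertools
def consume(iterator, n):
    # From the itertools docs recipes, reduced to the case where n is given:
    # advance the iterator n steps and discard the elements
    next(itertools.islice(iterator, n, n), None)
def nwise(iterable, size=2):
    its = itertools.tee(iterable, size)
    for i, it in enumerate(its):
        consume(it, i)  # Discards i elements from it
    return zip(*its)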
Now that we have that, solving the bonus problem is really easy:
def find_sublists_in_list(biglist, searchlist):
    searchtup = tuple(searchlist)
    return [list(subtup) for subtup in nwise(biglist, len(searchlist)) if subtup == searchtup]
    # Or for more obscure but faster one-liner:
    return map(list, filter(tuple(searchlist).__eq__, nwise(biglist, len(searchlist))))
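To make the sliding-window idea concrete, here is a self-contained sketch. It is not the answer's exact code: itertools.islice stands in for the consume recipe so that the block runs without any extra helper.

```python
import itertools

def nwise(iterable, size=2):
    # One independent iterator per window position; iterator i skips i items
    its = itertools.tee(iterable, size)
    shifted = [itertools.islice(it, i, None) for i, it in enumerate(its)]
    # zip stops at the shortest iterator, so only full windows are produced
    return zip(*shifted)

def find_sublists_in_list(biglist, searchlist):
    searchtup = tuple(searchlist)
    return [list(w) for w in nwise(biglist, len(searchlist)) if w == searchtup]

print(list(nwise([1, 2, 3, 4, 5], 3)))
# [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
print(find_sublists_in_list([7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5], [1,2,3,4]))
# [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
```

Because nwise accepts any iterable, the same search also works on generators and other inputs that cannot be sliced.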
Likewise, a more succinct and faster (if somewhat less pretty) solution for the main problem replaces:
def subfinder(mylist, pattern):
    pattern = set(pattern)
    return [x for x in mylist if x in pattern]
with:
def subfinder(mylist, pattern):
    # Wrap filter call in list() if on Python 3 and you need a list, not a generator
    return filter(set(pattern).__contains__, mylist)
which behaves the same way, but avoids the need to store the temporary set under a name, and pushes all the filtering work to C.
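The question mentions that the lists hold named tuples. Set-based filtering needs hashable items, and named tuples are hashable as long as their fields are. A minimal sketch under that assumption; the Point type and the sample values are hypothetical:

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])  # hypothetical element type

def subfinder(mylist, pattern):
    # list() forces the lazy filter object that Python 3 returns
    return list(filter(set(pattern).__contains__, mylist))

noise = [Point(9, 9), Point(1, 1), Point(2, 2), Point(0, 5)]
pattern = [Point(1, 1), Point(2, 2)]
print(subfinder(noise, pattern))
# [Point(x=1, y=1), Point(x=2, y=2)]
```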
def sublist_in_list(sub, lis):
    # Compares the text of the two lists, so partial numbers can match too
    return str(sub).strip('[]') in str(lis).strip('[]')
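Comparing the string forms can report a match that is not there, because "1, 2" also occurs inside "11, 2, 3". A minimal sketch that shows the false positive next to a slice-based check; the name sublist_in_list_str is made up here to keep the two versions apart:

```python
def sublist_in_list_str(sub, lis):
    # The string-based check from above
    return str(sub).strip('[]') in str(lis).strip('[]')

def sublist_in_list(sub, lis):
    # Compare real slices of the list instead of its text
    m = len(sub)
    return any(lis[i:i + m] == sub for i in range(len(lis) - m + 1))

print(sublist_in_list_str([1, 2], [11, 2, 3]))
# True
print(sublist_in_list([1, 2], [11, 2, 3]))
# False
print(sublist_in_list([1, 2], [0, 1, 2, 3]))
# True
```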