Is there a faster way to find repeated patterns in a list?
Python novice here. I have a problem in which I want to find all of the repeated patterns within a list (in my case, specifically, a list of integers).
So, for example, given the list [2,1,4,3,12,8,3,3,4,16,2,9,9,8,3,3,4,1,4,3,4,8,3,3,4] and a minimum pattern length of 3, the algorithm would find that [8,3,3,4] occurs thrice and [1,4,3] occurs twice (it would also be nice to have the indices of all occurrences).
I have some code that works, if a little clumsily, but the lists that I eventually want to use the code on may be very large. I'm not really sure how to work out the time complexity of my code, but I know that it definitely gets very slow when I am using large lists.
My question is: are there any better algorithms anyone knows for doing this, and/or am I doing this in a very inefficient way? Thanks for any help you can give me.
Here is the code:
# Searches list to determine how many times small list is included in big list
def contains(small, big):
    counter = 0
    # initiating list of indexes. N.B. indexlist gives LAST index of sequence, not first
    indexlist = []
    for i in range(len(big)-len(small)+1):
        for j in range(len(small)):
            if big[i+j] != small[j]:
                break
        else:
            counter += 1
            indexlist.append(i+j)
    if counter > 0:
        return counter, indexlist
    return False

def findrepeats(sequence, n_letters):
    fulldict = {}
    # Iterating through all the short-sequences of n letters in the list
    for i in range(0, len(sequence) - n_letters):
        shortliststr = ""
        shortlist = sequence[i:i + n_letters]
        for number in shortlist:
            shortliststr = shortliststr + "." + str(number)
        # If short-sequence is found in full sequence more than once (i.e. itself), add to dict
        if contains(shortlist, sequence)[0] > 1 and len(shortlist) == n_letters:
            fulldict[shortliststr] = contains(shortlist, sequence)
    return fulldict

def findallrepeats(sequence, min_letters, max_letters):
    fulldict = {}
    # Iterating through all possible n_letters in findrepeats() between given range
    for i in range(min_letters, max_letters):
        newdict = findrepeats(sequence, i)
        fulldict.update(newdict)
    return fulldict
With overlapping
You can use a sliding window of size n = 3 that iterates over your sequence, and count the number of occurrences of each window, using more_itertools.
For instance:
import collections
import more_itertools

sequence = [
    2, 1, 4, 3, 12, 8, 3, 3, 4, 16, 2, 9, 9,
    8, 3, 3, 4, 1, 4, 3, 4, 8, 3, 3, 4,
]
size = 3
windows = [
    tuple(window)
    for window in more_itertools.windowed(sequence, size)
]
counter = collections.Counter(windows)
for window, count in counter.items():
    if count > 1:
        print(window, count)
You get:
(1, 4, 3) 2
(8, 3, 3) 3
(3, 3, 4) 3
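The question also asks for the index of every occurrence and for a range of pattern lengths. Here is a minimal sketch of the same sliding-window idea using only the standard library. The names find_repeats, min_len and max_len are made up for illustration, and max_len is assumed to be inclusive. Each window is recorded once in a dictionary instead of rescanning the whole list for every window, so the work per pattern length grows roughly linearly with the list length rather than quadratically.

```python
from collections import defaultdict


def find_repeats(sequence, min_len, max_len):
    """Map each repeated window (as a tuple) to the start indices where it occurs.

    find_repeats, min_len and max_len are illustrative names, not part of the
    answer above. max_len is inclusive; overlapping occurrences are counted.
    """
    repeats = {}
    for size in range(min_len, max_len + 1):
        positions = defaultdict(list)
        # Slide a window of this size over the sequence and remember
        # where each distinct window starts.
        for start in range(len(sequence) - size + 1):
            positions[tuple(sequence[start:start + size])].append(start)
        # Keep only the windows that were seen more than once.
        repeats.update(
            (window, starts)
            for window, starts in positions.items()
            if len(starts) > 1
        )
    return repeats


sequence = [
    2, 1, 4, 3, 12, 8, 3, 3, 4, 16, 2, 9, 9,
    8, 3, 3, 4, 1, 4, 3, 4, 8, 3, 3, 4,
]
for window, starts in find_repeats(sequence, 3, 4).items():
    print(window, len(starts), starts)
```

For the example list with lengths 3 to 4 this reports (8, 3, 3, 4) at start indices 5, 13 and 21, along with the three length-3 windows shown above. Note that a long repeat also shows up as its shorter sub-windows; filtering those out, if unwanted, would be a separate step.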