I currently have 6 separate for loops which iterate over a list of numbers looking to match specific sequences of numbers within larger sequences, and replace them like this:
[...0,1,0...] => [...0,0,0...]
[...0,1,1,0...] => [...0,0,0,0...]
[...0,1,1,1,0...] => [...0,0,0,0,0...]
And their inverse:
[...1,0,1...] => [...1,1,1...]
[...1,0,0,1...] => [...1,1,1,1...]
[...1,0,0,0,1...] => [...1,1,1,1,1...]
My existing code is like this:
for i in range(len(output_array)-2):
if output_array[i] == 0 and output_array[i+1] == 1 and output_array[i+2] == 0:
output_array[i+1] = 0
for i in range(len(output_array)-3):
if output_array[i] == 0 and output_array[i+1] == 1 and output_array[i+2] == 1 and output_array[i+3] == 0:
output_array[i+1], output_array[i+2] = 0
In total I'm iterating over the same output_array 6 times, using brute force checking. Is there a faster method?
# I would create a map between the string searched and the new one.
patterns = {}
patterns['010'] = '000'
patterns['0110'] = '0000'
patterns['01110'] = '00000'
# I would loop over the lists
lists = [[0,1,0,0,1,1,0,0,1,1,1,0]]
for lista in lists:
# i would join the list elements as a string
string_list = ''.join(map(str,lista))
# we loop over the patterns
for pattern,value in patterns.items():
# if a pattern is detected, we replace it
string_list = string_list.replace(pattern, value)
lista = list(string_list)
print lista
While this question related to the questions Here and Here , the question from OP relates to fast searching of multiple sequences at once. While the accepted answer works well, we may not want to loop through all the search sequences for every sub-iteration of the base sequence.
Below is an algo which checks for a sequence of i ints only if the sequence of (i-1) ints is present in the base sequence
# This is the driver function which takes in a) the search sequences and
# replacements as a dictionary and b) the full sequence list in which to search
def findSeqswithinSeq(searchSequences,baseSequence):
seqkeys = [[int(i) for i in elem.split(",")] for elem in searchSequences]
maxlen = max([len(elem) for elem in seqkeys])
decisiontree = getdecisiontree(seqkeys)
i = 0
while i < len(baseSequence):
(increment,replacement) = get_increment_replacement(decisiontree,baseSequence[i:i+maxlen])
if replacement != -1:
baseSequence[i:i+len(replacement)] = searchSequences[",".join(map(str,replacement))]
i +=increment
return baseSequence
#the following function gives the dictionary of intermediate sequences allowed
def getdecisiontree(searchsequences):
dtree = {}
for elem in searchsequences:
for i in range(len(elem)):
if i+1 == len(elem):
dtree[",".join(map(str,elem[:i+1]))] = True
else:
dtree[",".join(map(str,elem[:i+1]))] = False
return dtree
# the following is the function does most of the work giving us a) how many
# positions we can skip in the search and b)whether the search seq was found
def get_increment_replacement(decisiontree,sequence):
if str(sequence[0]) not in decisiontree:
return (1,-1)
for i in range(1,len(sequence)):
key = ",".join(map(str,sequence[:i+1]))
if key not in decisiontree:
return (1,-1)
elif decisiontree[key] == True:
key = [int(i) for i in key.split(",")]
return (len(key),key)
return 1, -1
You can test the above code with this snippet:
if __name__ == "__main__":
inputlist = [5,4,0,1,1,1,0,2,0,1,0,99,15,1,0,1]
patternsandrepls = {'0,1,0':[0,0,0],
'0,1,1,0':[0,0,0,0],
'0,1,1,1,0':[0,0,0,0,0],
'1,0,1':[1,1,1],
'1,0,0,1':[1,1,1,1],
'1,0,0,0,1':[1,1,1,1,1]}
print(findSeqswithinSeq(patternsandrepls,inputlist))
The proposed solution represents the sequences to be searched as a decision tree.
Due to skipping the many of the search points, we should be able to do better than O(m*n) with this method (where m is number of search sequences and n is length of base sequence.
EDIT: Changed answer based on more clarity in edited question.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.