简体   繁体   中英

Search and replace multiple specific sequences of elements in Python list/array

I currently have 6 separate for loops which iterate over a list of numbers looking to match specific sequences of numbers within larger sequences, and replace them like this:

[...0,1,0...] => [...0,0,0...]
[...0,1,1,0...] => [...0,0,0,0...]
[...0,1,1,1,0...] => [...0,0,0,0,0...]

And their inverse:

[...1,0,1...] => [...1,1,1...]
[...1,0,0,1...] => [...1,1,1,1...]
[...1,0,0,0,1...] => [...1,1,1,1,1...]

My existing code is like this:

for i in range(len(output_array)-2):
    if output_array[i] == 0 and output_array[i+1] == 1 and output_array[i+2] == 0:
        output_array[i+1] = 0

for i in range(len(output_array)-3):
    if output_array[i] == 0 and output_array[i+1] == 1 and output_array[i+2] == 1 and output_array[i+3] == 0:
        output_array[i+1], output_array[i+2] = 0

In total I'm iterating over the same output_array 6 times, using brute force checking. Is there a faster method?

# I would create a map between the string searched and the new one.

patterns = {}
patterns['010'] = '000'
patterns['0110'] = '0000'
patterns['01110'] = '00000'

# I would loop over the lists

lists = [[0,1,0,0,1,1,0,0,1,1,1,0]]

for lista in lists:

    # i would join the list elements as a string
    string_list = ''.join(map(str,lista))

    # we loop over the patterns
    for pattern,value in patterns.items():

        # if a pattern is detected, we replace it
        string_list = string_list.replace(pattern, value)
        lista = list(string_list)
    print lista

While this question related to the questions Here and Here , the question from OP relates to fast searching of multiple sequences at once. While the accepted answer works well, we may not want to loop through all the search sequences for every sub-iteration of the base sequence.

Below is an algo which checks for a sequence of i ints only if the sequence of (i-1) ints is present in the base sequence

# This is the driver function which takes in a) the search sequences and 
# replacements as a dictionary and b) the full sequence list in which to search 

def findSeqswithinSeq(searchSequences,baseSequence):
    seqkeys = [[int(i) for i in elem.split(",")] for elem in searchSequences]
    maxlen = max([len(elem) for elem in seqkeys])
    decisiontree = getdecisiontree(seqkeys)
    i = 0
    while i < len(baseSequence):
        (increment,replacement) = get_increment_replacement(decisiontree,baseSequence[i:i+maxlen])
        if replacement != -1:
            baseSequence[i:i+len(replacement)] = searchSequences[",".join(map(str,replacement))]
        i +=increment
    return  baseSequence

#the following function gives the dictionary of intermediate sequences allowed
def getdecisiontree(searchsequences):
    dtree = {}
    for elem in searchsequences:
        for i in range(len(elem)):
            if i+1 == len(elem):
                dtree[",".join(map(str,elem[:i+1]))] = True
            else:
                dtree[",".join(map(str,elem[:i+1]))] = False
    return dtree

# the following is the function does most of the work giving us a) how many
# positions we can skip in the search and b)whether the search seq was found
def get_increment_replacement(decisiontree,sequence):
    if str(sequence[0]) not in decisiontree:
        return (1,-1)
    for i in range(1,len(sequence)):
        key = ",".join(map(str,sequence[:i+1]))
        if key not in decisiontree:
            return (1,-1)
        elif decisiontree[key] == True:
            key = [int(i) for i in key.split(",")]
            return (len(key),key)
    return 1, -1

You can test the above code with this snippet:

if __name__ == "__main__":
    inputlist = [5,4,0,1,1,1,0,2,0,1,0,99,15,1,0,1]
    patternsandrepls = {'0,1,0':[0,0,0],
                        '0,1,1,0':[0,0,0,0],
                        '0,1,1,1,0':[0,0,0,0,0],
                        '1,0,1':[1,1,1],
                        '1,0,0,1':[1,1,1,1],
                        '1,0,0,0,1':[1,1,1,1,1]}

    print(findSeqswithinSeq(patternsandrepls,inputlist))

The proposed solution represents the sequences to be searched as a decision tree.

Due to skipping the many of the search points, we should be able to do better than O(m*n) with this method (where m is number of search sequences and n is length of base sequence.

EDIT: Changed answer based on more clarity in edited question.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM