
Remove repeated sequence of integers from list in Python

I have a list of integers that I need to remove duplicate sequences from, and the logic is doing my head in.

I've been trying to modify this to what I need; however, that only returns one number pertaining to the length of the repeating sequence, and it only counts from the starting integer.

This is as far as I've gotten so far:

def findRep(rmRepList):
    #Array to hold [starting position, length] of repeating sequences
    repList = []
    #For each industry listed
    for industry in rmRepList:
        #Maximum starting position
        maxStartPos = len(industry)-2
        #For each possible starting point of repetition
        for start in range(1,maxStartPos):
            #Limit on how long the repetition can be
            maxLen = math.ceil((len(industry)-start)/2)

            #For each possible length (2 because already canceled out repeating resources in genAllLoop)
            for i in range(2,maxLen):
                #If the next 'i' integers repeat
                if industry[start:i+start] is industry[i+start:2*i+start]:
                    repList = [start,i]
                    industry = rmRep(repList, industry)

                #If reached end of list
                if 2*i+start+1 == len(industry):
                    #End loop
                    break

def rmRep(rmProp, loop):
    #Sequence of resources to drop
    rmSeq = [loop[rmProp[0]:rmProp[0]+rmProp[1]]]
    #Debugging statement
    print(rmSeq)
    loop.remove(rmSeq)
    return(loop)

rmRepList is a list of lists, holding each list I need to analyse.

So for example, if given a list

rmRepList = [[0,1,2,1,2,1,0],[0,1,2,1,2,1,2,3,4,5,3,4,5,6,0]]

I would need it to return [[0,1,2,1,0], [0,1,2,3,4,5,6,0]]

Right now, the code isn't even reaching the rmRep subroutine. I have a horrible feeling I'm going about this all wrong. I don't like that I need so many loops in the code, especially as the actual lists I have to analyse are hundreds of digits long. Is there a simpler way to do this?


EDIT: If it helps, I can guarantee that the lists will not repeat one integer over and over (e.g. [0,0,0,1,0] won't happen).

Also, the first and last number in the lists will always be the same.
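A likely reason the posted code never reaches rmRep is the `is` comparison in findRep: `is` tests object identity, and every slice creates a new list, so two slices are never the same object even when their contents match. The sketch below illustrates this on the first example list; the variable names `first`, `second` and `trimmed` are illustrative only and do not come from the original code. It also shows that a run of items is dropped by slice deletion, since `list.remove()` looks for a single element equal to its argument.

```python
industry = [0, 1, 2, 1, 2, 1, 0]
start, i = 1, 2

first = industry[start:i + start]            # [1, 2]
second = industry[i + start:2 * i + start]   # [1, 2]

print(first == second)  # True: the two slices have the same contents
print(first is second)  # False: each slice is a separate list object

# Drop the second copy of the repeated run by deleting a slice.
# Work on a copy so the original list is left untouched.
trimmed = list(industry)
del trimmed[i + start:2 * i + start]
print(trimmed)  # [0, 1, 2, 1, 0]
```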

Part of the answer: detect your repeated sequences.

listA = [0,1,2,1,2,1,2,3,4,5,3,4,5,6,0]
listB = [0,1,2,1,2,1,0]

def get_repeated_seq(seq, start, length):
    ref = seq[start:start+length]
    #print("Ref", ref)
    for pos in range(start+length, len(seq)-length):
        compare = seq[pos:pos+length]
        #print("Pos", pos, compare)
        if compare == ref:
            print("Found", ref, "at", pos)
            return pos
    return False

def get_repeated_seqs(seq):
    # Use // so the bound stays an int on Python 3 (range() rejects floats)
    for size in reversed(range(2, len(seq)//2)):
        for pos in range(0, len(seq)-size):
            print("Check rep starting at pos %s for size %s" % (pos, size))
            get_repeated_seq(seq, pos, size)

print(get_repeated_seqs(listA))

Then you can remove them according to your removal strategy (largest first? smallest first?)

EDIT: to make it clear that it works (and adding some removal at the same time)

listA = [0,1,2,1,2,1,2,3,4,5,2,1,3,4,5,2,1,6,0]
listB = [0,1,2,1,2,1,0]

def get_repeated_seq(seq, start, length):
    ref = seq[start:start+length]
    #print("Ref", ref)
    for pos in range(start+length, len(seq)-length):
        compare = seq[pos:pos+length]
        #print("Pos", pos, compare)
        if compare == ref:
            #print("Found", ref, "at", pos)
            return pos, length
    return False

def get_repeated_seqs(seq):
    reps = []
    # Use // so the bound stays an int on Python 3 (range() rejects floats)
    for size in reversed(range(2, len(seq)//2)):
        for pos in range(0, len(seq)-size):
            #print("Check rep starting at pos %s for size %s" % (pos, size))
            rep = get_repeated_seq(seq, pos, size)
            if rep:
                reps.append(rep)
    return reps

def remove_repeated_seqs(seq, reps):
    # need to backup seq ?
    for rep in reps:
        overlaps = False
        for pos in range(rep[0], rep[0]+rep[1]):
            if seq[pos] == "*":
                overlaps = True
        if not overlaps:
            for pos in range(rep[0], rep[0]+rep[1]):
                seq[pos] = "*"
    out = []
    for item in seq:
        if item != "*":
            out.append(item)
    return out


reps = get_repeated_seqs(listB)
rem = remove_repeated_seqs(listB, reps)
#print(rem)
print(rem==[0,1,2,1,0])

reps = get_repeated_seqs(listA)
rem = remove_repeated_seqs(listA, reps)
#print(rem)
print(rem==[0,1,2,3,4,5,6,0])

outputs True and True :)

EDIT2: no -1 is needed to go up to the end of a list in a for range loop.
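For the simpler case in the question, where the repeated runs sit directly next to each other, a more compact approach may be enough. The following is only a sketch under that assumption (it will not match the non-adjacent repeats handled above): it looks for a block immediately followed by an identical block, deletes the second copy, and rescans until nothing changes, trying longer blocks first. The function name `collapse_adjacent_repeats` and the `min_len` parameter are hypothetical, chosen for illustration.

```python
def collapse_adjacent_repeats(seq, min_len=2):
    """Return a copy of seq with back-to-back repeated blocks collapsed."""
    out = list(seq)  # work on a copy so the caller's list is untouched
    changed = True
    while changed:
        changed = False
        # Try the longest candidate block first, down to min_len.
        for n in range(len(out) // 2, min_len - 1, -1):
            for i in range(len(out) - 2 * n + 1):
                # == compares contents (unlike "is")
                if out[i:i + n] == out[i + n:i + 2 * n]:
                    del out[i + n:i + 2 * n]  # drop the second copy
                    changed = True
                    break
            if changed:
                break  # restart the scan on the shortened list
    return out


rmRepList = [[0,1,2,1,2,1,0],[0,1,2,1,2,1,2,3,4,5,3,4,5,6,0]]
result = [collapse_adjacent_repeats(industry) for industry in rmRepList]
print(result)  # [[0, 1, 2, 1, 0], [0, 1, 2, 3, 4, 5, 6, 0]]
```

Restarting the scan after every deletion keeps the logic easy to follow at the cost of extra passes; for lists a few hundred items long that should still be acceptable, but it has not been benchmarked here.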


 