
Remove duplicates from subsequences of the list

For part of a log parser, I need to filter occurrences of the baud rate in the log.

First I get all occurrences using re.findall, then I try to remove duplicates within consecutive runs of its result. Results look like [10000,10000,10000,10000,0,0,0,10000,10000]; the list can contain several hundred values. So the first baud rate was 10000, then 0, then 10000 again. I need to see how the baud rate changed, so I can't use set, as it would lose the information about the points where the baud rate switched.

So, once again, the input: [10000,10000,10000,10000,0,0,0,10000,10000]

Desired output: [10000,0,10000]

What I have so far:

m = [10000,10000,10000,10000,0,0,0,10000,10000] 
n = []
for i,v in enumerate(m):
    if i == 0:
        n.append(v)
        n_index = 0
    else:
        if v != n[n_index]:
            n.append(v)
            n_index = n_index + 1

It works, but it doesn't seem Pythonic enough to me. Please advise: is there a more efficient way, or a way to avoid reinventing the wheel?

Use itertools.groupby:

>>> rates = [10000,10000,10000,10000,0,0,0,10000,10000]
>>> from itertools import groupby
>>> [e for e, g in groupby(rates)]
[10000, 0, 10000]

Explanation: If no key function is given, the key defaults to the identity function, so each element serves as its own key and every run of consecutive equal elements is collapsed into one group. The result is an iterator of (key, group) pairs, where each group is, in this case, just the repetitions of the key element. We need only the keys.
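To make the (key, group) pairs concrete, here is a minimal sketch that also counts how long each run is. The names runs and changes are only illustrative and do not come from the answer above.

```python
from itertools import groupby

rates = [10000, 10000, 10000, 10000, 0, 0, 0, 10000, 10000]

# Each item produced by groupby is (key, iterator over one run of equal elements).
runs = [(key, len(list(group))) for key, group in groupby(rates)]
print(runs)     # [(10000, 4), (0, 3), (10000, 2)]

# Keeping only the keys gives the sequence of baud rate changes.
changes = [key for key, _ in groupby(rates)]
print(changes)  # [10000, 0, 10000]
```

Because groupby only merges adjacent equal elements, the second run of 10000 stays separate from the first, which is exactly what set would lose.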

Update: Using IPython's %timeit magic command and a list of 100,000 random baud rates, itertools.groupby seems to be about as fast as the "compare to previous element" loop solutions, and a good deal shorter.
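A comparison like this can be roughly reproduced outside IPython with the standard timeit module. This is only a sketch: the function names, the set of baud rates, and the repeat count are assumptions, and the timings will vary by machine and Python version.

```python
import random
import timeit
from itertools import groupby


def dedupe_groupby(values):
    """Collapse runs of equal elements using itertools.groupby."""
    return [key for key, _ in groupby(values)]


def dedupe_loop(values):
    """Collapse runs by comparing each element to the last one kept."""
    result = list(values[:1])      # empty input gives an empty result
    for v in values[1:]:
        if result[-1] != v:
            result.append(v)
    return result


random.seed(0)
# Hypothetical test data: 100,000 values drawn from three assumed baud rates.
sample = [random.choice([0, 9600, 10000]) for _ in range(100000)]

# Both approaches must agree before their speed is compared.
assert dedupe_groupby(sample) == dedupe_loop(sample)

for func in (dedupe_groupby, dedupe_loop):
    seconds = timeit.timeit(lambda: func(sample), number=10)
    print(func.__name__, seconds)
```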

m = [10000,10000,10000,10000,0,0,0,10000,10000] 
n = []

n.append(m[0])
for i in m[1:]:
    if n[-1] != i:
        n.append(i)
print n

  1. Iterate over list m, either directly or with enumerate.
  2. Check whether the last element of list n is equal to the current element of list m.
  3. If they are not equal, append the current element of list m to list n.
  4. Use try and except, because list n is empty on the first iteration, so n[-1] raises IndexError.

Code:

m = [10000,10000,10000,10000,0,0,0,10000,10000] 
n = []
for v in m:
    try:
        if n[-1] != v:
            n.append(v)
    except IndexError:
        n.append(v)

print "Result:-", n 

Output:

$ python test.py 
Result:- [10000, 0, 10000]
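As a possible variation on the loop above, the empty-list case can be handled with an explicit check instead of catching IndexError. This is only a sketch; the function name collapse_runs is hypothetical.

```python
def collapse_runs(values):
    """Return values with each run of consecutive duplicates reduced to one element."""
    result = []
    for v in values:
        # "not result" covers the first iteration, when there is nothing to compare with.
        if not result or result[-1] != v:
            result.append(v)
    return result


m = [10000, 10000, 10000, 10000, 0, 0, 0, 10000, 10000]
print(collapse_runs(m))  # [10000, 0, 10000]
```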

I had to do the same thing, except that I also needed to save the position of each element. I am a physicist, so this code is probably crap, but it works. The cosmetic parts, such as the print statements and "Press any key to continue...", can be removed, as they were only for debugging. I will probably adapt one of the answers in this thread.

xvalues=[]
M=[1,1,1,1,1,1,2,3,3,3,3,3,4,4,5,5,5,5,5,6,6,6,7,7,7,7,7,8,9,9]
print len(M)
i=0
while i < len(M):
    print "i " + str(i)
    j=i+1
    while j < len(M) and j > i:
        print "j " + str(j)
        if j == len(M)-1: #kills the while loop
            xvalues.append(i) #append the last element index
            print xvalues
            print range(i+1,len(M))
            a=raw_input("Press any key to continue...")
            i=len(M) #the loop killer
            break
        if M[i]!=M[j]:
            xvalues.append(i) #first index in the subsequence of duplicates
            print xvalues
            print range(i+1,len(M))
            a=raw_input("Press any key to continue...")
            i=j #skip to the next subsequence
            break
        if M[i]==M[j]:
            j+=1
            continue

Mnew=[M[i] for i in xvalues]
print xvalues
print Mnew

Final output: the positions in the array, then the values of the elements at those positions.

[0, 6, 7, 12, 14, 19, 22, 27, 28]
[1, 2, 3,  4,  5,  6,  7,  8,  9]
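For comparison, the start position of each run can also be obtained by combining enumerate with itertools.groupby. This is a sketch and not the code from the answer above; the function name run_starts is hypothetical.

```python
from itertools import groupby


def run_starts(values):
    """Return the index of the first element of each run of equal values."""
    # Group (index, value) pairs by value; the first pair in each group
    # carries the index at which that run starts.
    pairs = groupby(enumerate(values), key=lambda pair: pair[1])
    return [next(group)[0] for _, group in pairs]


M = [1,1,1,1,1,1,2,3,3,3,3,3,4,4,5,5,5,5,5,6,6,6,7,7,7,7,7,8,9,9]
xvalues = run_starts(M)
Mnew = [M[i] for i in xvalues]
print(xvalues)  # [0, 6, 7, 12, 14, 19, 22, 27, 28]
print(Mnew)     # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

This version also reports a final run of length one, for example run_starts([1, 1, 2]) gives [0, 2].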
