Remove duplicates without using list mutation

I am trying to remove adjacent duplicates from a list without using list mutations like del or remove. Below is the code I tried:

def remove_dups(L):   
    L = [x for x in range(0,len(L)) if L[x] != L[x-1]]
    

    return L

print(remove_dups([1,2,2,3,3,3,4,5,1,1,1]))

This outputs:

[1, 3, 6, 7, 8]

Can anyone explain how this output occurred? I want to understand the flow, but I couldn't work it out even by stepping through it in the VS Code debugger.

Input:

[1,2,2,3,3,3,4,5,1,1,1]

Expected output:

[1,2,3,4,5,1]

I'll rename the variables to make this more readable:

def remove_dups(L):   
    L = [x for x in range(0,len(L)) if L[x] != L[x-1]]

becomes:

def remove_dups(lst):   
   return [index for index in range(len(lst)) if lst[index] != lst[index-1]]

As you can see, instead of looping over the items of the list, it loops over the indices, comparing the value at one index, lst[index], with the value at the previous index, lst[index-1], and keeping an entry only if the two don't match.

The two main issues are:

  1. the first comparison is against index -1, which is the last item of the list, so the first item is compared with the last one
  2. the comprehension collects index rather than lst[index], so it returns the indices of the non-duplicated items instead of the items themselves
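
To see where [1, 3, 6, 7, 8] comes from, it may help to trace the comparisons one index at a time. The sketch below is only an illustration; the helper name trace_comparisons is made up here and is not part of the original code.

```python
def trace_comparisons(lst):
    """Return (index, current, previous, kept) for every index the comprehension visits."""
    rows = []
    for index in range(len(lst)):
        current = lst[index]
        previous = lst[index - 1]  # for index 0 this is lst[-1], i.e. the last item
        rows.append((index, current, previous, current != previous))
    return rows


data = [1, 2, 2, 3, 3, 3, 4, 5, 1, 1, 1]
rows = trace_comparisons(data)
for index, current, previous, kept in rows:
    print(f"index={index} lst[index]={current} lst[index-1]={previous} kept={kept}")

# The original comprehension keeps the index, not the value
kept_indices = [index for index, _, _, kept in rows if kept]
print(kept_indices)  # [1, 3, 6, 7, 8]
```

Index 0 is dropped because lst[0] and lst[-1] are both 1 for this input, and every kept entry is a position in the list rather than the value stored there.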

To make this work, I'd use the enumerate function, which yields each item together with its index, as follows:

def remove_dups(lst):   
   return [item for index, item in enumerate(lst[:-1]) if item != lst[index+1]] + [lst[-1]]

Here I'm looping through all of the items except the last one (lst[:-1]) and checking whether each item matches the next one, adding it only if it doesn't.

Finally, because the loop never visits the last value, we append it to the output with + [lst[-1]].
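
One caveat with the version above: lst[-1] raises an IndexError when the list is empty. A possible variant, offered as a sketch rather than as part of the original answer, always keeps the first item and then compares each item with the one before it, using slices so that an empty input is handled too. The name remove_adjacent_dups is hypothetical.

```python
def remove_adjacent_dups(lst):
    # lst[:1] is the first item wrapped in a list, or [] when lst is empty.
    # zip(lst, lst[1:]) pairs every item with its successor.
    return lst[:1] + [cur for prev, cur in zip(lst, lst[1:]) if cur != prev]


print(remove_adjacent_dups([1, 2, 2, 3, 3, 3, 4, 5, 1, 1, 1]))  # [1, 2, 3, 4, 5, 1]
print(remove_adjacent_dups([]))  # []
```

Like the enumerate version, this builds a new list and never mutates the input.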

This is a job for itertools.groupby:

from itertools import groupby

def remove_dups(L):
    return [k for k,g in groupby(L)]

L2 = remove_dups([1,2,2,3,3,3,4,5,1,1,1])
print(L2)

Output: [1, 2, 3, 4, 5, 1]
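
For anyone unfamiliar with groupby: when called without a key function, it groups consecutive equal items and yields a key plus an iterator over each run. A small sketch of what those runs look like for the sample input (the variable name runs is just for illustration):

```python
from itertools import groupby

data = [1, 2, 2, 3, 3, 3, 4, 5, 1, 1, 1]

# Materialise each group so the runs can be inspected
runs = [(key, list(group)) for key, group in groupby(data)]
print(runs)
# [(1, [1]), (2, [2, 2]), (3, [3, 3, 3]), (4, [4]), (5, [5]), (1, [1, 1, 1])]
```

Taking only the keys gives one value per run, which is why the trailing 1s show up as their own entry instead of being merged with the first 1.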
