
Python: Remove duplicates for a specific item from list

I have a list of items from which I want to remove any duplicates of one particular item, while keeping the duplicates of all the other items. I.e., I start with the following list:

mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]

I want to remove any duplicates of 0 but keep the duplicates of 1 and 9. My current solution is the following:

mylist = [i for i in mylist if i != 0]  # drop every 0
mylist.append(0)                        # add a single 0 back

Besides the following, is there a nice way of keeping exactly one occurrence of 0?

for i in mylist:
    if mylist.count(0) > 1:
        mylist.remove(0)

The second approach takes more than twice as long for this example.
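A minimal sketch for reproducing this comparison yourself with the standard timeit module. The names approach_rebuild and approach_remove_loop are hypothetical wrappers around the two approaches from the question. The removal loop is written with while here, because removing items from a list while a for loop iterates over it can end the loop early (e.g. [0, 0, 0, 0] would keep two 0s), so its timing will not match the question's loop exactly. Absolute numbers depend on your machine.

```python
import timeit

def approach_rebuild(lst):
    # drop every 0, then append a single 0 (always, even if there was none)
    out = [i for i in lst if i != 0]
    out.append(0)
    return out

def approach_remove_loop(lst):
    # remove 0s one at a time while more than one remains (mutates lst)
    while lst.count(0) > 1:
        lst.remove(0)
    return lst

mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
# pass a fresh copy on every run so the in-place version always starts from the same data
t_rebuild = timeit.timeit(lambda: approach_rebuild(mylist.copy()), number=10_000)
t_loop = timeit.timeit(lambda: approach_remove_loop(mylist.copy()), number=10_000)
print(f"rebuild: {t_rebuild:.4f} s, remove loop: {t_loop:.4f} s")
```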

Clarification:

  • Currently I don't care about the order of the items in the list, since I sort it after it has been created and cleaned, but that might change later.

  • Currently I only need to remove duplicates of one specific item (0 in my example).

The solution:

[0] + [i for i in mylist if i]

looks good enough, except when 0 is not in mylist, in which case you wrongly add a 0.

Besides, concatenating two lists like this isn't great performance-wise. I'd do:

newlist = [i for i in mylist if i]
if len(newlist) != len(mylist):  # 0 was removed, add it back
    newlist.append(0)

(or use filter: newlist = list(filter(None, mylist)), which could be slightly faster because there is no Python-level loop)

Appending to the end of a list is very efficient (the list object over-allocates, so most of the time no memory is copied). The length test is O(1) and avoids a separate 0 in mylist check, which would need another pass over the list.
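The same length-comparison trick can be wrapped in a small function. This is a sketch: the name keep_one is hypothetical, and it compares with != item instead of relying on truthiness, so it also works for items other than 0.

```python
def keep_one(lst, item):
    """Return a copy of lst without item, plus one item appended if it occurred."""
    newlist = [x for x in lst if x != item]
    # if the lengths differ, at least one occurrence was removed: add one back
    if len(newlist) != len(lst):
        newlist.append(item)
    return newlist

print(keep_one([4, 1, 2, 6, 1, 0, 9, 8, 0, 9], 0))
# [4, 1, 2, 6, 1, 9, 8, 9, 0]
print(keep_one([4, 1, 2], 0))
# [4, 1, 2]
```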

If performance is an issue and you are happy to use a third-party library, use numpy.

The Python standard library is great for many things. Computations on numeric arrays are not one of them.

import numpy as np

mylist = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9])

mylist = np.delete(mylist, np.where(mylist == 0)[0][1:])

# array([4, 1, 2, 6, 1, 0, 9, 8, 9])

Here the first argument of np.delete is the input array. The second argument extracts the indices of all occurrences of 0, then keeps the second index onwards, so every 0 except the first is deleted.
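To make the indexing step concrete, here is a pure-Python sketch of the same idea without NumPy (the function name delete_later_occurrences is hypothetical): collect the indices of every 0, skip the first one as [1:] does, and drop the rest.

```python
def delete_later_occurrences(lst, item):
    # all indices where item occurs, analogous to np.where(arr == item)[0]
    indices = [i for i, x in enumerate(lst) if x == item]
    # skip the first index, analogous to [1:], and delete the others
    to_delete = set(indices[1:])
    return [x for i, x in enumerate(lst) if i not in to_delete]

mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
print([i for i, x in enumerate(mylist) if x == 0])  # indices of the 0s
# [5, 8]
print(delete_later_occurrences(mylist, 0))
# [4, 1, 2, 6, 1, 0, 9, 8, 9]
```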

Performance benchmarking

Tested on Python 3.6.2 / Numpy 1.13.1. Performance will be system- and array-specific.

%timeit jp(myarr.copy())         # 183 µs
%timeit vui(mylist.copy())       # 393 µs
%timeit original(mylist.copy())  # 1.85 s

import numpy as np
from collections import Counter

myarr = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9] * 1000)
mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9] * 1000

def jp(myarr):
    return np.delete(myarr, np.where(myarr == 0)[0][1:])

def vui(mylist):
    return [0] + list(filter(None, mylist))

def original(mylist):
    for i in mylist:
        if mylist.count(0) > 1:
            mylist.remove(0)

    return mylist

It sounds like a better data structure for you would be collections.Counter (which is in the standard library):

import collections

counts = collections.Counter(mylist)
counts[0] = 1
mylist = list(counts.elements())
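One caveat: counts[0] = 1 adds a 0 even when the list contained none. A guarded sketch (the function name dedupe_item_with_counter is hypothetical) only touches the count when there are duplicates; looking up a missing key in a Counter returns 0 without inserting it. Note that elements() groups equal values together, so the original order is not preserved.

```python
import collections

def dedupe_item_with_counter(lst, item):
    counts = collections.Counter(lst)
    # a missing key reads as 0 and is not inserted, so this is safe when item is absent
    if counts[item] > 1:
        counts[item] = 1
    # elements() repeats each value by its count, grouping equal values together
    return list(counts.elements())

print(dedupe_item_with_counter([4, 1, 2, 6, 1, 0, 9, 8, 0, 9], 0))
# [4, 1, 1, 2, 6, 0, 9, 9, 8]
print(dedupe_item_with_counter([4, 1, 2], 0))
# [4, 1, 2]
```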

Here is a generator-based approach with approximately O(n) complexity that also preserves the order of the original list:

def remove_dup(lst, item):
    temp = [item]  # a single spare copy of item, used up by the first match
    for i in lst:
        if i != item:
            yield i
        elif temp:  # only the first occurrence finds temp non-empty
            yield temp.pop()

print(list(remove_dup(mylist, 0)))
# [4, 1, 2, 6, 1, 0, 9, 8, 9]
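The temp list in remove_dup works as a one-shot flag. An equivalent sketch uses a plain boolean instead (the name remove_dup_flag is hypothetical); because it consumes its input lazily, it also works on any iterable, not just lists.

```python
def remove_dup_flag(iterable, item):
    seen = False  # becomes True once the first occurrence of item has been yielded
    for x in iterable:
        if x != item:
            yield x
        elif not seen:
            seen = True
            yield x

mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
print(list(remove_dup_flag(mylist, 0)))
# [4, 1, 2, 6, 1, 0, 9, 8, 9]
print(list(remove_dup_flag(iter([0, 0, 0]), 0)))
# [0]
```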

Also, if you are dealing with larger lists, you can use the following vectorized approach with NumPy:

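import numpy as np

arr = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9])
mask = arr == 0                      # True at every 0
if mask.any():                       # np.where(mask)[0][0] would fail if there were no 0
    first_ind = np.where(mask)[0][0]
    mask[first_ind] = False          # un-mark the first 0 so it is kept
result = arr[~mask]                  # keep everything that is still unmarked
# array([4, 1, 2, 6, 1, 0, 9, 8, 9])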

Slicing should do it:

a[start:end] # items start through end-1
a[start:]    # items start through the rest of the list
a[:end]      # items from the beginning through end-1
a[:]         # a copy of the whole list
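Applied to the list from the question, those slice forms behave as follows (a small illustration; index 5 is the position of the first 0):

```python
mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
print(mylist[2:5])   # items 2 through 4
# [2, 6, 1]
print(mylist[6:])    # everything after the first 0
# [9, 8, 0, 9]
print(mylist[:6])    # everything up to and including the first 0
# [4, 1, 2, 6, 1, 0]
whole = mylist[:]    # shallow copy: a new list object with the same items
print(whole == mylist, whole is mylist)
# True False
```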

Input:

mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9, 0, 0, 9, 2, 2]
pos = mylist.index(0)
nl = mylist[:pos + 1] + [i for i in mylist[pos + 1:] if i != 0]

print(nl)

Output: [4, 1, 2, 6, 1, 0, 9, 8, 9, 9, 2, 2]
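mylist.index(0) raises ValueError when the list contains no 0. A sketch of the same slicing approach wrapped in a function that handles that case (the name keep_first is hypothetical):

```python
def keep_first(lst, item):
    try:
        pos = lst.index(item)  # position of the first occurrence
    except ValueError:
        return list(lst)       # item not present: nothing to remove
    # keep everything up to and including the first occurrence,
    # then filter item out of the remainder
    return lst[:pos + 1] + [x for x in lst[pos + 1:] if x != item]

print(keep_first([4, 1, 2, 6, 1, 0, 9, 8, 0, 9, 0, 0, 9, 2, 2], 0))
# [4, 1, 2, 6, 1, 0, 9, 8, 9, 9, 2, 2]
print(keep_first([4, 1, 2], 0))
# [4, 1, 2]
```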

You can use this:

desired_value = 0
mylist = [i for i in mylist if i != desired_value] + [desired_value]

You can now change the desired value; you can also make it a list, like this:

desired_value = [0, 6]
mylist = [i for i in mylist if i not in desired_value] + desired_value
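Like the single-value version, this appends every value in desired_value even when it never occurred in mylist. A sketch that only appends values that were actually present (the function name keep_single_copies is hypothetical):

```python
def keep_single_copies(lst, values):
    values = list(values)
    kept = [x for x in lst if x not in values]
    # append one copy of each value that actually occurred, in the order given
    present = [v for v in values if v in lst]
    return kept + present

print(keep_single_copies([4, 1, 2, 6, 1, 0, 9, 8, 0, 9], [0, 6]))
# [4, 1, 2, 1, 9, 8, 9, 0, 6]
print(keep_single_copies([1, 2], [0, 6]))
# [1, 2]
```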

Maybe you could use filter:

[0] + list(filter(lambda x: x != 0, mylist))
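Note that filter(None, ...) and this lambda are not interchangeable in general: filter(None, ...) drops every falsy value (0, None, empty strings, ...), while x != 0 drops only values equal to 0. For a list of plain integers they give the same result. A small illustration:

```python
data = [4, 0, None, '', 1, 0]
print(list(filter(None, data)))
# [4, 1]
print(list(filter(lambda x: x != 0, data)))
# [4, None, '', 1]
```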

You can use an itertools.count counter, which returns 0, 1, ... each time next() is called on it:

from itertools import count

mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]

counter = count()

# next(counter) will be called each time i == 0
# it will return 0 the first time, so only the first time
# will 'not next(counter)' be True
out = [i for i in mylist if i != 0 or not next(counter)]
print(out)

# [4, 1, 2, 6, 1, 0, 9, 8, 9]

The order is kept, and it can easily be modified to deduplicate an arbitrary number of values:

from itertools import count

mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]

items_to_dedup = {1, 0}
counter = {item: count() for item in items_to_dedup}

out = [i for i in mylist if i not in items_to_dedup or not next(counter[i])]
print(out)

# [4, 1, 2, 6, 0, 9, 8, 9]
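In both snippets the counters are created once and are stateful: reusing the same counter for a second list would also drop that list's first 0, because next() no longer starts at 0. A sketch that creates a fresh counter on every call (the function name dedupe_with_count is hypothetical):

```python
from itertools import count

def dedupe_with_count(lst, item):
    counter = count()  # fresh counter, so the first match always sees 0
    return [x for x in lst if x != item or not next(counter)]

print(dedupe_with_count([4, 1, 2, 6, 1, 0, 9, 8, 0, 9], 0))
# [4, 1, 2, 6, 1, 0, 9, 8, 9]

# the pitfall: a shared counter carries over between lists
shared = count()
first = [x for x in [0, 1, 0] if x != 0 or not next(shared)]
second = [x for x in [0, 1, 0] if x != 0 or not next(shared)]
print(first, second)
# [0, 1] [1]
```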

Here's a one-liner for it, where m is the number that should occur only once; the order is kept:

m = 0
[x for i, x in enumerate(mylist) if mylist.index(x) == i or x != m]

Result:

[4, 1, 2, 6, 1, 0, 9, 8, 9]
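mylist.index(x) scans the list from the start for every element, so this one-liner can take quadratic time on long lists. A sketch that looks up the first position of m only once (still assuming m = 0) gives the same result in linear time:

```python
mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]
m = 0
# position of the first m, or -1 if m does not occur (then no index matches)
first = mylist.index(m) if m in mylist else -1
result = [x for i, x in enumerate(mylist) if x != m or i == first]
print(result)
# [4, 1, 2, 6, 1, 0, 9, 8, 9]
```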

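Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM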