Remove similar items in a list in Python

How do you remove similar items from a list in Python, but only for a given item? For example:

l = list('need')

If 'e' is the given item, then the result should be

l = list('nd')

The set() function will not do the trick, since it removes all duplicates.

count() and remove() are not efficient.

Use filter, assuming you write a function that decides which items you want to keep in the list.

For your example:

def pred(x):
    return x != "e"

l = list("need")
l = list(filter(pred, l))

Assuming given = 'e' and l = list('need'):

for i in range(l.count(given)):
    l.remove(given)
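
Each call to remove() rescans the list from the start and shifts the remaining elements, so the loop above repeats a lot of work on long lists. A minimal single-pass sketch that still modifies the list in place uses slice assignment. The helper name remove_all is hypothetical and not part of the original answer.

```python
def remove_all(items, given):
    """Remove every occurrence of `given` from `items` in place (hypothetical helper)."""
    # Slice assignment replaces the list's contents without rebinding the name,
    # so other references to the same list object see the change too.
    items[:] = [x for x in items if x != given]


l = list('need')
alias = l
remove_all(l, 'e')
print(l)           # ['n', 'd']
print(alias is l)  # True
```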

If you just want to remove 'e' from each word in a list of words, you can use the regex function re.sub(). If you also want a count of how many occurrences of 'e' were removed from each word, use re.subn() instead. The first gives you a list of strings. The second gives you a list of tuples (string, n), where n is the number of replacements made.

import re
lst = list(('need','feed','seed','deed','made','weed','said'))
j = [re.sub('e','',i) for i in lst]
k = [re.subn('e','',i) for i in lst]

The outputs for j and k are:

j = ['nd', 'fd', 'sd', 'dd', 'mad', 'wd', 'said']
k = [('nd', 2), ('fd', 2), ('sd', 2), ('dd', 2), ('mad', 1), ('wd', 2), ('said', 0)]

If you want to count the total changes made, just iterate through k and sum the counts. There are simpler ways too. You can simply run the regex on the joined string:

re.subn('e','',''.join(lst))[1]

This will give you the total number of replacements made across the whole list.
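
The "iterate through k and sum" step, plus one non-regex alternative, might look like the sketch below. It reuses lst from the answer. Note that str.count is a swapped-in alternative to re.subn, not something the answer itself proposed.

```python
import re

lst = ['need', 'feed', 'seed', 'deed', 'made', 'weed', 'said']
k = [re.subn('e', '', word) for word in lst]

# Sum the per-word replacement counts returned by re.subn.
total_from_k = sum(n for _, n in k)

# Alternative without regex: count the occurrences directly in each word.
total_from_count = sum(word.count('e') for word in lst)

print(total_from_k, total_from_count)  # 11 11
```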

List comprehension method. I'm not sure whether the size/complexity is less than that of count and remove.

def scrub(l, given):
    return [i for i in l if i not in given]
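
On the complexity question: the comprehension makes a single pass over l, but `i not in given` scans given on every test when given is a list or a string. A sketch that converts given to a set first, assuming every item is hashable, keeps each membership test at constant time on average. The name scrub_set is hypothetical.

```python
def scrub_set(l, given):
    """Like scrub(), but with average O(1) membership tests (hypothetical variant)."""
    unwanted = set(given)  # assumes every item in `given` is hashable
    return [i for i in l if i not in unwanted]


print(scrub_set(list('need'), 'e'))   # ['n', 'd']
print(scrub_set(list('need'), 'ne'))  # ['d']
```

For a one- or two-item given the difference is negligible; the set only pays off as the number of items to remove grows.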

Filter method. Again, I'm not sure about the complexity.

def filter_by(l, given):
    return list(filter(lambda x: x not in given, l))

Brute force with recursion. There are a lot of potential pitfalls, but it is still an option. Again, I don't know the size/complexity.

def bruteforce(l, given):
    try:
        l.remove(given[0])
        return bruteforce(l, given)
    except ValueError:
        return bruteforce(l, given[1:])
    except IndexError:
        return l
    return l
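
One of those pitfalls is that every successful removal adds another stack frame, so the recursion depth grows with the number of items removed. A hedged iterative sketch of the same remove-until-ValueError idea avoids the recursion limit, though it is still quadratic in the worst case because list.remove rescans the list each time. The name bruteforce_iter is hypothetical.

```python
def bruteforce_iter(l, given):
    """Iterative take on bruteforce(): mutates and returns `l` (hypothetical variant)."""
    for item in given:
        while True:
            try:
                l.remove(item)  # removes the first occurrence only
            except ValueError:
                break           # no occurrences of `item` are left
    return l


print(bruteforce_iter(list('need' * 3), list('ne')))  # ['d', 'd', 'd']
```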

For those of you curious about the actual time taken by the above methods, I've taken the liberty of testing them below!

Below is the timing function I've chosen to use.

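import datetime

def timer(func, name):
    print("-------{}-------".format(name))
    try:
        start = datetime.datetime.now()
        func()
        end = datetime.datetime.now()
        # total_seconds() covers the whole interval; .microseconds alone
        # would drop any whole seconds from the measurement.
        print(int((end - start).total_seconds() * 1000000))
    except Exception as e:
        print("Failed: {}".format(e))
    print("\r")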

This is the dataset we are testing against, where l is our original list, q is the list of items we want to remove, and r is our expected result.

l = list("need"*50000)
q = list("ne")
r = list("d"*50000)

For posterity, I've added the count/remove method the OP was against. (For good reason!)

def count_remove(l, given):
    for i in given:
        for x in range(l.count(i)):
            l.remove(i)
    return l

All that's left to do is test!

timer(lambda: scrub(l, q), "List Comp")
assert(scrub(l,q) == r)

timer(lambda: filter_by(l, q), "Filter")
assert(filter_by(l,q) == r)

timer(lambda : count_remove(l, q), "Count/Remove")
assert(count_remove(l,q) == r)

timer(lambda: bruteforce(l, q), "Bruteforce")
assert(bruteforce(l,q) == r)

And our results (times in microseconds):

-------List Comp-------
10000

-------Filter-------
28000

-------Count/Remove-------
199000

-------Bruteforce-------
Failed: maximum recursion depth exceeded

Process finished with exit code 0

The recursion method failed on the larger dataset, but we expected this. I tested on smaller datasets, and recursion is marginally slower. I thought it would be faster.
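
One caveat about the test harness: count_remove and bruteforce mutate l in place, so any call made after them sees an already-scrubbed list. A sketch of a fairer comparison using the standard timeit module gives every call a fresh copy. The dataset size here is an arbitrary, smaller choice so that Count/Remove finishes quickly; timings vary by machine, so none are shown.

```python
import timeit


def scrub(l, given):
    return [i for i in l if i not in given]


def filter_by(l, given):
    return list(filter(lambda x: x not in given, l))


def count_remove(l, given):
    for i in given:
        for _ in range(l.count(i)):
            l.remove(i)
    return l


data = list("need" * 1000)   # smaller than the original dataset (assumption)
q = list("ne")
expected = list("d" * 1000)

for name, func in [("List Comp", scrub),
                   ("Filter", filter_by),
                   ("Count/Remove", count_remove)]:
    # list(data) hands each call its own copy, so in-place removal
    # cannot leak from one run into the next.
    assert func(list(data), q) == expected
    seconds = timeit.timeit(lambda: func(list(data), q), number=5)
    print("{}: {:.6f} s for 5 runs".format(name, seconds))
```

The copy cost is included in every measurement, which slightly penalizes the faster methods but keeps the comparison consistent.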
