
Python - removing items from lists

# I have 3 lists:
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]
# I want to create another that is L1 minus L2's members and L3's members, so:
L4 = (L1 - L2) - L3  # Of course this isn't going to work

I'm wondering what the "correct" way to do this is. I can do it many different ways, but Python's style guide says there should be only one correct way of doing each thing. I've never known what this was.

Here are some tries:

L4 = [ n for n in L1 if (n not in L2) and (n not in L3) ]  # parens for clarity

tmpset = set( L2 + L3 )
L4 = [ n for n in L1 if n not in tmpset ]

Now that I have had a moment to think, I realize that the L2 + L3 thing creates a temporary list that immediately gets thrown away. So an even better way is:

tmpset = set(L2)
tmpset.update(L3)
L4 = [ n for n in L1 if n not in tmpset ]

Update: I see some extravagant claims being thrown around about performance, and I want to assert that my solution was already as fast as possible. Creating intermediate results, whether they are intermediate lists or intermediate iterators that then have to be called repeatedly, will always be slower than simply giving L2 and L3 to the set to iterate over directly, as I have done here.

$ python -m timeit \
  -s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2)' \
  'ts = set(L2); ts.update(L3); L4 = [ n for n in L1 if n not in ts ]'
10000 loops, best of 3: 39.7 usec per loop

All other alternatives (that I can think of) will necessarily be slower than this. Doing the loops ourselves, for example, rather than letting the set() constructor do them, adds expense:

$ python -m timeit \
  -s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2)' \
  'unwanted = frozenset(item for lst in (L2, L3) for item in lst); L4 = [ n for n in L1 if n not in unwanted ]'
10000 loops, best of 3: 46.4 usec per loop

Using iterators, with all of the state-saving and callbacks they involve, will obviously be even more expensive:

$ python -m timeit \
  -s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2);from itertools import ifilterfalse, chain' \
  'L4 = list(ifilterfalse(frozenset(chain(L2, L3)).__contains__, L1))' 
10000 loops, best of 3: 47.1 usec per loop

So I believe that the answer I gave last night is still far and away (for values of "far and away" greater than around 5µsec, obviously) the best, unless the questioner has duplicates in L1 and wants each one removed once for every time it appears in one of the other lists.

update::: post contains a reference to false allegations of inferior performance of sets compared to frozensets. update ::: post包含对与frozensets相比较低的集合性能的错误指控的引用。 I maintain that it's still sensible to use a frozenset in this instance, even though there's no need to hash the set itself, just because it's more correct semantically. 我认为在这个实例中使用冻结集仍然是明智的,即使不需要对集合本身进行散列,只是因为它在语义上更正确。 Though, in practice, I might not bother typing the extra 6 characters. 虽然,在实践中,我可能不会打扰额外的6个字符。 I'm not feeling motivated to go through and edit the post, so just be advised that the "allegations" link links to some incorrectly run tests. 我没有动力去编辑帖子,所以请注意,“指控”链接链接到一些错误运行的测试。 The gory details are hashed out in the comments. 评论中记录了血淋淋的细节。 :::update :::更新

The second chunk of code posted by Brandon Craig Rhodes is quite good, but as he didn't respond to my suggestion about using a frozenset (well, not when I started writing this, anyway), I'm going to go ahead and post it myself.

The whole basis of the undertaking at hand is to check whether each of a series of values (L1) is in another set of values; that set of values is the contents of L2 and L3. The use of the word "set" in that sentence is telling: even though L2 and L3 are lists, we don't really care about their list-like properties, like the order their values are in or how many of each they contain. We just care about the set (there it is again) of values they collectively contain.

If that set of values is stored as a list, you have to go through the list elements one by one, checking each one. That's relatively time-consuming, and it's bad semantics: again, it's a "set" of values, not a list. So Python has these neat set types that hold a bunch of unique values, and can quickly tell you whether some value is in them or not. This works in pretty much the same way that Python's dict types work when you're looking up a key.
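As a tiny illustration (the values here are arbitrary, not the question's data), a membership test against a set is a hash lookup, much like looking up a dict key:

values = set([2, 4, 5, 7, 8, 9])   # hash-based container of unique values

print(7 in values)   # True  -- found via a hash lookup, like a dict key
print(1 in values)   # False -- also a hash lookup, no scan of the elements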

The difference between sets and frozensets is that sets are mutable, meaning that they can be modified after creation. Documentation on both types is here.

Since the set we need to create, the union of the values stored in L2 and L3, is not going to be modified once created, it's semantically appropriate to use an immutable data type. This also allegedly has some performance benefits. Well, it makes sense that it would have some advantage; otherwise, why would Python have frozenset as a builtin?
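A quick sketch of the contrast, with throwaway values just to show mutability:

s = set([1, 2])
s.add(3)            # fine: sets are mutable and can grow after creation

fs = frozenset([1, 2])
# fs.add(3)         # would fail: frozensets have no add or update methods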

update...

Brandon has answered this question: the real advantage of frozen sets is that their immutability makes it possible for them to be hashable, allowing them to be dictionary keys or members of other sets.
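For example (a small sketch with made-up values, not part of the original answer), hashability is what lets a frozenset sit where a list or a plain set cannot:

key = frozenset([2, 4])
seen = {key: "already processed"}             # usable as a dictionary key
groups = {frozenset([1]), frozenset([2, 3])}  # or as members of another set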

I ran some informal timing tests comparing the speed of creation of and lookup on relatively large (3000-element) frozen and mutable sets; there wasn't much difference. This conflicts with the above link, but supports what Brandon says about them being identical except for the aspect of mutability.
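Something along these lines is how such an informal comparison might be run (a rough sketch only; the exact numbers will vary from machine to machine):

import timeit

setup = "data = range(3000)"
# build a 3000-element set or frozenset, then do one membership lookup
print(timeit.timeit("s = set(data); 1500 in s", setup=setup, number=10000))
print(timeit.timeit("fs = frozenset(data); 1500 in fs", setup=setup, number=10000))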

...update

Now, because frozensets are immutable, they don't have an update method. Brandon used the set.update method to avoid creating and then discarding a temporary list en route to set creation; I'm going to take a different approach.

items = (item for lst in (L2, L3) for item in lst)

This generator expression makes items an iterator over, consecutively, the contents of L2 and L3. Not only that, but it does so without creating a whole list full of intermediate objects. Using nested for expressions in generators is a bit confusing, but I manage to keep it sorted out by remembering that they nest in the same order as they would if you wrote actual for loops, e.g.

def get_items(lists):
    for lst in lists:
        for item in lst:
            yield item

That generator function is equivalent to the generator expression that we assigned to items. Well, except that it's a parametrized function definition instead of a direct assignment to a variable.

Anyway, enough digression. The big deal with generators is that they don't actually do anything. Well, at least not right away: they just set up work to be done later, when the generator expression is iterated. This is formally referred to as being lazy. We're going to do that (well, I am, anyway) by passing items to the frozenset function, which iterates over it and returns a frosty cold frozenset.

unwanted = frozenset(items)

You could actually combine the last two lines, by putting the generator expression right inside the call to frozenset:

unwanted = frozenset(item for lst in (L2, L3) for item in lst)

This neat syntactical trick works as long as the iterator created by the generator expression is the only parameter to the function you're calling. Otherwise you have to write it in its own separate set of parentheses, just as if you were passing a tuple as an argument to the function.
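For example (hypothetical values, just to show the rule about parentheses):

total = sum(n * n for n in range(5))        # sole argument: no extra parentheses needed
total = sum((n * n for n in range(5)), 10)  # second argument present: the generator
                                            # expression needs its own parentheses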

Now we can build a new list in the same way that Brandon did, with a list comprehension. These use the same syntax as generator expressions, and do basically the same thing, except that they are eager instead of lazy (again, these are actual technical terms), so they get right to work iterating over the items and creating a list from them.

L4 = [item for item in L1 if item not in unwanted]

This is equivalent to passing a generator expression to list, e.g.

L4 = list(item for item in L1 if item not in unwanted)

but more idiomatic.

So this will create the list L4, containing the elements of L1 which weren't in either L2 or L3, maintaining both the order they were originally in and the number of times each appeared.
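Putting the pieces together with the question's lists, as a quick sanity check:

L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]

unwanted = frozenset(item for lst in (L2, L3) for item in lst)
L4 = [item for item in L1 if item not in unwanted]
print(L4)   # [1, 3, 6] -- order preserved, duplicates (if any) kept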


If you just want to know which values are in L1 but not in L2 or L3, it's much easier: you just create that set:

L1_unique_values = set(L1) - unwanted

You can make a list out of it, as st0le does, but that might not really be what you want. If you really do want the set of values that are only found in L1, you might have a very good reason to keep that set as a set, or indeed a frozenset:

L1_unique_values = frozenset(L1) - unwanted

...Annnnd, now for something completely different:

from itertools import ifilterfalse, chain
L4 = list(ifilterfalse(frozenset(chain(L2, L3)).__contains__, L1))

Assuming your individual lists won't contain duplicates... use set and difference:

L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]
print(list(set(L1) - set(L2) - set(L3)))

Doing such operations on lists can hamper your program's performance very quickly. What happens is that with each remove, list operations do a fresh malloc and move elements around. This can be expensive if you have a very large list. So I would suggest this -

I am assuming your list has unique elements. Otherwise you need to maintain a list of duplicate values in your dict. Anyway, for the data you provided, here it is -

METHOD 1

d = dict()
for x in L1: d[x] = True

# Check if L2 data is in 'd'
for x in L2:
    if x in d:
        d[x] = False

for x in L3:
    if x in d:
        d[x] = False

# Finally retrieve all keys with value as True.
final_list = [x for x in d if d[x]]

METHOD 2: If all that looks like too much code, then you could try using set. But this way your list will lose all duplicate elements.

final_set = set.difference(set(L1), set(L2), set(L3))
final_list = list(final_set)

This may be less pythonesque than the list-comprehension answer, but has a simpler look to it:

l1 = [ ... ]
l2 = [ ... ]

diff = list(l1) # this copies the list
for element in l2:
    diff.remove(element)

The advantage here is that we preserve the order of the list, and if there are duplicate elements, we remove only one for each time it appears in l2.
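A small sketch of that behaviour with made-up values (note that list.remove raises ValueError if an element of l2 is not present in the copy):

l1 = [1, 2, 2, 3]
l2 = [2]

diff = list(l1)          # copy so l1 is left untouched
for element in l2:
    diff.remove(element) # removes only the first matching occurrence

print(diff)              # [1, 2, 3] -- order kept, one duplicate still there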

I think intuited's answer is way too long for such a simple problem, and Python already has a builtin function to chain two lists as a generator.

The procedure is as follows:

  1. Use itertools.chain to chain L2 and L3 without creating a memory-consuming copy.
  2. Create a set from that (in this case, a frozenset will do, because we don't change it after creation).
  3. Use a list comprehension to filter out elements that are in L1 and also in L2 or L3. As set/frozenset lookup ( x in someset ) is O(1), this will be very fast.

And now the code:

L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]

from itertools import chain
tmp = frozenset(chain(L2, L3))
L4 = [x for x in L1 if x not in tmp] # [1, 3, 6]

This should be one of the fastest, simplest, and least memory-consuming solutions.

