
Deleting elements of a Python list during iteration

I have a very large list, and I have to do many operations on each of its elements. Essentially, each element of the list is appended to in various ways and then used to generate an object. These objects are then used to generate another list.

Unfortunately, doing this in a naive way takes up all of the available memory.

I would therefore like to do the following:

for a in b:
    # Do many things with a
    c.append(C(modified_a))
    b[b.index(a)] = None # < Herein lies the rub

This seems to violate the idea that a list should not be modified during iteration. Is there a better way to do this kind of manual garbage collection?

This shouldn't be a problem, since you're just assigning new values to list elements, not actually deleting them.

But instead of searching for a with the index method, you should probably use enumerate, which gives you the position directly.

See also: http://unspecified.wordpress.com/2009/02/12/thou-shalt-not-modify-a-list-during-iteration/ : "Firstly, let me be clear that in this article, when I say “modify”, I mean inserting or removing items from the list. Merely updating or mutating the list items is fine."
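The distinction the article draws can be seen in a small sketch (the two helper names are hypothetical). Removing items from the list being iterated over makes the iterator skip whichever element slides into the freed slot. Assigning by position keeps the length fixed, so every element is still visited:

```python
def remove_while_iterating(items):
    """Unsafe: removes even numbers from the same list that is being iterated over."""
    for x in items:
        if x % 2 == 0:
            items.remove(x)  # later elements shift left, so the next one is skipped
    return items


def assign_while_iterating(items):
    """Safe: overwrites even numbers by position; the list length never changes."""
    for i, x in enumerate(items):
        if x % 2 == 0:
            items[i] = None
    return items


print(remove_while_iterating([2, 4, 6, 7]))  # [4, 7]: the 4 was never examined
print(assign_while_iterating([2, 4, 6, 7]))  # [None, None, None, 7]
```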

Your best bet is a generator:

def gen(b):
    for a in b:
        # Do many things with a
        yield a  # hand back one item at a time instead of building a list

Done properly, this needs no additional memory for a second list, because each item is produced only when the caller asks for it.
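Here is a minimal sketch of how such a generator might be consumed. It assumes a hypothetical build_object that stands in for the question's C(modified_a). Each object is created only when the loop requests it, so the list of objects never has to exist in full. Note that the source list b itself still holds its elements.

```python
def build_object(a):
    """Hypothetical stand-in for the question's C(modified_a)."""
    return {"value": a, "doubled": a * 2}


def gen(b):
    for a in b:
        # Do many things with a, then hand back one finished object at a time
        yield build_object(a)


b = [1, 2, 3]
total = 0
for obj in gen(b):  # objects are created lazily, one per iteration
    total += obj["doubled"]
print(total)  # 12
```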

There are several issues with your code.

First, assigning None to a list element does not delete it:

>>> l=[1,2,3,4,5,6,6,7,8,9]
>>> len(l)
10
>>> l[l.index(5)]=None
>>> l
[1, 2, 3, 4, None, 6, 6, 7, 8, 9]
>>> len(l)
10
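To actually remove an element so that len drops, use del with an index or list.remove with a value. Both shift the later elements to the left, and that shift is exactly why neither should be used on the list you are currently iterating over:

```python
l = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9]

del l[l.index(5)]  # delete by position
print(l, len(l))   # [1, 2, 3, 4, 6, 6, 7, 8, 9] 9

l.remove(6)        # delete the first element equal to 6
print(l, len(l))   # [1, 2, 3, 4, 6, 7, 8, 9] 8
```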

Second, using index to find the element you want to change is not an efficient way to do this. list.index searches from the start of the list on every call, so the loop as a whole takes quadratic time.
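Here is a rough sketch of the difference. It uses two hypothetical helpers that both replace every element with None. The first rescans the list with list.index on each step. The second gets the position from enumerate. Timings depend on the machine, so no numbers are shown here.

```python
import timeit


def mark_with_index(b):
    """Question's approach: b.index(a) searches from the start each time (O(n**2) total)."""
    for a in b:
        b[b.index(a)] = None
    return b


def mark_with_enumerate(b):
    """enumerate supplies the position directly (O(n) total)."""
    for i, _ in enumerate(b):
        b[i] = None
    return b


n = 2000
print("index:    ", timeit.timeit(lambda: mark_with_index(list(range(n))), number=1))
print("enumerate:", timeit.timeit(lambda: mark_with_enumerate(list(range(n))), number=1))
```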

You can use enumerate, but you would still need a second pass to remove the None values:

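c=[]
for i,a in enumerate(b):
    # Do many things with a (C and modified_a are the question's placeholders)
    c.append(C(modified_a))
    b[i]=None      # drop the reference so the original element can be freed
# b now holds only None placeholders, which still occupy slots until removed
b=[e for e in b if e is not None]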

You could use a list comprehension to build c from the new values directly, and then delete b:

c=[do_many_things(a) for a in b]
del b                              # will still occupy memory if not deleted...
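Note that del b only removes the name. The elements are reclaimed once nothing else refers to them. The sketch below uses weakref to observe when that happens. Item and do_many_things are hypothetical, and Item is a user-defined class because plain lists and ints cannot be weakly referenced.

```python
import gc
import weakref


class Item:
    """Hypothetical element type; instances of user-defined classes support weak references."""


def do_many_things(a):
    """Hypothetical processing step that returns a new object."""
    return repr(a)


def demo():
    b = [Item() for _ in range(3)]
    probe = weakref.ref(b[0])   # observes the first element without keeping it alive
    alias = b                   # a second reference to the same list
    c = [do_many_things(a) for a in b]
    del b
    gc.collect()
    still_alive = probe() is not None   # alias keeps the list and its elements alive
    del alias
    gc.collect()
    freed = probe() is None             # nothing refers to the element any more
    return c, still_alive, freed


c, still_alive, freed = demo()
print(len(c), still_alive, freed)  # 3 True True
```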

Or, if you want b to be modified in place, you can use slice assignment:

b[:]=[do_many_things(a) for a in b]
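The difference shows up when something else holds a reference to the same list. Slice assignment replaces the contents of the existing list object. Plain assignment binds the name to a new list and leaves the old one untouched. Here is a short sketch with a hypothetical do_many_things:

```python
def do_many_things(a):
    """Hypothetical stand-in for the per-element work."""
    return a * 10


b = [1, 2, 3]
alias = b

b[:] = [do_many_things(a) for a in b]   # same list object, new contents
print(alias, alias is b)                # [10, 20, 30] True

b = [do_many_things(a) for a in b]      # new list object; alias keeps the old one
print(alias, alias is b)                # [10, 20, 30] False
```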

Slice assignment works like this:

# shorten a list
>>> b=[1,2,3,4,5,6,7,8,9]
>>> b[2:7]=[None]
>>> b
[1, 2, None, 8, 9]

#expand a list
>>> c=[1,2,3]
>>> c[1:1]=[22,33,44]
>>> c
[1, 22, 33, 44, 2, 3]

# modify in place
>>> c=[1,2,3,4,5,6,7]
>>> c[0:7]=[11,12,13,14,15,16,17]
>>> c
[11, 12, 13, 14, 15, 16, 17]

You can combine it with a list comprehension like so:

>>> c=list(range(int(1e6)))
>>> c[:]=[e for e in c if e<10]
>>> c
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

One of the comments pointed out that slice assignment does not work exactly in place, because a temporary list is generated first. That is true. However, let's look at the total timings here:

import time
import random
fmt='\t{:25}{:.5f} seconds' 
count=int(1e5)
a=[random.random() for i in range(count)]
b=[e for e in a]

t1=time.time()
for e in b:
    if e<0.5: b[b.index(e)]=None  
c=[e for e in b if e is not None]    
print(fmt.format('index, None',time.time()-t1))

b=[e for e in a]
t1=time.time()
for e in b[:]:
    if e<0.5: del b[b.index(e)]  
print(fmt.format('index, del',time.time()-t1))

b=[e for e in a]
t1=time.time()
for i,e in enumerate(b[:]):
    if e<0.5: b[i]=None
c=[e for e in b if e is not None]    
print(fmt.format('enumerate, copy',time.time()-t1))

b=[e for e in a]
t1=time.time()
c=[e for e in b if e<.5]
del b                     # delete the copy, not a, which the next test still needs
print(fmt.format('c=',time.time()-t1))

b=[e for e in a]
t1=time.time()
b[:]=[e for e in b if e<0.5]
print(fmt.format('a[:]=',time.time()-t1))

On my computer, this prints:

index, None              87.30604 seconds
index, del               28.02836 seconds
enumerate, copy          0.02923 seconds
c=                       0.00862 seconds
a[:]=                    0.00824 seconds
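The temporary list created by slice assignment may itself be too large. If so, one alternative is to compact the kept elements toward the front of the same list and then truncate the tail with del, so no second list is ever built. This is a different technique from the ones timed above, sketched here with a hypothetical filter_in_place helper:

```python
def filter_in_place(items, keep):
    """Hypothetical helper: keeps only elements where keep(e) is true, reusing the same list."""
    write = 0
    for read in range(len(items)):
        e = items[read]
        if keep(e):
            items[write] = e  # move each kept element toward the front
            write += 1
    del items[write:]  # truncate the leftover tail in place
    return items


b = [0.9, 0.1, 0.7, 0.3, 0.2]
same = b
filter_in_place(b, lambda e: e < 0.5)
print(b, same is b)  # [0.1, 0.3, 0.2] True
```

This is a plain Python loop, so it will likely be slower than the comprehension-based versions. It trades speed for a lower peak memory footprint.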

Or, if this does not help, use numpy, whose arrays store numeric data far more compactly than a list of Python objects.
