Removing elements from the list when looping over it

This part of my code does not scale as dimension gets bigger.

I loop over my data and accumulate it in every dt time window. To do this I compare each time against the lower and upper bounds of the window. When I reach the upper bound, I break out of the for loop for efficiency. The next time I run the for loop, I want to start not from the beginning but from the element where I stopped previously. How can I do that?

I tried to remove/pop elements from the list, but the indexes get messed up. I read that I cannot modify the list I loop over, but my goal does not seem uncommon, so there has to be a solution. I don't need the original data list later in my code; I only want to optimize the accumulation.

# Here I generate data for you to show my problem
from random import randint
import numpy as np

dimension = 200
times = [randint(0, 1000) for p in range(0, dimension)]
times.sort()
# randint includes both ends, and values index into an array of size dimension
values = [randint(0, dimension - 1) for p in range(0, dimension)]
data = [(values[k], times[k]) for k in range(dimension)]
dt = 50.0
t = min(times)
pixels = []
timestamps = []

# this is my problem
while (t <= max(times)):
    accumulator = np.zeros(dimension)
    for idx, content in enumerate(data):
        # comparing lower bound of the 'time' window
        if content[1] >= t:
            # comparing upper bound of the 'time' window
            if (content[1] < t + dt):
                accumulator[content[0]] += 1
                # if I pop the first element from the list after accumulating, indexes are screwed when looping further
                # data.pop(0)
            else:
                # all further entries are bigger because they are sorted
                break

    pixels.append(accumulator)
    timestamps.append(t)
    t += dt

In a simpler form, I think you are trying to do:

In [158]: times=[0, 4, 6, 10]
In [159]: data=np.arange(12)
In [160]: cnt=[0 for _ in times]
In [161]: for i in range(len(times)-1):
     ...:     for d in data:
     ...:         if d>=times[i] and d<times[i+1]:
     ...:             cnt[i]+=1
     ...:             
In [162]: cnt
Out[162]: [4, 2, 4, 0]

And you are trying to make this data loop more efficient by breaking from the loop when d gets too large, and by starting the next loop after the items which have already been counted.

Adding the break is easy, as you've done:

In [163]: cnt=[0 for _ in times]
In [164]: for i in range(len(times)-1):
     ...:     for d in data:
     ...:         if d>=times[i]:
     ...:             if d<times[i+1]:
     ...:                 cnt[i]+=1
     ...:             else:
     ...:                 break

In [165]: cnt
Out[165]: [4, 2, 4, 0]

One way to skip the already-counted items is to replace the for d in data loop with an index loop and keep track of where we stopped last time around:

In [166]: cnt=[0 for _ in times]
In [167]: start=0
     ...: for i in range(len(times)-1):
     ...:     for j in range(start,len(data)):
     ...:         d = data[j]
     ...:         if d>=times[i]:
     ...:             if d<times[i+1]:
     ...:                 cnt[i]+=1
     ...:             else:
     ...:                 start = j
     ...:                 break
     ...:                 
In [168]: cnt
Out[168]: [4, 2, 4, 0]
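
The same start-index idea can be carried over to the (value, time) tuples from the question. This is only a sketch: the function name accumulate_windows and the small sample data are made up for illustration, and it assumes data is sorted by time, as in the question.

```python
import numpy as np

def accumulate_windows(data, dt, dimension):
    """Accumulate sorted (value, time) tuples into windows of width dt.

    Hypothetical helper, not from the original post.
    """
    if not data:
        return [], []
    t = data[0][1]        # first window starts at the smallest time
    t_max = data[-1][1]   # data is sorted, so the last time is the largest
    start = 0             # index of the first element not yet counted
    pixels, timestamps = [], []
    while t <= t_max:
        accumulator = np.zeros(dimension)
        j = start
        # Everything before `start` belongs to earlier windows, so only
        # the upper bound of the window has to be checked.
        while j < len(data) and data[j][1] < t + dt:
            accumulator[data[j][0]] += 1
            j += 1
        start = j         # resume here for the next window
        pixels.append(accumulator)
        timestamps.append(t)
        t += dt
    return pixels, timestamps

# Made-up sample: (value, time) pairs sorted by time
data = [(0, 0), (1, 10), (1, 49), (2, 50), (0, 120)]
pixels, timestamps = accumulate_windows(data, dt=50.0, dimension=3)
```

Each element is visited once in total, so the work grows with len(data) plus the number of windows rather than with their product.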

A pop based version requires that I work with a list (my data is an array), and requires inserting the value back at the break:

In [186]: datal=data.tolist()
In [187]: cnt=[0 for _ in times]
In [188]: for i in range(len(times)-1):
     ...:     while True:
     ...:         d = datal.pop(0)
     ...:         if d>=times[i]:
     ...:             if d<times[i+1]:
     ...:                 cnt[i]+=1
     ...:             else:
     ...:                 datal.insert(0,d)
     ...:                 break
     ...:             
In [189]: cnt
Out[189]: [4, 2, 4, 0]
In [190]: datal
Out[190]: [10, 11]

This isn't perfect, since I still have items on the list at the end (my times don't cover the whole data range). But it tests the idea.
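
A possible variation on the pop idea, swapping the list for collections.deque: list.pop(0) has to shift every remaining item, while deque.popleft() does not. Peeking at queue[0] before removing it also avoids the insert-back step and stops cleanly when the queue runs empty. The name queue is just for illustration, and the sketch assumes the data is sorted.

```python
from collections import deque

times = [0, 4, 6, 10]
queue = deque(range(12))      # same values as np.arange(12)
cnt = [0 for _ in times]

for i in range(len(times) - 1):
    # Look at the front item first; remove it only if it lies below the
    # upper bound of the current bin.
    while queue and queue[0] < times[i + 1]:
        d = queue.popleft()
        if d >= times[i]:     # items below the lower bound are discarded
            cnt[i] += 1

print(cnt)          # [4, 2, 4, 0]
print(list(queue))  # [10, 11]
```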

Here's something closer to your attempt:

In [203]: for i in range(len(times)-1):
     ...:     for d in datal[:]:
     ...:         if d>=times[i]:
     ...:             if d<times[i+1]:
     ...:                 cnt[i]+=1
     ...:                 datal.pop(0)
     ...:             else:
     ...:                 break
     ...:       

The key difference is that I iterate on a copy of datal. That way the pop affects datal, but doesn't affect the current iteration. Admittedly there's a cost to the copy, so the speed-up might not be significant.
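
To see why the copy matters, here is a small illustration with made-up lists: popping from the list being iterated makes the loop skip items, while iterating over a slice copy visits every item.

```python
# Popping from the list that is being iterated: the loop's internal
# position keeps advancing while the items shift left, so every other
# item is skipped.
items = [0, 1, 2, 3, 4, 5]
seen_in_place = []
for d in items:
    seen_in_place.append(d)
    items.pop(0)

# Iterating over a copy: the pops change `copied`, not the list the
# loop is walking over.
copied = [0, 1, 2, 3, 4, 5]
seen_with_copy = []
for d in copied[:]:
    seen_with_copy.append(d)
    copied.pop(0)

print(seen_in_place, items)     # [0, 2, 4] [3, 4, 5]
print(seen_with_copy, copied)   # [0, 1, 2, 3, 4, 5] []
```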

A different approach would be to loop on data, and step time as the t and t+dt boundaries are crossed.

In [222]: times=[0, 4, 6, 10,100]
In [223]: cnt=[0 for _ in times]; i=0
In [224]: for d in data:
     ...:     if d>=times[i]:
     ...:         if d<times[i+1]:
     ...:             cnt[i]+=1
     ...:         else:
     ...:             i += 1
     ...:             cnt[i]+=1
     ...:             
In [225]: cnt
Out[225]: [4, 2, 4, 2, 0]
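
The loop above advances i by one bin at a time, which works here because every bin contains at least one item. A sketch that also copes with empty bins could replace the single step with a while loop. The function name count_in_bins and the sample data are hypothetical, and unlike the transcript it returns one count per bin without the trailing zero. The last lines cross-check the result against np.searchsorted, which is a different, vectorized technique for the same sorted-data counts.

```python
import numpy as np

def count_in_bins(data, edges):
    """Count sorted values falling in each [edges[i], edges[i+1])."""
    cnt = [0] * (len(edges) - 1)
    i = 0
    for d in data:
        if d < edges[0]:
            continue              # below the first bin
        # Step over as many bins as needed, including empty ones.
        while i < len(cnt) and d >= edges[i + 1]:
            i += 1
        if i == len(cnt):
            break                 # past the last bin; data is sorted
        cnt[i] += 1
    return cnt

# Made-up sorted data with a gap that leaves two bins empty
data = [0, 1, 2, 3, 11, 12, 50]
edges = [0, 4, 6, 10, 100]
counts = count_in_bins(data, edges)
print(counts)   # [4, 0, 0, 3]

# Cross-check: positions of the bin edges in the sorted data,
# then differences between neighbouring positions.
positions = np.searchsorted(data, edges, side="left")
print(np.diff(positions).tolist())   # [4, 0, 0, 3]
```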
